The search starts where searches always start when you don't know what you're looking for — with terminology that's too broad and results that prove it. Marcus types "AI parole outcomes tracking" into the browser at 10:30 on a Wednesday night, the notebook open beside the laptop to the page where Dewayne Williams's name appears. The first page of results returns an advocacy organization site, a Brennan Center white paper dated 2024, and a Wired piece about COMPASS-NG's aggregate accuracy statistics he's already read twice.
He tries different framings. "Criminal justice AI decision accuracy, real-world outcomes." "Parole algorithm case-level review." "When algorithmic decisions produce unintended consequences — professional tracking." Each variation returns a different arrangement of the same material: academic literature that talks about populations and sample sizes, journalism that talks about scandal and systemic bias, policy work that frames the question as either algorithmic failure or algorithmic promise. None of it is what he's looking for, and he keeps circling because he doesn't yet know how to name what he wants.
What he wants is: other parole officers. Other decision-makers who sit in rooms, watch a green light appear, and then months later try to understand what followed. He wants to know if anyone else keeps a notebook. Not official documentation, not department files — a private record of cases where the system decided differently than the officer would have, and outcomes arrived that didn't match either prediction.
He tries "outcome tracker AI professional practitioner." He tries "AI approved I would have denied outcome tracking." At 11:22, he types almost without expectation: "AI decision outcomes practitioners forum." The fourth result loads differently than the others — no banner ads, no notification prompts, no social share icons. A plain white page with thread titles in blue, date stamps in gray, usernames in double brackets. The pinned post reads: This forum exists for professional practitioners who track outcomes following AI-assisted decisions. Share cases, identify patterns, compare across domains. No identifying information about clients or individuals. Pseudonymous participation only.
The forum runs on software that looks two generations old, and the effect is deliberate — this is not a place designed for casual scrolling. You read here because you chose to read here, because you searched for something specific and this is what the search returned. He reads for two hours.
The thread he opens first has forty-seven replies and was started eighteen months ago. The original post describes a single case: an AI-approved parole decision in what the poster calls a "southern district," against the officer's recommendation, followed ten weeks later by the parolee becoming a key witness in a homicide case. Testimony that produced a conviction. The poster's framing is careful: I'm not claiming the AI predicted this. I'm asking whether anyone else has observed decisions like this — where the outcome, in retrospect, seems disproportionate to any variable visible at the time of the decision. I have three cases now. I'm asking: is anyone counting?
Forty-seven replies over eighteen months. Some anecdotal, some skeptical, some from practitioners whose posts carry the specific vocabulary of people who actually do this work — industry shorthand, the compressed language of professional life, the kind of detail a casual observer wouldn't have. A hiring manager in Cleveland describes a hire the AI approved over his recommendation to pass, an employee who three months later caught a refrigeration failure at 3 AM and preserved a pharmaceutical shipment the employer hadn't budgeted to lose. The hiring manager ends his post: I don't claim the system knew. I know what happened. A claims adjuster in Minneapolis tracked a denied insurance claim that pushed a family into a state assistance program where the father received a diabetes screening he wouldn't otherwise have had. He was diagnosed. He is alive. She found out because she looked.
A poster whose handle is "underwrite_ghost" contributes a thread of their own: anomalous D-code mortgage denials clustered in specific zip codes, applicants who qualified by every standard metric, properties that checked out across documented risk categories. A different domain, a different system. The same structure: decisions that don't match the stated criteria, outcomes that arrive carrying weight that wasn't visible at the time.
The pattern these posts describe isn't universal — most entries on the forum are ordinary, the AI wrong in the expected direction or right in the unremarkable way. The cases that accumulate into something are a fraction of the total. But they're there, and the people describing them are not claiming the system is intelligent or guided or anything beyond what it is. They're posting because they need to set it down somewhere. Because the alternative is keeping it in a notebook and not knowing if anyone else is doing the same thing, alone, in their own apartment, at 11 PM, watching a page they can't stop returning to.
Terrence Hall appears in Marcus's office doorway the next morning — not during the caseload check at 8 AM, not during the team briefing at nine, but just after Marcus comes back from the break room with his coffee and settles at his desk. Terrence stands in the doorway the way he always does for these conversations — leaning against the frame, coffee cup in one hand, already positioned to leave before the conversation officially begins.
"Running outcome queries," Terrence says. It isn't a question.
Marcus sets his cup on the desk. "Case review. Looking at Q4 and Q1 release outcomes against COMPASS predictions."
"That's not your task set."
The cubicle wall between Marcus's office and Janet's is thin enough that they can hear each other type. Marcus doesn't look toward it. "Being thorough," he says.
"That's a good instinct." Terrence shifts his weight, the gesture that precedes something he's already decided to say. "But thorough looks different depending on the function. Outcome analysis has a whole division — methodology, matched comparison populations, longitudinal tracking. We've got caseloads. Those aren't interchangeable." He pauses, letting the institutional logic settle. "You know the Henderson situation as well as I do."
He doesn't need to say more than that. The Henderson situation is what the field calls it — a parole officer who overrode an AI denial, released a parolee, and spent the following six months in a series of review meetings while the victim's family waited in adjacent hallways. Marcus knows it the way everyone in his field knows it: not as a data point about override rates but as the story they tell to explain why the framework is the framework.
"I'm not overriding anyone," Marcus says.
"I know you're not." Terrence finishes the sentence he didn't say out loud: because you're smart enough not to. He takes a half-step back toward the hallway, the exit already in motion. "Stay in your lane. The system does what it does. Outcomes belong to the people whose job it is to track outcomes. We implement. That's what they pay us for."
He's gone before Marcus can respond, which is also part of how these conversations work — institutional advice delivered before it can become an argument. Marcus sits for a moment looking at the COMPASS-NG interface, the clean verdict field, the confidence percentage in the upper corner. The system does what it does. He pulls up his caseload and works through five routine check-ins.
That evening, at the kitchen table with the laptop open and the notebook beside it, Marcus creates an account on the forum. The username takes longer than expected — disconnected from anything identifying, but carrying enough professional signal that the forum's serious contributors would recognize it as one of their own. The handles he's seen: "actuarial-az," "tier-two-returns," "underwrite_ghost." People named for what they do, not who they are. He types "casefiles_313." Area code and practice, nothing more. The post takes forty minutes to draft.
He keeps it clinical. Domain: parole. Context: AI approval rendered against his professional recommendation, the divergence rooted in a prison behavior record he'd weighted more heavily than the system appeared to. Outcome: approximately two weeks post-release, the parolee was involved in an incident with significant positive consequences for a third party. He doesn't name the river, doesn't specify the city, keeps the geography vague enough that the case couldn't be identified from the post alone. What he describes is the gap — the space between what was knowable at the time of the decision and what occurred afterward. The distance between the variables available to him and whatever the system had processed that he didn't.
My question is structural, not anecdotal, he writes at the end. Has anyone else documented cases where AI decisions produce downstream consequences that couldn't be predicted from the decision variables available at the time? I'm not asking whether the system is smarter than us. I'm asking whether there is a pattern to which decisions carry this kind of weight — and whether anyone is tracking it, or whether each of us is counting alone.
He reads it twice. He reads it a third time. He posts it. Two responses arrive within ninety minutes, both public. The first is from "tier-two-returns," who works parole in a different state and describes three cases matching Marcus's description — AI decisions that produced outcomes too specific, too well-positioned, for chance to account for. I stopped calling it coincidence after case three, the post reads. I don't have an explanation. I stopped trying to find one. The second post asks methodological questions: what's the base rate, has he controlled for the possibility that AI-approved cases just produce more notable outcomes because the approved population is larger, what's his denominator. Reasonable questions. The poster doesn't sound hostile. Marcus is composing a reply to the second post when the notification appears: a direct message, from a handle he's never seen on the forum, five sentences.
The thirteen-day interval between decision and outcome is consistent with what we've observed in other domains. You're not seeing coincidence. You're seeing structure. If you want to understand what you're looking at, keep tracking. We'll be in touch.
He reads it again, stopping at thirteen-day interval. His post had described the timeline as approximately two weeks post-release — he'd rounded, deliberately, to keep the case unidentifiable. He hadn't written thirteen days. He checks the post and reads every sentence: Approximately two weeks post-release. He opens the notebook to the page where Dewayne Williams's name appears, the entry from mid-March, written the evening after he drove to Brightmoor: Release February 15. Rescue February 28. Thirteen days. AI right. I was wrong. Child alive because of it. Why did it know?
Thirteen days is in his notebook. It is not in his post.
Someone in this forum has access to information Marcus didn't share. He doesn't know how. He doesn't know who "we" refers to, or how large that population is, or how long they've been watching the same case he only started watching three weeks ago. The message offers nothing further — no identification, no explanation of methodology, no invitation to reply. Just the interval, named with a precision he never provided, and the assertion that this isn't coincidence.
He looks at the laptop screen. He looks at the notebook. Outside the window, Michigan Avenue runs its late traffic — headlights moving through a Thursday night, the city conducting its ordinary business without awareness that anything has changed. He picks up his pen.
Below the March entry, he writes: Someone else is watching.