Fifteen-Percent Decision

The Observer

Chapter 13 of 14

Vasquez made tea at 6:47 in the evening — an Earl Grey she ordered in bulk from a supplier in Berkeley, the same supplier for six years, which was the kind of continuity that mattered when most of what you track is in motion. She filled the electric kettle at the utility sink in the back corner of the warehouse, listening to the server racks run their thermal cycles behind her. The racks held the FMI database at 68 degrees Fahrenheit and filled the space with a low, constant hum that she had stopped registering consciously sometime in 2026, the same way you stop registering traffic outside a window you've slept near long enough.

The FMI office occupied the back third of a converted warehouse in Jack London Square. Three large monitors faced her primary workstation. The center one held the outcome mapping interface, which generated visualizations that visitors sometimes described as looking like weather systems, and which she had come to think of as looking like what they were: the branching, recursive structure of consequence, rendered as something a human eye can process. The rightmost monitor showed the two case files she'd reviewed that afternoon: Williams, Dewayne, parole grant 02/14/2028, Eastern District, Detroit. Hernandez, Elena and Jorge, mortgage denial 02/14/2028, Phoenix. She took the tea back to her desk. The video call was scheduled for 7:00.

The call resolved into two windows simultaneously. Chen on the left, his apartment visible behind him — the kitchen table, the window showing Michigan dark, the bare walls she'd noticed in the forum photos. Okonkwo on the right, a home office corner with bookshelves and color-coded file boxes, a tablet angled at the edge of frame. They'd both joined two minutes early. Vasquez noted this and did nothing with it.

"I appreciate that you reached out," she said. Then, because pleasantries cost time and produce less than they appear to: "Let me ask you something first. When you started tracking outcomes — what did you expect to find?"

Chen answered first. His pause was deliberate, the kind that belonged to someone who had learned that the first answer is usually the obvious one and the second is usually better. "Proof the AI was wrong more often than the confidence scores suggested."

Okonkwo's response came faster. "An explanation for the denials. Something that would tell me why the D-42 codes were landing where they were."

She had expected this, in roughly this form. The individual case as the starting point, the investigation as a search for the specific anomaly's explanation. It was the path nearly everyone followed, if they got far enough to follow anything at all. "Everyone starts with the individual case," she said. "The question I've been studying for three years is what happens when you have enough cases to see the structure." She shared her screen.

The center monitor's visualization filled their view — the outcome mapping interface at its full setting, the corrections and lending domains overlaid on a timeline running from 2025 through the previous week. At this scale the display conveyed something no screenshot or description could. Color-coded nodes for each tracked decision point, connected by lines to their recorded outcomes, the lines weighted by impact magnitude so that the display had a kind of topography to it — dense where the downstream effects concentrated, sparse at the margins. The clustering was clear even to someone seeing it for the first time, the way a map of rainfall makes patterns visible that individual weather reports never do.

Chen leaned slightly toward his camera. Okonkwo said nothing, which Vasquez had observed was her mode of close attention rather than withdrawal.

"This is the outcome map for the two domains most relevant to your work," Vasquez said. "Each node is a decision point where the AI's determination diverged from a reconstructed human baseline — cases where we have either a documented human assessment or a comparable population of human-decided cases for reference. Each connecting line traces the downstream record we've been able to verify. The line weight represents impact magnitude: how large the measurable downstream effect was relative to comparable cases in the same domain and time period."

She paused. The map sat on their screens and she let them look at it without rushing to the next sentence. Okonkwo's hand had moved off camera — writing, Vasquez thought. Chen's head had tilted slightly, taking in the distribution of thick lines versus thin, where the density clustered. "There are 1,847 tracked cases in this view," she said. "Across six states. Four years of decisions." She gave them another moment with that number before continuing.

She pulled the display to the corrections domain subset and ran the leverage-point filter — the category she'd defined as decisions where downstream impact exceeded two standard deviations above the domain mean. The map thinned considerably. What remained were the cases that had anchored her attention longest, the ones whose consequence chains extended furthest in measurable directions.

"Eight hundred forty-seven parole decisions in the database where the AI's determination overrode documented human inclination toward denial," she said. "Within that population, 12.3 percent show what I classify as outsized downstream positive outcomes — outcomes where the consequence chain extends beyond the individual subject's record in ways that are externally verifiable and that persist through at least two subsequent outcome points." She highlighted the Williams node. One line ran from it, heavier than most in the surrounding area, connecting to a dated node for the February 28 rescue event, then extending forward to a second node representing the minor's subsequent care placement. "The child who was pulled from the Rouge River is in long-term foster placement. Stable care, continued through May. The chain from the Williams decision has three confirmed points."

"Define outsized," Chen said. She had anticipated this question and was glad he asked it directly. "Treatment effects three or more standard deviations above the domain mean. Outcomes that affect identifiable parties beyond the subject of the original decision. Documented chains that persist through at least two observable outcome points, independently verifiable." She looked at the Williams node and its connections. "Thirteen days between the parole grant and the rescue. From a causal inference standpoint, that's a short outcome horizon. What makes the Williams case a leverage point is the second link — the downstream care record — rather than the rescue alone."

"What's the baseline? What does a normal outcome chain look like?" Okonkwo asked. "Most decisions produce lines of low weight or no verifiable external connection at all. The subject either complies or reoffends, the mortgage holder makes payments or defaults, and the consequence remains bounded by the original transaction." She gestured at the sparse areas of the map. "The leverage-point cases are what you're looking at when the lines run thick. They're the minority."

She moved the display to the lending domain. The map changed character — different clustering patterns, the impact-weight distribution spread differently, the time horizons longer. "Three hundred twelve cases in the lending subset where AI denials diverged from human baseline toward outcomes that include documented educational access changes. Not all of these are high-magnitude." She located the Hernandez node and brought it forward. "This one is unusually well documented. Most families in the denial-to-educational-redirect category are traceable only through enrollment records and district-level data — aggregate numbers rather than individual chains. The Hernandez application came with seven years of additional documentation from the applicant herself. Organized, complete, cross-referenceable. I can trace the school district trajectory clearly: the daughter's current enrollment, the magnet program she didn't enter, the comparative outcomes of that program's students over the relevant cohort years." She paused. "Most cases in this domain, I'm estimating from population data. This case I can trace with specificity because Elena Hernandez keeps copies of everything."

Neither of them said anything for a moment. The server racks ran their cycles, and then Chen said: "Is the AI doing this on purpose? Is it finding leverage points and optimizing for them?"

It was the fourteenth time she'd been asked this question, in this form or a close variant, and she had not yet found a way to answer it that fully conveyed both the scope of what she knew and the size of what she didn't. She took a moment — a genuine one, not staged — to assemble the answer that was accurate rather than satisfying. "The data shows that the AI finds leverage points," she said. "Whether it's optimizing for them as a design objective, or whether leverage points are a structural property of consequential decisions that any sufficiently calibrated system will identify — I cannot tell you. COMPASS-NG's decision architecture isn't available for analysis at the depth I would need. I'm working from outputs. I don't have access to inputs or training parameters."

"That's a careful answer," Chen said. "It's the accurate one," she told him, and moved back to the full view. "A system trained on outcome data could find leverage points if leverage points exist in the data to be found. A system trained only on decision-point characteristics, with no outcome feedback, would have no mechanism to do this. Whether COMPASS-NG incorporates outcome feedback as part of its training regime — whether that feedback was included in the original design or added in later iterations — is a question that requires access to the system's development documentation. I don't have that. No independent researcher does."

"What about the Hernandez family?" Okonkwo said. "The mortgage denial redirected Sofia's educational access. Is that a leverage point in your model?" "Statistically, yes." She had considered before the call how to answer this. "The predictive framework I've built for the lending domain assigns a high impact magnitude to the Hernandez decision based on the comparison population for that school district's science program. Students from that program's cohort show outcomes significantly above the comparison group at the secondary and post-secondary level. The distribution is clear." She looked at the node on her map, then at Okonkwo's face in the video window. "What I can give you is the distribution. I can tell you the magnitude and the probability ranges. What I won't give you is a specific prediction for any individual within that distribution. The model produces probabilities."

"Probabilities," she said, "not prophecies."

Okonkwo had stopped writing. Something had shifted in her expression — difficult to read precisely through video, but there. The daughter's situation, Vasquez thought, had a different weight for her than the aggregate data. That was consistent with what she'd tracked through the forum. It was Chen who broke the silence, and Vasquez recognized the register of the question before he finished it — the one that came when someone had absorbed enough of the data to move past the technical questions to the underlying one. "What do you do with this?"

"I study it," she said. "I document it. I publish the methodology and findings when the documentation is sufficient and the domain partners have reviewed the work. I don't intervene in the decisions."

"You see decisions that bend futures and you watch," Chen said. "I see probabilities that suggest decisions may bend futures." The distinction was one she'd argued about with herself for three years and still found necessary. "There's a gap between documenting a distribution and knowing what any individual case will produce. If I intervened every time a high-impact-magnitude case appeared in the database, I would be acting on a model that assigns confidence levels between 34 and 68 percent, depending on the case. Acting on that as certainty is a different kind of error — not a smaller one."

"But you reached out to us," Chen said. She had anticipated this and had not fully resolved it. "Yes," she said.

"That's not nothing. You selected two people inside the systems and made contact. That's an intervention." "I made information available to two investigators who had already found the gap independently." She kept her voice even, because the observation was correct and she'd made peace with the tension in it. "There's a distinction between reaching out to people who have already located the pattern and reaching out to people who haven't. I reached out after you had each done three and four months of independent work, respectively. I didn't start you. I'm aware that distinction doesn't resolve everything."

She looked at the outcome maps on her center monitor, which she'd been looking at for four years, and then at their faces in the two video panels. Chen was in the mode she'd come to recognize as his processing state — outwardly still, working something through. Okonkwo had resumed writing on the tablet.

"I can give you access to the relevant data for your domains," Vasquez said. "The corrections subset for Detroit Eastern District, the lending subset for Phoenix metro. The methodology documentation, the outcome maps for your specific caseload populations. Not the full database, but enough to see where your individual cases sit within the pattern." Neither of them answered immediately, which she'd also expected. "The question of what you do with that access," she said, "is not one I have an answer to. You're inside the systems I can only observe from outside. You have oversight relationships, institutional contacts, the ability to raise questions within structures I can't touch. What I can track, I track. What I cannot change, I note."

"Have you ever wanted to change it?" Chen asked. It was the longest pause of the call; she did not usually receive this question, and she took real time with it rather than giving the methodological answer. "Yes," she said. "Three years is enough time to have wanted that." She picked up her tea. "But wanting it doesn't tell you how to act on it without producing a different category of error. The error of someone certain they know better acting on that certainty. I am not certain I know better. I know what the distributions show."

The call ended without an agreed plan, though the data access offer was on the table and she expected them to accept it within the week. It ended with Okonkwo asking the question Vasquez had expected from her all along — the practical one, the one that went directly to the timeline — how long before they'd see a meaningful outcome horizon for the Hernandez case, for Sofia's educational trajectory.

"Three to five years," Vasquez said. "When the comparison cohort's outcomes are documented at the secondary education level and the program's track record for that year group is established." She looked at the highlighted node on the map. The decision: February 2028. The lines running forward from it, weighted and colored, extending into the parts of the timeline that hadn't happened yet. "She would be fifteen or sixteen. Old enough that the choices that compound the original redirect will be visible in the data."

She closed the screen share. The maps remained on her monitors, unchanged — the same patterns she'd been tracking for months, the same distributions, the same extended horizons that the model could chart but not close. Williams sat in the system with three confirmed outcome points. Hernandez sat with two documented nodes and a prediction horizon that ran five years forward, past the edge of what was currently observable.

She drank what remained of the tea. It had gone cold while they talked, and she drank it cold.
