Outcome Tracking - fifteen-percent-decision

The notebook has twelve entries. Three weeks since the Williams decision, and it lives in his jacket pocket during interviews and on the corner of his desk between them — a habit that formed so gradually he can't point to the day it became necessary.

The entries aren't what he files in the department system, which is specific, formatted, and legally indexed. These are personal. He's developed a format: date in the left margin, name in caps beside it, then two values on the same line — the AI's confidence percentage and his own, which he assigns after the fact, reconstructing his read of each case the way you reconstruct a dream before it fades. Below that, a gap number: the arithmetic difference between the system's certainty and his. Below that, a column he added in week two. Outcome, with a date when he knows one and a U when he doesn't.

Most of the outcomes are U. The ones that aren't show the system holding. Parolees checking in on schedule, employment confirming, no violations flagged except Heller's curfew miss in week four, which resolved as a double shift that ran long and ended with a written warning. He checks the department database on Thursday evenings when the week's updates populate. Most Thursdays he finds nothing new to report. He writes C in the outcome column and the date beside it and closes the notebook. He doesn't know what he's looking for. He knows he's looking.

Janet Kim's desk is close enough that she can hear him thinking, which she told him exactly once in six years and never said again. Her Post-it system covers the upper frame of her monitor and the wall of her cubicle in color-coded flags — a filing method he's observed without asking about, because the colors clearly mean something and asking would invite explanation. She brings in baked goods on Mondays and is the reason the office has a consistent opinion about banana bread.

On a Thursday in mid-March, she stands to refill her water bottle and looks at him a second too long before she walks to the break room. When she comes back, she stops at the cubicle wall.

"What are you looking for?" she says. Not what are you working on. The other version.

He's in the middle of an entry — one of the week's cases, not a disagreement, just a routine approval he's adding to the log because he's started tracking compliant cases for comparison — and he finishes the line before he answers. "Patterns."

"What kind?"

He caps the pen. "I don't know yet."

She nods. This is her response to answers she finds insufficient but has decided not to contest. She peels a Post-it from the pad she carries and sticks it to the edge of his monitor — blank, just the yellow square — and walks back to her desk. He doesn't ask what it means. The question stays with him for the rest of the shift. She asked the right version of it — not what are you doing but what are you looking for, which implies she's noticed the gap between the work and the purpose. He has caseloads, compliance reports, a supervision calendar with fifty-three names on it. None of that explains the notebook. The notebook is something else, and Janet can read something else in his posture from across a cubicle wall. He peels the Post-it from his monitor at shift change and puts it in his jacket pocket with the notebook.

The apartment is on the second floor of a building on Michigan Avenue that was renovated six years ago and still carries that renovation smell when the heat cycles on — something chemical underneath the wood. He keeps the kitchen table positioned so he can work with both the notebook and the laptop open, the overhead light on because the lamp in the living room isn't bright enough for this kind of reading. Michigan Avenue runs below the window, wet from this afternoon's sleet, and the sound of it comes up in irregular intervals: a bus, a horn, the grind of something heavy going by in the left lane.

He has access to the department's outcome tracking system through his work credentials — check-in records, employment verification, violation reports, new criminal charges, updated weekly. This is standard access for a senior parole officer with a full caseload, and accessing records for cases he reviewed before handoff is within its scope, technically. He's checked it four Thursdays now.

He builds the comparison from the seven entries where he would have denied and COMPASS-NG approved. Valdez: 52 days out, employed, compliant. Davis: compliant, one note about a flagged address that he has a follow-up scheduled on. Heller: the curfew miss, resolved. The others are clean, all of them, check marks down the column. Six of seven proceeding in good standing. One technical violation, resolved without escalation.

His own historical numbers run around 78% compliance in the first sixty days for cases he approved. The AI's current group is tracking at 86% for the same window — higher than his rate, higher than the district average, which is published quarterly and which he knows the way he knows the exchange rate on a currency he doesn't use. The number sits in his head without a frame to put it in. He writes in the margin beside the outcome column: AI holding. 6/7 compliant day 52. Then: Historical: ~78%. Then, after a pause: Gap: 8%. The 8% might be the better algorithm. It might be the particular population this algorithm approved, which is a different question — whether the system selects more carefully, or whether he's comparing his general approval pool to a specific subsample of AI overrides, which would make the comparison meaningless. He writes neither conclusion. He puts a line under the note and leaves it. The coffee he made at seven is cold, and he drinks it anyway.

The first entry is still underlined. Williams, D. Feb 14. AI: approve 73%. Me: deny. Gap: 18. Outcome: U. He's thought about filling in that outcome column. The database would show it — whether Williams checked in at the two-week mark, whether he's employed, whether anything has been flagged in thirteen days. Technically accessible, technically within his access. He tells himself thirteen days is too short for a meaningful data point. A parolee compliant at day thirteen is meeting the lowest possible threshold; it tells him nothing about the prediction either direction. He doesn't open the database for it. He leaves the U where it is.

Earlier in the day, at the end of shift, a new review came through the queue. The file was straightforward — two years out, employed, stable housing, no flags in the interim check. COMPASS-NG loaded at 81% confidence: APPROVED. The green light.

He agreed with it. Quickly. He noticed only after the fact that he'd confirmed the decision before he'd finished reviewing the file — his eyes still in the employment section when his hand was already moving to the confirmation button. The green loaded and his first response was concurrence. He caught this mid-motion, went back, finished the review properly, confirmed. Everything in the file supported the approval. His final assessment matched the system's.

But the sequence was different from what it used to be. A year ago he would have formed his own read first, and then seen the system's. Now the confidence score is on the screen before the file opens. 81%. 73%. 68%. He has been running his own read against those numbers for three weeks, and somewhere in that running he may have started to anchor to them — letting the machine's certainty set the range within which his own certainty falls, before he's made an assessment. His agreement rate is 84% for the period he's been tracking. His pre-tracking average, working from memory, was 71%. He writes both numbers down. Then he writes: Calibration? Or better AI cases? Then he draws a line between the two questions, because that's where they sit — not answered, not answerable with what he has, just positioned across from each other on the page. He doesn't know which way the causality runs. He's started to think this might be the question.

The format settled by week two: date in the left margin, name in caps, AI confidence and personal confidence on the same line — the latter in parentheses, to mark it as softer, which it is — gap below that, then outcome. Some entries carry a fourth element: a single-line note flagged with a small square bullet he draws by hand because the notebook is unlined and there's no formatting here but what he imposes.

Robinson, T. Feb 17. AI: approve 79% / ME: approve (81%). Gap: 2. Outcome (Mar 8): C. Davis, M. Feb 19. AI: approve 68% / ME: deny (50%). Gap: 18. Outcome (Mar 8): C + note. Williams, D. Feb 14. AI: approve 73% / ME: deny (55%). Gap: 18. Outcome: U.

Twelve entries in three weeks. Seven where his read and the system's diverged by more than ten points. Five of those seven have known outcomes, all compliant. Two still unknown — Williams and a case from last week. He fills in what he can. He leaves the blanks. The notebook is ninety-six pages. He is on page five. The outcome column is mostly empty, and filling it is going to take time he doesn't have a framework for. He doesn't know what a complete version of this project looks like. He knows that the U entries bother him more than the C entries satisfy him, which might be the most honest thing he's written down in weeks.

The next afternoon, Jerome Watts comes in for his scheduled check-in. Eight months into a two-year supervision period, maintenance worker with a property management company on Woodward, program meetings tracked, drug screens clean. He arrives at 2:04 for a 2:00 appointment, wearing a work shirt with a company logo and what looks like joint compound dried on the sleeve above the cuff. The interview runs twenty minutes. Employment confirmed, housing confirmed, conditions met. Watts mentions a possible supervisor position at the property company in a way that doesn't ask for approval but doesn't pretend the news is neutral either — he states it like a fact he's decided to include, and then he waits. Marcus says it sounds like good news. He means it.

Afterward he pulls the original COMPASS-NG assessment from eight months ago: 74% confidence, APPROVED. He'd agreed with that one. Everything today is consistent with the prediction — employment, compliance, no violations. The system read this file in August and it's still reading the same way in March. He opens the notebook, thinks about adding Watts to the tracking log. But Watts isn't a case where he and the system diverged. There's nothing to track against his own call, and he closes the notebook without writing anything.

He sits for a moment with the closed notebook in front of him, the office running its usual sounds around him — keyboards, the HVAC's low cycling, someone on a call in the interview room at the end of the hall. The cases where the system is right and he agrees don't trouble him. The cases where the system is wrong and he's right don't trouble him in the same way either. What he's finding, in three weeks of tracking, is that the cases where the system is right and he would have been wrong are the ones he keeps coming back to — and that being wrong doesn't feel like information he can act on. He was wrong. The machine was right. That's the whole entry.

He pockets the notebook. He opens the next case file. COMPASS-NG processes for four seconds — he's counted, out of habit now — and returns a green light.

He looks at the light. Then he reads the file.