How it's built
The provider abstraction, the context reduction, the agent loop, and the cost guards that make each sweep cheap and traceable. Enough to judge the engineering, not a marketing tour.
Architecture
- Link flow (swap point): Fixtures today, live Flexpa Link when keys land
- Provider interface: One contract, either source behind it
- Flattener: FHIR bundle to compact TSV rows
- Agent: LLMBackend, CLI local or BYOK deployed
- Eval harness: Graded against authored ground truth
- Replay layer: Captured runs, re-served with no live calls
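The provider swap point in the list above can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: `Eob`, `ClaimsProvider`, `FixtureProvider`, `LiveProvider`, and `pickProvider` are hypothetical names, and the call is kept synchronous for brevity where a live source would presumably return a Promise.

```typescript
// Hypothetical shapes for illustration; none of these names come from the Clawback codebase.
interface Eob {
  id: string;
  payer: string;
  billedCents: number;
  paidCents: number;
}

// One contract: callers ask for EOBs and never learn where they came from.
interface ClaimsProvider {
  readonly source: "fixture" | "live";
  fetchEobs(memberId: string): Eob[];
}

// Fixture-backed source: serves canned EOBs from memory.
class FixtureProvider implements ClaimsProvider {
  readonly source = "fixture" as const;
  private readonly byMember: Record<string, Eob[]>;
  constructor(byMember: Record<string, Eob[]>) {
    this.byMember = byMember;
  }
  fetchEobs(memberId: string): Eob[] {
    return this.byMember[memberId] ?? [];
  }
}

// Placeholder for the live source. It throws rather than guessing at a real API.
class LiveProvider implements ClaimsProvider {
  readonly source = "live" as const;
  fetchEobs(_memberId: string): Eob[] {
    throw new Error("live provider not configured");
  }
}

// The swap point: a single function decides which source sits behind the contract.
function pickProvider(hasKeys: boolean, fixtures: Record<string, Eob[]>): ClaimsProvider {
  return hasKeys ? new LiveProvider() : new FixtureProvider(fixtures);
}

const provider = pickProvider(false, {
  "member-1": [{ id: "eob-1", payer: "ExamplePayer", billedCents: 12000, paidCents: 9000 }],
});
const eobs = provider.fetchEobs("member-1");
console.log(provider.source, eobs.length);
```

Because everything downstream depends only on `ClaimsProvider`, moving from fixtures to a live source would touch `pickProvider` and nothing else.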
Flattening the context
The reduction is ViewDefinition-inspired, with credit to Larry Ditton's Flexpa post on SQL on FHIR for LLM context reduction. Rather than run a full ViewDefinition runtime, Clawback flattens each EOB (explanation of benefits) into one tabular row and serializes the rows as compact TSV for the model.
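A minimal sketch of that flattening step, under stated assumptions: `MiniEob` is a hypothetical, heavily trimmed subset of a FHIR ExplanationOfBenefit, the column set is invented for illustration, and the adjudication category codes (`submitted`, `eligible`, `benefit`) are taken from the base FHIR adjudication code system rather than from whatever the demo actually reads.

```typescript
// Trimmed, hypothetical subset of a FHIR ExplanationOfBenefit; real resources carry far more.
interface CodeableConcept { coding: { code: string }[] }
interface Adjudication { category: CodeableConcept; amount?: { value: number } }
interface EobItem {
  servicedDate?: string;
  productOrService: CodeableConcept;
  adjudication?: Adjudication[];
}
interface MiniEob { id: string; item: EobItem[] }

// Column order is an assumption made for this sketch.
const COLUMNS = ["id", "service_date", "procedure", "submitted", "eligible", "benefit"];

// TSV has no quoting rules, so tabs and newlines inside a value become a single space.
function cell(value: string | number | undefined): string {
  return value === undefined ? "" : String(value).replace(/[\t\r\n]+/g, " ");
}

// Finds the amount for one adjudication category, or undefined when it is absent.
function amountFor(item: EobItem, category: string): number | undefined {
  const hits = (item.adjudication ?? []).filter((a) =>
    a.category.coding.some((c) => c.code === category)
  );
  return hits[0]?.amount?.value;
}

// One EOB becomes one row; only the first line item is read (single-line claims).
function flattenEob(eob: MiniEob): string {
  const item = eob.item[0];
  if (item === undefined) return [eob.id, "", "", "", "", ""].join("\t");
  return [
    eob.id,
    item.servicedDate,
    item.productOrService.coding[0]?.code,
    amountFor(item, "submitted"),
    amountFor(item, "eligible"),
    amountFor(item, "benefit"),
  ].map(cell).join("\t");
}

// Header row once, then one line per EOB.
function toTsv(eobs: MiniEob[]): string {
  return [COLUMNS.join("\t")].concat(eobs.map(flattenEob)).join("\n");
}

const sample: MiniEob = {
  id: "eob-1",
  item: [{
    servicedDate: "2024-03-02",
    productOrService: { coding: [{ code: "99213" }] },
    adjudication: [
      { category: { coding: [{ code: "submitted" }] }, amount: { value: 180 } },
      { category: { coding: [{ code: "eligible" }] }, amount: { value: 120 } },
      { category: { coding: [{ code: "benefit" }] }, amount: { value: 96 } },
    ],
  }],
};
console.log(toTsv([sample]));
```

Missing fields serialize as empty cells rather than being dropped, so every row keeps the same column count and the model can rely on position.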
On the demo sweep, flattening cut the context from 24,610 raw FHIR tokens to 1,936 flattened tokens, an estimated 92.1% reduction, with the same chars-per-token estimator applied to both counts. Larry's post reports a similar figure on its own corpus, but the two are not directly comparable: that measurement ran over 14 real EOBs and this one runs over 48 synthetic single-line claims, so treat the closeness as directional, not a benchmark. The token meter shows the paired counts and the dollar cost, and the evals show the scores hold on the smaller context.
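The arithmetic behind that figure can be checked with a short sketch. The four-characters-per-token ratio is an assumption made here for illustration (the text does not state the estimator's constant), and `estimateTokens` and `reductionPercent` are hypothetical names.

```typescript
// Assumed ratio; the demo's actual estimator constant is not given in the text.
const CHARS_PER_TOKEN = 4;

// Crude length-based estimate, rounded up so a non-empty string never costs zero tokens.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Share of the raw context removed by flattening, as a percentage.
function reductionPercent(rawTokens: number, flatTokens: number): number {
  if (rawTokens <= 0) throw new RangeError("rawTokens must be positive");
  return (1 - flatTokens / rawTokens) * 100;
}

// The counts reported for the demo sweep.
const pct = reductionPercent(24610, 1936);
console.log(pct.toFixed(1) + "%"); // prints "92.1%"
```

With a pure length-based estimator on both sides, the constant largely cancels out of the ratio, so the percentage is mostly a statement about character counts.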
Build versus buy
Four decisions every LLM product has to make, and what this demo chose for each.
| Concern | What this demo chose |
|---|---|
| Retrieval | Flatten the bundle to TSV rows. The corpus is small and bounded, so no vector store earns its keep. |
| Guardrails | Forced JSON schemas via zod, spend caps, rate limits, and provenance required on every finding. |
| Evals | A committed harness with an LLM judge under a separate strict prompt. Scores and failures are published. |
| Versioning | Every run persisted, replay artifacts and eval results committed by git SHA. |
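Two of the guardrails in the table, required provenance and a spend cap, could look roughly like this. It is a dependency-free sketch under stated assumptions: the demo uses zod for schemas, whereas `parseFinding` below is a hand-rolled stand-in, and the `Finding` fields, the `SpendCap` class, and the micro-dollar unit are all hypothetical.

```typescript
// Hypothetical finding shape; field names are illustrative, not the demo's schema.
interface Finding {
  kind: string;
  amountCents: number;
  provenance: { eobId: string; field: string };
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Hand-rolled stand-in for a schema check: rejects any finding that lacks provenance.
function parseFinding(raw: string): Finding {
  const data: unknown = JSON.parse(raw);
  if (!isRecord(data)) throw new Error("finding must be a JSON object");
  const kind = data["kind"];
  const amountCents = data["amountCents"];
  const provenance = data["provenance"];
  if (typeof kind !== "string" || kind.length === 0) throw new Error("kind is required");
  if (typeof amountCents !== "number" || Math.floor(amountCents) !== amountCents) {
    throw new Error("amountCents must be an integer");
  }
  if (!isRecord(provenance)) throw new Error("provenance is required");
  const eobId = provenance["eobId"];
  const field = provenance["field"];
  if (typeof eobId !== "string" || typeof field !== "string") {
    throw new Error("provenance needs eobId and field");
  }
  return { kind, amountCents, provenance: { eobId, field } };
}

// Spend cap in integer micro-dollars, which avoids floating-point drift when summing.
class SpendCap {
  private spent = 0;
  private readonly cap: number;
  constructor(capMicroUsd: number) {
    this.cap = capMicroUsd;
  }
  // Records the cost, or throws before the call is made if it would exceed the cap.
  charge(costMicroUsd: number): void {
    if (this.spent + costMicroUsd > this.cap) throw new Error("spend cap exceeded");
    this.spent += costMicroUsd;
  }
  remaining(): number {
    return this.cap - this.spent;
  }
}

const finding = parseFinding(
  '{"kind":"duplicate_charge","amountCents":4500,"provenance":{"eobId":"eob-1","field":"item[0]"}}'
);
const cap = new SpendCap(100000); // $0.10 per sweep, an arbitrary example value
cap.charge(40000);
console.log(finding.provenance.eobId, cap.remaining());
```

Checking the cap before the model call, not after, is what keeps a runaway loop from spending past the budget.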
Who pays, and what degrades
The plausible buyer is more likely an employer, a TPA, or a benefits platform than a consumer paying a subscription. A consumer version most likely works as free monitoring with a success fee on dollars actually recovered, so the member never pays to be told nothing is wrong.
The patient-access economics are real. Identity proofing to IAL2 adds a per-member cost and real funnel friction, and Flexpa's published tiers start at a $20K annual platform license. Those two facts mean the unit economics only close with either scale or a B2B2C channel that spreads the license and the proofing cost across many members.
The findings also degrade on real payer data, not just on this clean synthetic set. The copay-tier and deductible-accumulator findings need plan benefit design data that patient-access Coverage resources frequently do not carry, and the denial findings need CARC codes that payers populate inconsistently. The adjudication-quality script in docs/adjudication-quality.md exists to measure exactly this per payer. The live section stays pending until API keys land.
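One way to picture a per-payer measurement of this kind is a fill rate: of the denied lines a payer returns, what share carry a CARC code at all. The sketch below is not the script in docs/adjudication-quality.md; `DenialRow` and `carcFillRateByPayer` are hypothetical names, and the payer labels are placeholders.

```typescript
// Hypothetical row shape for illustration only.
interface DenialRow {
  payer: string;
  denied: boolean;
  carcCode?: string;
}

// For each payer, the fraction of denied lines that carry a non-empty CARC code.
function carcFillRateByPayer(rows: DenialRow[]): Record<string, number> {
  const counts: Record<string, { denied: number; coded: number }> = {};
  for (const row of rows) {
    if (!row.denied) continue; // paid lines say nothing about denial coding
    const c = counts[row.payer] ?? { denied: 0, coded: 0 };
    c.denied += 1;
    if (row.carcCode !== undefined && row.carcCode.trim() !== "") c.coded += 1;
    counts[row.payer] = c;
  }
  const rates: Record<string, number> = {};
  for (const payer of Object.keys(counts)) {
    const c = counts[payer];
    if (c !== undefined) rates[payer] = c.coded / c.denied;
  }
  return rates;
}

const rates = carcFillRateByPayer([
  { payer: "PayerA", denied: true, carcCode: "45" },
  { payer: "PayerA", denied: true, carcCode: "" },
  { payer: "PayerA", denied: false },
  { payer: "PayerB", denied: true, carcCode: "97" },
]);
console.log(rates);
```

A payer with a low fill rate is one where denial findings should be treated as lower confidence, or suppressed, rather than reported at face value.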
Stack
| Component | Role |
|---|---|
| Next.js 15 | App Router, server components, one deployable |
| TypeScript | Strict types across agent, flattener, and UI |
| claude-sonnet-5 | Subject and judge, reached through an LLMBackend abstraction |
| zod | Schema validation on every model response |
| vitest | 180 passing tests across the pure logic |
Built by
Life-OS is the agent system I run my own work through, a set of Claude Code agents that handle my pipeline, my research, and my writing every day. Before that I helped build and sell two healthcare companies, most of that work on the reimbursement side, so claims data and denial logic are familiar ground.
I applied to Flexpa on June 10 and built this demo on their rails. The demo is the pitch. Everything here runs on synthetic data, the evals are published, and the token meter shows the real cost.
More at chriscardinal.me.
Clawbacks have always run one direction. This one runs yours.