grounded-llm-triage-layer - AIXplore

# Build an LLM Triage Layer That Can't Freelance

> [!tip] TLDR
> **The why.** I built [ModelMap](https://modelmap.vercel.app), a triage tool for genomic AI models, and hit the question every LLM-assisted tool runs into: where do you let the model decide, and where do you stop it? Genomic model selection has a high cost of being confidently wrong (someone reaches for a DNA language model when their task needs a population statistical-genetics method), so "the model probably knows" was never going to be the answer.
>
> **The shape.** A TypeScript engine and a Python reference both run the same deterministic triage rules. The LLM does exactly two jobs: parse free-text intent into a controlled vocabulary, and explain a verdict the rules already produced. A versioned fact pack (`atlas.json`) is the only thing it reads. The hosted app uses bring-your-own-key OpenRouter over PKCE, so the user's key never touches the server, and the deterministic path needs no key at all.
>
> **The hard part.** Not the rules. The discipline of enforcing the trust model in *code* at ingestion rather than asking for it in a prompt. When the model drafts a new entry from fetched sources, ingestion force-sets every judgment field to low confidence, rebuilds the source list from URLs actually fetched, and discards anything the model invented. The model extracts; the code distrusts. Getting that boundary clean was most of the work.
>
> **Reproduction prompt for Claude Code:**
>
> > Build a triage tool for a domain where picking the wrong class of tool is a costly category error. Define a controlled vocabulary of classes in a YAML file (the Rosetta Stone), and intent mappings (I-Have / I-Want to class) in a second YAML. Write a deterministic rule engine that takes parsed intent and returns a class assignment plus a wrong-tool verdict, and port the identical logic to both your server language and the client so the wizard runs with no API call.
> > Wire an LLM into exactly two seams: parse free-text into the controlled vocabulary, and explain a verdict in prose. Ground every LLM call in a versioned fact pack built from your source data. For any ingestion that drafts new entries, enforce the trust model in code: force every judgment field to low confidence on draft, rebuild the source list from the URLs you actually fetched with today's access date, and drop any source the model invented. Use bring-your-own-key OAuth (OpenRouter PKCE) so the hosted app never holds a shared key.

Most LLM tools fail in the same place. The model is fluent, the demo is clean, and then it confidently answers a question it had no business answering. The fix that gets reached for is a better prompt. The fix that holds is an architecture where the model never had the authority to be wrong in the first place. I want to show that architecture concretely, on a tool where being confidently wrong has a real cost.

## The problem, in one paragraph

"Genomic AI model" is a phrase that hides at least nine distinct method classes, and they answer different questions. A DNA language model reads raw sequence and emits embeddings. A statistical-genetics engine like BOLT-LMM tests genotype-phenotype associations across a cohort and never reads sequence grammar at all. These are not competing tools on one leaderboard; they are different computational objects. The common failure is a **category error**: a researcher with a population-association question reaches for the model that was in the news, gets a fluent-looking answer, and ships a result built on the wrong primitive. Leaderboards structurally can't catch this, because a leaderboard ranks tools *within* a class. The fix is to classify the computational object before you rank it.

(The reader-facing version of that argument lives on Run Data Run: [The Wrong Tool Problem in Genomic AI](https://rundatarun.io/p/the-wrong-tool-problem-in-genomic). Here I'm building the thing.)

> **The temptation is to let the model decide. The discipline is to let it translate and explain while code decides and distrusts.**

## Hybrid, not LLM-all-the-way

The first decision is the one that determines everything downstream: which component owns each consequential call.

In ModelMap, a deterministic rule engine owns every class assignment and every wrong-tool verdict. It reads parsed intent and the class ontology and returns a verdict, with no model in the loop. That engine is a TypeScript port (`engine.ts`) that runs client-side in the wizard, and a Python reference (`triage.py`) that runs the same logic server-side and in tests. Same rules, two runtimes, fully auditable. An expert can read why DNABERT-2 got flagged for a GWAS task without trusting a black box.

The LLM does exactly two jobs, and neither is a decision:

- **Parse free-text intent** into a controlled vocabulary. "I have whole-genome sequence and I want to find variants associated with disease across my cohort" becomes a structured I-Have / I-Want pair drawn from a fixed enum.
- **Explain the verdict** in prose, after the rules have produced it.

![[grounded-llm-triage-layer-two-lane.png]]

The controlled vocabulary is the spine. `classes.yaml` holds nine method classes (the Rosetta Stone), each with a plain-language meaning, what it is and is not for, and a `what_it_actually_does` line:

```yaml
statistical_genetics_engine:
  label: Statistical genetics engine
  plain: >
    Tests genotype/phenotype associations across populations using
    genotype matrices, phenotypes, and covariates.
  best_for: [GWAS, heritability, cohort genotype/phenotype association]
  not_for: [reading single-sequence grammar, promoter/enhancer activity from one sequence]
  what_it_actually_does: >
    operates on cohorts of genotypes and phenotypes to produce p-values
    and effect sizes; it never reads raw sequence grammar.
```

`use_cases.yaml` maps I-Have / I-Want pairs onto those classes, and that mapping is what drives triage.

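
A mapping like that can drive a verdict with a plain lookup. The following is a minimal Python sketch of the two-lane split, not ModelMap's actual `triage.py`: the vocabulary values, the `dna_language_model` key, the `USE_CASES` table, and the function names are all hypothetical stand-ins. It shows the LLM's parse being accepted only when it lands inside the fixed vocabulary, and the verdict coming from a deterministic lookup.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary; the real enums live in ModelMap's YAML files.
I_HAVE = {"cohort_genotypes", "single_sequence"}
I_WANT = {"variant_association", "sequence_embedding"}

# Hypothetical stand-in for use_cases.yaml: (I-Have, I-Want) -> method class.
USE_CASES = {
    ("cohort_genotypes", "variant_association"): "statistical_genetics_engine",
    ("single_sequence", "sequence_embedding"): "dna_language_model",
}


@dataclass(frozen=True)
class Verdict:
    required_class: str
    wrong_tool: bool
    reason: str


def validate_intent(parsed: dict) -> tuple:
    """Accept the LLM's parse only if both values are in the fixed vocabulary."""
    have, want = parsed.get("i_have"), parsed.get("i_want")
    if not isinstance(have, str) or not isinstance(want, str):
        raise ValueError(f"malformed parse: {parsed!r}")
    if have not in I_HAVE or want not in I_WANT:
        raise ValueError(f"outside controlled vocabulary: {parsed!r}")
    return have, want


def triage(have: str, want: str, candidate_class: str) -> Verdict:
    """Deterministic lookup: no model call anywhere on this path."""
    required = USE_CASES.get((have, want))
    if required is None:
        raise KeyError(f"no use case maps {have!r} + {want!r}")
    wrong = candidate_class != required
    reason = (
        f"task needs {required}, candidate is {candidate_class}"
        if wrong
        else f"candidate matches {required}"
    )
    return Verdict(required, wrong, reason)


# Usage: a parse the LLM might return, checked and then triaged by rules alone.
have, want = validate_intent(
    {"i_have": "cohort_genotypes", "i_want": "variant_association"}
)
verdict = triage(have, want, candidate_class="dna_language_model")
```

In this sketch a parse that names an unknown value raises instead of being passed through, so the model cannot extend the vocabulary by accident, and the verdict's `reason` string is the terse input an explain step could later turn into prose.
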
The LLM picks values *from* this vocabulary. It never coins a new class, never assigns a license, never decides a verdict. It is a translator at the boundary, and the rules do the work inside.

## The trust model, enforced in code

This is where most "grounded" tools quietly cheat. They put the trust rules in the prompt ("only use confirmed facts, never guess a license") and call it grounded. A prompt is a request, not a guarantee. Under distribution shift or a clever input, the model will guess anyway, and the only thing standing between that guess and your data is more prose.

ModelMap enforces the trust model in code, at the one place new claims enter the system: ingestion.

The atlas distinguishes **facts** from **judgment fields**. Facts (a paper's title, a repo URL) auto-publish. Judgment fields (method class, license, commercial and clinical use status) each carry a `field_confidence`. Anything at `field_confidence: low` renders in the UI as **unreviewed** and is never presented as settled. A field flips to high only when a human checks a primary source in that session.

When the ingestion loop drafts a new card, the model *extracts* facts from fetched source text. Then the code distrusts the result:

```python
# excerpt: card, src_urls, TODAY, and _source_type are defined elsewhere in the ingestion code

# judgment fields: never auto-promoted; a human verifies against a primary source
JUDGMENT_FIELDS = ["model_class", "license", "commercial_use_status", "clinical_use_status"]

# sources: rebuilt from what we actually fetched, never from the model
card["sources"] = [
    {"url": u, "source_type": _source_type(u), "date_accessed": TODAY}
    for u in src_urls
]

# trust model, enforced in code
card["review_status"] = "draft"
fc = card.get("field_confidence") or {}
for f in JUDGMENT_FIELDS:
    fc[f] = "low"
card["field_confidence"] = fc
```

Three moves, all mechanical, none of them asked for in a prompt. `review_status` is force-set to `draft`. Every judgment field is force-set to `low`.
The source list is **rebuilt from the URLs the fetcher actually retrieved**, stamped with today's `date_accessed`, so any source the model hallucinated is discarded by construction rather than caught by review.

> [!info] The model extracts, the code distrusts
> Keep the split clean and the rest follows. The LLM is good at pulling structured facts out of messy source text. It is not trustworthy as the authority on whether a fact is settled. So extraction is the model's job and trust assignment is the code's job, and the two never blur.

The proof it works is unglamorous, which is the right kind of proof. When the loop drafted a card for GPN-MSA, it correctly refused to guess the license and clinical status (both landed `low` and rendered unreviewed), and it honestly reported when one source hit a bot wall and another returned a 404. No confident fabrication, because the path that would have produced one doesn't exist.

## Grounding discipline

Grounding only means something if it's the same grounding everywhere. The triage explain layer and the ingestion draft call both read the same versioned fact pack (`exports/atlas.json`) and reuse the same controlled vocabulary. The ingestion draft even calls the same internal `_chat` path the triage explainer uses, so there's one grounded code path, not two that drift.

Search and fetch route through a single gateway rather than ad-hoc calls scattered through the code. That gives you one place to log every retrieval, one place to enforce timeouts and fallbacks, and one place where the URLs that become `sources` are known to be real (the fetcher returns them; the model doesn't get to name them). Centralizing retrieval is what makes "rebuild sources from what we fetched" a one-liner instead of a reconciliation problem.

Versioning closes the loop. `atlas.json` is a build artifact regenerated from the YAML source of truth, so the fact pack the model reads is the same one shipped to the UI, and a data change is a diff you can review.

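
To make the single-gateway idea concrete, here is a minimal Python sketch under stated assumptions: `FetchGateway`, its `fetch` callable, the `source_type` helper, and the example URLs are all hypothetical, and ModelMap's real gateway may look quite different. It shows why provenance becomes a one-liner once every retrieval passes through one logged choke point.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Optional
from urllib.parse import urlparse


def source_type(url: str) -> str:
    """Hypothetical classifier; the article's _source_type may differ."""
    host = urlparse(url).netloc.lower()
    if host.endswith("github.com"):
        return "repo"
    if host.endswith("arxiv.org") or host.endswith("biorxiv.org"):
        return "preprint"
    return "web"


@dataclass
class FetchGateway:
    """Single choke point for retrieval; `fetch` is injected so no network is needed here."""

    fetch: Callable[[str], Optional[str]]  # returns page text, or None on failure
    log: list = field(default_factory=list)

    def get(self, url: str) -> Optional[str]:
        try:
            text = self.fetch(url)
        except Exception as exc:  # bot wall, 404, timeout: recorded, never hidden
            self.log.append({"url": url, "ok": False, "error": str(exc)})
            return None
        ok = text is not None
        self.log.append({"url": url, "ok": ok, "error": None if ok else "empty"})
        return text

    def sources(self, today: Optional[str] = None) -> list:
        """Provenance comes from the log of successful fetches, never from model output."""
        stamp = today or date.today().isoformat()
        return [
            {"url": e["url"], "source_type": source_type(e["url"]), "date_accessed": stamp}
            for e in self.log
            if e["ok"]
        ]


# Usage with a stub fetcher standing in for real HTTP calls.
def fake_fetch(url: str) -> str:
    if "missing" in url:
        raise RuntimeError("404")
    return "page text"


gw = FetchGateway(fetch=fake_fetch)
gw.get("https://github.com/example/repo")
gw.get("https://example.org/missing")

# A drafted card that arrives with a source nobody fetched.
card = {"sources": [{"url": "https://invented.example/paper"}]}
card["sources"] = gw.sources(today="2025-01-01")  # the invented entry is simply overwritten
```

The failed fetch stays in `gw.log`, which is what lets a loop report a bot wall or a 404 honestly, while only the successful URL survives into `card["sources"]`.
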
## Bring-your-own-key, so there's nothing to abuse

The hosted app needs the LLM only for the two boundary jobs. So I made the LLM optional and zero-trust.

The "Ask in plain English" mode uses **bring-your-own-key OpenRouter over PKCE OAuth, fully client-side**. The user connects their own OpenRouter account, picks their own model, and the key lives only in their browser. ModelMap never sees it. There is no shared key, which means no rate limit to police, no spend to cap, no abuse surface to defend. The OAuth callback uses `window.location.origin`, so it's host-agnostic with no hardcoded domain.

And the deterministic wizard needs **no key at all**. "Key out my task" walks I-Have to I-Want to recommendation entirely in the client engine. The LLM is sugar on top of a tool that stands alone.

> [!tip] The architecture lesson
> If the deterministic core is real, the LLM is a convenience layer you can make zero-trust. You only get to do that if the core actually works without the model. The discipline of building the rules first is what earns you the option of treating the LLM as optional.

That's the dependency you want to invert. Most LLM tools make the model the thing they can't run without, then spend their security budget defending a shared key. Make the rules the thing you can't run without, and the model becomes a feature you can hand the user the keys to.

## The reusable pattern

None of this is specific to genomics. The pattern is:

1. **Controlled vocabulary.** A fixed enum of the consequential categories your tool reasons about, with plain-language meaning attached. The LLM picks from it; it never extends it.
2. **Deterministic rules for every consequential decision.** Auditable, ported to wherever they need to run, no model in the decision path.
3. **An LLM confined to translation and explanation.** Free-text in to controlled-vocab out, and verdict to prose. Two seams, both at the boundary.
4. **Trust enforced in code at ingestion.** Force judgment fields to low confidence on draft, rebuild provenance from what was actually retrieved, discard invented sources. Not a prompt instruction, a code path.
5. **Grounding in a versioned fact pack.** One pack, read identically by every LLM call, regenerated from a source of truth you can diff.

Run those five and you get an LLM-assisted tool an expert can audit, one that fails loud (a low-confidence field renders as unreviewed) instead of being confidently wrong. The genomics specifics are interchangeable. Swap in legal document classification, infrastructure tooling, financial instrument routing, anything where the wrong class of answer is a category error rather than a near miss.

## The judgment for builders

The pull is always toward letting the model decide, because the model is fluent and deciding looks like the hard part it should handle. It isn't. Deciding is the part you want auditable, deterministic, and yours. The model is brilliant at the seams: turning a researcher's messy sentence into a clean structured query, and turning a terse verdict into an explanation a human reads. Give it those, take back the rest, and enforce the boundary in code where it can't be argued away by the next clever input.

ModelMap is [source-available](https://github.com/BioInfo/ModelMap) and research-use only. The deterministic core is the product; the LLM is the convenience layer you can afford to make zero-trust.