Now that you've seen prompting matter, here's how to think about which model — and which harness — to reach for.
A model is one ingredient. The right model in the right harness, with the right prompt, is the actual unit of work.
Models trade off cost, latency, and depth of reasoning. Pick the lightest tier that still does the job — then move up only when the task needs it. The names below shift every six months; the shape of the spectrum doesn't.
Cost means per-token spend, compared within a task tier. Latency means rough wall-clock time to a useful answer, not time to first token.
A weak model with a great prompt can beat a strong model with a vague one. GPT-3.5 Turbo jumped from 5% to 32% just by switching to the expert prompt.
The workhorse tier is the daily driver. Claude Sonnet 4.6 moved 59% → 77% with the three-lens prompt — solid for routine audit work.
The frontier tier is qualitatively different. Claude Opus 4.5 hit 95% with the expert prompt — review-grade output, but only because of the prompt.
A raw model only knows what it was trained on, only as of when training stopped, and only what fits in the prompt. Three patterns extend it. Each one shows up in the products you already use — knowing which is which tells you what the system can and can't do.
Retrieval-augmented generation (RAG). The model searches a knowledge base — UpToDate, hospital protocols, internal guidelines — before answering. The retrieved text is appended to the prompt, so the model answers with current, institution-specific information instead of whatever it absorbed during training.
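The pattern can be sketched in a few lines. This is a minimal, illustrative version: retrieval here is naive keyword overlap over a toy in-memory knowledge base (real systems use vector embeddings), and every name — `KNOWLEDGE_BASE`, `retrieve`, `build_prompt` — is an assumption for the sketch, not a real API.

```python
# Toy knowledge base standing in for hospital protocols / guidelines.
KNOWLEDGE_BASE = [
    "Hospital protocol: sepsis patients receive antibiotics within 1 hour.",
    "Guideline: metformin is first-line therapy for type 2 diabetes.",
    "Policy: discharge summaries must be signed within 24 hours.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Score each snippet by word overlap with the query; return the top k."""
    q_words = set(query.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question: str) -> str:
    """Append retrieved text so the model answers from current sources."""
    context = "\n".join(retrieve(question))
    return (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using only the context above."
    )

prompt = build_prompt("What is the antibiotic timing for sepsis patients?")
```

The key move is the last function: the model never "learns" the protocol — the harness just pastes it into the prompt on every call.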
Tool use. The model calls functions during a conversation — search the web, read PDFs, query a database, run Python — and it decides when and how. This turns a static text generator into something that can act on live information instead of guessing.
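The shape of that loop, sketched with a hard-coded stand-in for the model (real systems get the tool choice from the LLM's structured output; `fake_model`, `TOOLS`, and both tool functions are illustrative assumptions):

```python
def search_web(query: str) -> str:
    return f"Top result for {query!r}: ..."

def run_python(code: str) -> str:
    return str(eval(code))  # toy sandbox; never eval untrusted code

TOOLS = {"search_web": search_web, "run_python": run_python}

def fake_model(messages: list[dict]) -> dict:
    """Stand-in for the LLM: first asks for a calculation, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "run_python", "args": {"code": "250 * 4"}}
    return {"answer": "The total dose is 1000 mg."}

def chat(user_msg: str) -> str:
    messages = [{"role": "user", "content": user_msg}]
    while True:
        reply = fake_model(messages)
        if "answer" in reply:
            return reply["answer"]
        # The harness, not the model, actually executes the call.
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})

answer = chat("What is 250 mg four times a day in total?")
```

Note the division of labor: the model only emits a *request* to call a tool; the harness runs it and feeds the result back. That boundary is where you enforce sandboxing and permissions.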
Agents. An LLM that takes multi-step actions on your behalf: it reads files, searches the web, drafts documents, submits forms. You give it a goal; it composes its own sequence of tool calls to get there.
Start at the top, follow the branch that matches the actual task in front of you, and read the leaf as a recipe — model tier plus harness plus prompt style. The clinical-decision branch is highlighted because it's the only branch where a human-in-the-loop checkpoint is non-negotiable.
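Reading the leaf as a recipe can be made literal by writing the leaves as data. The branch names and recipe values below are illustrative of the shape, not a canonical taxonomy; only the non-negotiable human checkpoint on the clinical branch comes from the text.

```python
# Each leaf: model tier + harness + prompt style, plus the human gate.
RECIPES = {
    "reformat notes": {
        "tier": "light", "harness": "none",
        "prompt": "plain instruction", "human_in_loop": False,
    },
    "routine audit": {
        "tier": "workhorse", "harness": "RAG",
        "prompt": "three-lens", "human_in_loop": False,
    },
    "clinical decision": {
        "tier": "frontier", "harness": "RAG + tools",
        "prompt": "expert", "human_in_loop": True,  # non-negotiable
    },
}

def pick(task: str) -> dict:
    recipe = RECIPES[task]
    if recipe["human_in_loop"]:
        print("Requires human sign-off before any action.")
    return recipe

recipe = pick("clinical decision")
```

Encoding the checkpoint as data rather than convention means the harness can refuse to skip it.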
The model is the brain. The harness — RAG, tools, the interface you use — is the body. A great model in a bad harness underperforms a smaller model in a great one.