Training pipeline, tokens, and next-token prediction — what's actually under the hood, and why prompting matters.
The model reads trillions of words from books, websites, and code. It learns next-word patterns by guessing what comes next and checking the answer, billions of times over.
Output: raw linguistic capability — fluent text, latent knowledge — but no instruction-following, and no taste.
Humans write thousands of ideal answers to example questions — the kind of careful, structured replies you'd want from a thoughtful colleague.
The model learns to imitate that style: to follow instructions, to format, to refuse, to explain its reasoning.
Humans rank multiple model answers side by side. The model learns what humans prefer — tone, helpfulness, safety, whether an answer feels right.
This is also where most of the model's caution lives — and where its sycophancy creeps in.
LLMs don't read whole words. They read tokens: subword chunks averaging three to four characters. "hospital" might be one token; "hospitalization" is usually three. Common medical abbreviations sometimes get split in awkward places, which is why models occasionally mangle them.
Try it yourself: run a clinical sentence through a tokenizer and watch how the model chops it up; a code sketch follows below.
A real tokenizer (BPE) is more sophisticated, but the lesson is the same: the model never sees your sentence as letters or words. It sees a sequence of integer IDs.
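Here's a minimal sketch using OpenAI's tiktoken library. The cl100k_base encoding and the sample sentence are illustrative choices, not necessarily what any particular deployed model uses.

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # a real BPE vocabulary (GPT-4-era)
text = "Pt admitted with CHF exacerbation; started on IV furosemide."
ids = enc.encode(text)

print(ids)                              # the integer IDs the model actually sees
print([enc.decode([i]) for i in ids])   # the subword chunks they correspond to
```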
Given everything so far, the model assigns a probability to every possible next token, then samples one. That's it. There is no plan, no outline, no lookup — just a probability distribution, conditioned on the text in front of it.
The model picks one of these stochastically, with the amount of randomness governed by a parameter called temperature. Then it conditions on that choice, predicts the token after that, samples again, and so on. One token at a time. That's how an entire H&P, an entire essay, an entire wrong answer gets generated.
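A minimal sketch of that loop, assuming a hypothetical `model` callable that returns one raw score (logit) per vocabulary token; the sampling helper is illustrative, not any library's API:

```python
import numpy as np

def sample_next(logits: np.ndarray, temperature: float = 1.0) -> int:
    """Turn raw scores into a probability distribution and sample one token ID."""
    if temperature == 0:
        return int(np.argmax(logits))                # greedy: always the top token
    scaled = (logits - logits.max()) / temperature   # max-subtraction for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

def generate(model, ids, max_new_tokens=50):
    """Autoregressive loop: predict, sample, condition on the choice, repeat."""
    for _ in range(max_new_tokens):
        logits = model(ids)               # hypothetical: one score per vocabulary entry
        ids.append(sample_next(logits))   # the sampled token joins the context
    return ids
```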
At T = 0 the model always picks the top token — deterministic, dull, sometimes brittle. At higher temperatures it explores — more creative, more wrong.
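A toy demonstration of how temperature reshapes the distribution; the four scores are made up:

```python
import numpy as np

logits = np.array([4.0, 2.0, 1.0, 0.5])   # made-up scores for four candidate tokens
for T in (0.2, 1.0, 2.0):
    p = np.exp((logits - logits.max()) / T)
    p /= p.sum()
    print(f"T={T}: {np.round(p, 3)}")
# Low T piles nearly all probability on the top token; high T flattens the distribution.
```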
LLMs don't look up answers. They generate the most plausible continuation of your text. Plausible is not the same as true.