Part VI · Information & Entropy — 12

Counting Information

We have spent eleven chapters moving information around. But how much is there? Before quantum, a beautifully simple idea answers it: information is surprise — and surprise, it turns out, is a number you can compute.

↩ before you start · keep these handy
· From Toolkit 0.3: a probability p is a slice of a pie between 0 and 1, and all slices sum to 1.
· log₂ reminder: log₂x asks “2 to what power gives x?”, e.g. log₂8 = 3 (since 2³ = 8), log₂½ = −1, log₂1 = 0.
· From Ch. 1: one bit is the answer to a single yes/no question.
🔑 symbol decoder · every new mark, in plain words
· pᵢ: the probability of outcome i, i.e. how likely that particular thing is.
· −log₂p: the surprise of an outcome, in bits; large for rare p, zero for p = 1.
· H: the entropy, meaning the average surprise, i.e. surprise weighted by how often each outcome happens.
· Σᵢ: “add up over every outcome i”, the same sum sign from the probability pie.
· q: shorthand for the leftover probability 1 − p (here, tails).
feel

Surprise is information

“The sun rose this morning” tells you almost nothing — you already knew it would. “It snowed in the desert” tells you a lot. The rule behind the feeling: a rare event carries more information than a common one. If something is certain, learning it gives you zero bits; the less likely it was, the more bits land when it happens. Information isn’t about the content of a message — it’s about how much it narrowed down what you didn’t know.

🎯 everyday picture

Play Twenty Questions. Each good yes/no question halves the remaining possibilities. To pin down one of 8 equally-likely things you need 3 questions (8 → 4 → 2 → 1), and log₂8 = 3. Entropy is exactly that: the average number of yes/no questions you’d need to nail down the answer. A fair coin needs 1 question; a coin you already know is two-headed needs 0.
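The halving game is easy to try in a few lines of Python. This is only a sketch: the function name `questions_needed` is made up for illustration, and it assumes every option is equally likely and every question splits the remaining options as evenly as possible.

```python
import math

def questions_needed(n_options: int) -> int:
    """Count the halving questions needed to isolate one of n equally likely options."""
    questions = 0
    remaining = n_options
    while remaining > 1:
        # A good yes/no question leaves at most half of the options standing.
        remaining = math.ceil(remaining / 2)
        questions += 1
    return questions

for n in (2, 8, 1024):
    # For powers of two the count matches log2(n) exactly.
    print(n, questions_needed(n), math.log2(n))
```

For 8 options the loop runs three times (8 → 4 → 2 → 1), the same 3 that log₂8 gives.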

recap · Information = surprise; the rarer the outcome, the more bits it delivers when it arrives.
play

Bend a coin, watch the bits

Slide the coin from fair toward loaded. A fair coin is the most uncertain — every flip is worth a full bit. Bend it, and each flip becomes more predictable, so it carries less information. At the extreme of a two-headed coin, a flip tells you nothing at all.

▸ entropy of a biased coin · H(p) = −p·log₂p − q·log₂q
[Interactive plot: entropy H in bits (vertical axis, 0 to 1) against p (heads) (horizontal axis, 0 to 1, with ½ at the midpoint). A bias slider sets p; readouts show the probabilities of heads and tails, the surprise of each, and the entropy H in bits.]
recap · Entropy peaks at 1 bit for a fair coin and falls to 0 as the coin becomes a sure thing.
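If you would rather bend the coin in code, here is a minimal sketch of the same curve. The names `surprise` and `coin_entropy` are invented for this example, and it adopts the usual convention that an impossible outcome (probability 0) contributes 0 bits.

```python
import math

def surprise(p: float) -> float:
    """Surprise of an outcome with probability p, in bits: -log2(p)."""
    return -math.log2(p)

def coin_entropy(p: float) -> float:
    """H(p) = -p*log2(p) - q*log2(q), with q = 1 - p."""
    q = 1.0 - p
    total = 0.0
    for prob in (p, q):
        if prob > 0.0:  # assumed convention: an impossible outcome adds 0 bits
            total += prob * surprise(prob)
    return total

# Slide from fair toward loaded and watch the bits per flip shrink.
for p in (0.5, 0.6, 0.75, 0.9, 0.99, 1.0):
    print(f"p(heads) = {p:.2f}   H = {coin_entropy(p):.3f} bits")
```

The curve is symmetric: a coin that lands heads 90% of the time is exactly as predictable as one that lands tails 90% of the time.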
math

Shannon’s formula, term by term

Give each outcome a surprise of log₂(1/p) = −log₂p bits — big when p is tiny, zero when p = 1. The entropy is just the surprise you expect on average — each outcome’s surprise, weighted by how often it shows up:

H = −Σᵢ pᵢ log₂ pᵢ
fair coin:  H = −½log₂½ − ½log₂½ = ½ + ½ = 1 bit  (maximum)
certain outcome:  H = −1·log₂1 = 0 bits  (no surprise possible)
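The formula works for any number of outcomes, not just two. As a sketch, assuming the probabilities arrive as a plain list and using the hypothetical name `shannon_entropy`, it could be written like this:

```python
import math
from typing import Sequence

def shannon_entropy(probs: Sequence[float]) -> float:
    """Average surprise in bits: the sum of p * log2(1/p) over all outcomes."""
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must be non-negative and sum to 1")
    # log2(1/p) is the same surprise as -log2(p); outcomes with p = 0 are
    # skipped, which assumes the usual convention that they add 0 bits.
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))   # fair coin
print(shannon_entropy([1.0]))        # certain outcome
print(shannon_entropy([1 / 8] * 8))  # one of 8 equally likely things
```

The three calls reproduce the cases already seen: the fair coin, the certain outcome, and the eight equally likely options from Twenty Questions.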

The unit is the bit: the answer to one ideal yes/no question. Entropy is the average number of yes/no questions you’d need to pin down the outcome when every question is ideal, which makes it the floor that no questioning strategy can beat on average. The log base 2 is what makes that count come out in bits rather than nats or digits.
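To see the yes/no-question reading in action, the sketch below builds a questioning strategy and compares its average question count with H. It uses Huffman's construction, a standard technique the chapter itself does not name, and the function names and the `weather` distribution are invented for illustration.

```python
import heapq
import math
from typing import Dict

def entropy_bits(probs: Dict[str, float]) -> float:
    """Shannon entropy of a named distribution, in bits."""
    return sum(p * math.log2(1.0 / p) for p in probs.values() if p > 0)

def average_questions(probs: Dict[str, float]) -> float:
    """Average yes/no questions under a Huffman-style strategy.

    Repeatedly merge the two least likely groups; every merge costs one
    extra question for each outcome inside the merged group.
    """
    # Heap entries: (group probability, tie-breaker, outcomes in the group).
    heap = [(p, i, [name]) for i, (name, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    depth = {name: 0 for name in probs}
    counter = len(heap)  # unique tie-breaker so lists are never compared
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for name in group1 + group2:
            depth[name] += 1
        heapq.heappush(heap, (p1 + p2, counter, group1 + group2))
        counter += 1
    return sum(probs[name] * depth[name] for name in probs)

weather = {"sun": 0.5, "cloud": 0.25, "rain": 0.125, "snow": 0.125}
loaded_coin = {"heads": 0.25, "tails": 0.75}
for dist in (weather, loaded_coin):
    print(average_questions(dist), entropy_bits(dist))
```

When every probability is a power of ½, as in `weather`, the average question count equals H. For the loaded coin you still have to ask one whole question per flip, so the count sits above H; entropy is the floor, which is approached only by asking about many flips at once.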

✎ worked example · a coin that lands heads 1 time in 4
1. p(heads) = 0.25, p(tails) = 0.75. Both slices add to 1. ✓
2. surprise of heads = −log₂0.25 = 2 bits (since 0.25 = ¼ = 2⁻²); surprise of tails = −log₂0.75 ≈ 0.415 bits.
3. weight each by how often it happens: H = 0.25×2 + 0.75×0.415.
4. H ≈ 0.50 + 0.31 = 0.81 bits, less than a fair coin’s 1 bit, because this coin is more predictable. ✓
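The four steps above can be checked line by line. This is just the arithmetic restated in Python, with variable names chosen for this example:

```python
import math

# Step 1: the two slices of the pie.
p_heads, p_tails = 0.25, 0.75
assert math.isclose(p_heads + p_tails, 1.0)

# Step 2: the surprise of each outcome, in bits.
surprise_heads = -math.log2(p_heads)
surprise_tails = -math.log2(p_tails)

# Steps 3 and 4: weight each surprise by how often it happens, then add.
entropy = p_heads * surprise_heads + p_tails * surprise_tails

print(f"surprise of heads = {surprise_heads:.3f} bits")
print(f"surprise of tails = {surprise_tails:.3f} bits")
print(f"H = {entropy:.2f} bits")
```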
recap · H = −Σᵢ pᵢ log₂ pᵢ: average surprise, measured in yes/no questions.
⚠ common misconception

“A long, detailed message must carry more information.” Not necessarily. Entropy measures uncertainty removed, not length or importance. A thousand-page book that only ever says “yes” carries at most one bit, and none at all if you already knew the answer; a single sharp, unexpected answer can carry many. Information lives in what you didn’t already know.

Hold onto this picture. In the next chapter we feed exactly this formula a quantum object — a density matrix’s eigenvalues — and out comes the von Neumann entropy, the quantum measure of how mixed a state really is.

✓ you can now
explain why information means surprise, and why a rarer outcome carries more bits
compute Shannon entropy H = −Σ pᵢ log₂ pᵢ for a coin and read it as yes/no questions
say why a long message can carry little information, and a short one a lot