The Mathematics of Wordle Codebreaking
When someone shares a Wordle result, they share a grid of colored squares. No letters. No guesses. No answers. Just colors.
The claim is this: the colors alone are often sufficient to determine the answer.
This is not obvious. It feels like being handed a shadow and asked to identify the object. But the shadow is sharper than it appears, and there are fewer objects in the room than you think.
A Wordle pattern is a sequence of five symbols, each one of three values: green, yellow, or grey. There are 3⁵ = 243 possible patterns. We can represent them compactly: 2 for green, 1 for yellow, 0 for grey. So the pattern green-green-yellow-grey-grey becomes 22100.
Every pair of five-letter words—a guess and a solution—produces exactly one pattern. If you guess DOLLS and the answer is HOTEL, the pattern is determined by the rules of the game: which letters are correct and in place, which are correct but misplaced, which are absent.
This means we can think of the pattern as a function:
pattern(guess, solution) → one of 243 values
This function is deterministic. Given a guess and a solution, the pattern is fixed. What makes it useful for codebreaking is that we can run it backwards.
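Carol-B's own implementation isn't shown here, but the scoring rule itself is standard. A minimal sketch in Python, using the usual two-pass approach (greens first, then yellows) so that repeated letters are handled the way Wordle handles them:

```python
from collections import Counter

def pattern(guess: str, solution: str) -> str:
    """Score a guess against a solution, Wordle-style.

    Returns a 5-character string: '2' green, '1' yellow, '0' grey.
    Greens are assigned first; each unmatched solution letter can then
    satisfy at most one yellow, which is how Wordle treats duplicates.
    """
    result = ["0"] * len(guess)
    # Pass 1: mark greens and count the solution letters left over.
    leftover = Counter()
    for i, (g, s) in enumerate(zip(guess, solution)):
        if g == s:
            result[i] = "2"
        else:
            leftover[s] += 1
    # Pass 2: mark yellows, consuming leftovers so a duplicated letter
    # in the guess cannot claim the same solution letter twice.
    for i, g in enumerate(guess):
        if result[i] == "0" and leftover[g] > 0:
            result[i] = "1"
            leftover[g] -= 1
    return "".join(result)
```

With the DOLLS/HOTEL example from above, this yields `02100`: the O is green, the first L is yellow, and the second L is grey because HOTEL has only one L left to claim.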
Suppose your father shares a Wordle result. His first guess produced the pattern 22010. You don’t know what he guessed. But you know this:
For some word G: pattern(G, answer) = 22010
The answer must be a word for which there exists a valid guess producing that exact pattern. Many words fail this test. If the answer were SHAME, is there any five-letter word whose first two letters are S and H, whose third letter is not in SHAME, whose fourth letter is in SHAME but is not M (since position 4 is yellow, not green), and whose fifth letter is not in SHAME? For most candidate answers, no such guess exists. Those candidates are eliminated.
This is the fundamental operation: for each candidate solution, check whether any valid guess word produces the observed pattern with it. If no guess works, that solution is impossible.
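This operation is a few lines of Python. The sketch below is self-contained (it bundles a compact copy of the standard scoring rule as `score`); the function and variable names are illustrative, not Carol-B's actual identifiers:

```python
from collections import Counter

def score(guess: str, solution: str) -> str:
    """Standard Wordle scoring: '2' green, '1' yellow, '0' grey."""
    out = ["0"] * len(guess)
    # Solution letters not consumed by a green, available for yellows.
    leftover = Counter(s for g, s in zip(guess, solution) if g != s)
    for i, (g, s) in enumerate(zip(guess, solution)):
        if g == s:
            out[i] = "2"
    for i, g in enumerate(guess):
        if out[i] == "0" and leftover[g] > 0:
            out[i] = "1"
            leftover[g] -= 1
    return "".join(out)

def consistent_solutions(observed: str, guesses, solutions):
    """Keep each candidate solution only if SOME valid guess would
    have produced the observed pattern against it."""
    return [s for s in solutions
            if any(score(g, s) == observed for g in guesses)]
```

For example, with the toy lists `guesses = ["HELLO", "WORLD"]` and `solutions = ["HELLO", "WORLD", "CRANE"]`, the observed pattern `01000` is only achievable against CRANE (via HELLO, whose E is a lone yellow), so the other two candidates are eliminated.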
Wordle uses approximately 2,300 valid solution words and accepts approximately 6,500 valid guesses (including rare words that can be guessed but are never the answer). This gives us roughly 6,500 × 2,300 ≈ 15 million (guess, solution) pairs, each mapped to its pattern.
Carol-B precomputes this entire table. It is 1.5 GB as a CSV. This is the ammunition depot.
One pattern from one person’s guess typically eliminates 70–90% of candidate solutions. But the real power comes from sequences.
When someone solves Wordle in, say, four guesses, they produce four patterns in sequence. Each pattern was produced by a specific guess that was consistent with the information the player had at that point. If the player uses hard mode (most do), each guess must be a word that could still be the answer given everything they’ve learned.
This means the guess path itself is constrained. The second guess must be consistent with the first pattern. The third must be consistent with the first two. And so on. Carol-B models this by performing sequential inner joins:
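Carol-B's actual join code isn't reproduced here, but the shape of the operation can be sketched with pandas, assuming a long-format table with one row per (guess, solution, pattern) triple. For brevity this sketch omits the hard-mode coupling between successive guesses and simply intersects the surviving solution sets via chained inner joins; the table below is a hand-scored toy, not the real 15-million-row one:

```python
import pandas as pd

def survivors(table: pd.DataFrame, observed: list) -> list:
    """For each observed pattern, take the solutions some guess can
    produce, then chain inner joins on `solution` so only answers
    consistent with EVERY row of the shared grid survive."""
    frames = [
        table.loc[table["pattern"] == p, ["solution"]].drop_duplicates()
        for p in observed
    ]
    out = frames[0]
    for f in frames[1:]:
        out = out.merge(f, on="solution", how="inner")  # sequential inner join
    return sorted(out["solution"])

# Toy table: patterns scored by hand with the standard Wordle rule.
table = pd.DataFrame({
    "guess":    ["AROSE", "AROSE", "UNTIL", "UNTIL"],
    "solution": ["HOTEL", "CRANE", "HOTEL", "CRANE"],
    "pattern":  ["00101", "12002", "00202", "01000"],
})
```

Here `survivors(table, ["00101", "00202"])` returns only HOTEL: it is the sole solution consistent with both observed rows.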
With one family member’s four-row grid, you might be down to 20 solutions. With two family members, often 3–5. With three, often one.
There is a deeper structure here, and it connects to Claude Shannon’s theory of information.
Each Wordle pattern conveys information by partitioning the solution space. A pattern that leaves 100 possible solutions out of an original 2,300 has conveyed log2(2300/100) ≈ 4.5 bits of information. The maximum possible information from one guess is log2(243) ≈ 7.9 bits (if every pattern were equally likely).
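These quantities follow directly from the definition of information as a log-ratio of set sizes. A one-line helper (illustrative, not from the notebook) reproduces the arithmetic:

```python
import math

def info_bits(before: int, after: int) -> float:
    """Bits of information conveyed by a pattern that shrinks the
    candidate set from `before` words to `after` words."""
    return math.log2(before / after)

# info_bits(2300, 100) is about 4.52 bits; the ceiling for one guess
# is math.log2(243), about 7.92 bits, reached only if all 243
# patterns were equally likely.
```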
The optimal first guess is the word that maximizes the expected information: the one whose patterns partition the solution space into the most groups of the most equal size. This is the same principle as binary search, except instead of splitting a list in half (base 2), you are splitting it into up to 3⁵ = 243 groups. It is binary search generalized to a higher base.
This is why the NYT Wordle Bot recommends words like CRANE or SLATE as openers—they produce the most even distribution of patterns across the solution space, maximizing expected information gain.
Carol-B includes a guessEvaluator that computes this: for each possible guess, it pivots the pattern table to count how many solutions fall into each of the 243 pattern buckets. The guess that produces the smallest maximum bucket is the one most likely to narrow things down in the worst case. This is the minimax strategy, and it is provably optimal for a single step.
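The notebook's `guessEvaluator` works on the precomputed table, but the core computation can be sketched directly with a `Counter` over pattern buckets. The function names and the `demo_score` scorer below are illustrative (the demo scorer marks exact-position matches only; real code would plug in full Wordle scoring):

```python
from collections import Counter

def worst_case_bucket(guess, solutions, score):
    """Size of the largest pattern bucket this guess can leave —
    i.e., how many candidates survive in the worst case."""
    return max(Counter(score(guess, s) for s in solutions).values())

def best_minimax_guess(guesses, solutions, score):
    """The guess whose worst-case bucket is smallest (minimax)."""
    return min(guesses, key=lambda g: worst_case_bucket(g, solutions, score))

# Demo scorer: greens only, for illustration.
def demo_score(g, s):
    return "".join("2" if a == b else "0" for a, b in zip(g, s))
```

With `solutions = ["AA", "AB", "AC"]`, the guess `"AA"` splits them into buckets of sizes 1 and 2 (worst case 2), while `"ZZ"` leaves all three in one bucket (worst case 3), so minimax prefers `"AA"`.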
On April 24, 2024, Sade shared her Wordle result. Five guesses, five rows of colored squares. The patterns, encoded:
| Guess # | Pattern |
|---|---|
| 1 | 00000 — all grey |
| 2 | 00000 — all grey |
| 3 | 00000 — all grey |
| 4 | 00000 — all grey |
| 5 | 22222 — all green |
Four rows of all grey. Then a perfect solve on guess 5. This means Sade's first four guesses contained zero correct letters. Assuming she repeated no letters across those guesses, none of the 20 letters she tried appear in the answer. Since there are only 26 letters, the answer can only use the 6 remaining letters, and there are very few five-letter words composed entirely from a set of 6 letters.
Carol-B finds them instantly.
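A search like this takes only a set comparison per word. A hypothetical helper (not Carol-B's code, and deliberately not naming that day's answer):

```python
def words_from_letters(words, allowed):
    """Return the words drawn entirely from the allowed letters."""
    allowed = set(allowed.upper())
    return [w for w in words if set(w.upper()) <= allowed]
```

For instance, `words_from_letters(["QUEUE", "QUIET", "ONION"], "QUEONI")` keeps QUEUE and ONION but drops QUIET, whose T falls outside the allowed set. Run against the full solution list with only six allowed letters, the survivors can usually be counted on one hand.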
The notebook itself—the Jupyter notebook named Carol-B, written in April and May of 2024—is preserved in its original form. It is not elegant code. It is the work of someone who was building a weapon, not a library. But it works, and the mathematics are sound.