How It Works
Wobble is powered by a reinforcement learning agent trained to solve Wordle-like puzzles using a simplified Q-learning algorithm.
The Learning Process
The agent plays Wordle-style games repeatedly, learning from trial and error. Each game is an episode, and after thousands of episodes the model improves its ability to pick strong guesses.
- State: After each guess, the environment reports how many letters are correct and in the right position (greens) and how many are correct but misplaced (yellows); this feedback makes up the state the agent observes (see the feedback sketch after the reward rule below).
- Actions: The agent chooses a word to guess. Over time it learns which guesses lead to better outcomes.
- Reward: The agent receives positive points for greens and yellows, a large bonus for solving the word, and a penalty if it fails to solve it within 6 tries:
    if guessed:
        reward += 100 - 15 * log2(7 - turn)  # solve bonus that depends on the turn used (log2 from the math module)
    else:
        reward -= 1000                       # heavy penalty for failing to solve within 6 guesses
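As a concrete illustration of the green/yellow feedback that forms the state, here is a minimal sketch of a scoring function; the name score_guess and the counting approach are assumptions for illustration, not taken from the project's code.

    from collections import Counter

    def score_guess(guess: str, answer: str) -> tuple[int, int]:
        """Return (greens, yellows) for a guess against the hidden answer."""
        greens = sum(g == a for g, a in zip(guess, answer))
        # Count letters shared between guess and answer (capped per letter),
        # then subtract exact matches so nothing is counted as both green and yellow.
        shared = sum((Counter(guess) & Counter(answer)).values())
        return greens, shared - greens

For example, score_guess("crane", "trace") returns (3, 1): r, a, and e are green, and c is present but misplaced.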
Learning is done using the Q-learning update rule, which adjusts the value of each state-action pair:
    Q[s][a] ← Q[s][a] + α * (reward + γ * max(Q[s']) - Q[s][a])
- α is the learning rate (how much new info overrides old).
- γ is the discount factor (importance of future rewards).
- max(Q[s']) is the best future value from the next state s'.
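For concreteness, a minimal sketch of this update in Python, assuming the Q-table is a plain nested dictionary keyed by state and then by action (the function and variable names are illustrative, not the project's actual identifiers):

    def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
        """Apply one Q-learning update to the state-action value Q[s][a]."""
        old = Q.setdefault(s, {}).get(a, 0.0)
        # Best value achievable from the next state; 0 if that state is unseen.
        best_next = max(Q[s_next].values()) if Q.get(s_next) else 0.0
        Q[s][a] = old + alpha * (reward + gamma * best_next - old)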
Training
The agent is trained by playing against a large set of possible words. Over time, it learns which strategies increase the chance of solving the puzzle within the allowed attempts. The results of training are stored in a Q-table, which the bot then uses to make informed decisions during play.
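A rough outline of such a training loop, assuming a hypothetical environment with reset and step methods and an ε-greedy exploration rule; none of these names or defaults come from the project itself, so treat them as placeholders:

    import random
    from collections import defaultdict

    def train(env, words, episodes=10_000, alpha=0.1, gamma=0.9, epsilon=0.1):
        """Fill a Q-table by playing many episodes against randomly chosen target words."""
        Q = defaultdict(lambda: defaultdict(float))  # Q[state][word] -> estimated value
        for _ in range(episodes):
            state = env.reset(random.choice(words))  # pick a new hidden answer
            done = False
            while not done:
                # Explore with probability epsilon, otherwise exploit the best known guess.
                if random.random() < epsilon or not Q[state]:
                    action = random.choice(words)
                else:
                    action = max(Q[state], key=Q[state].get)
                next_state, reward, done = env.step(action)
                # Q-learning update (same rule as above).
                best_next = max(Q[next_state].values(), default=0.0)
                Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
                state = next_state
        return Q

The returned Q dictionary plays the role of the Q-table described above: at play time the bot can look up Q[state] and choose the highest-valued word.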
Outcome
After training, the agent is able to approach Wordle systematically — starting with informative guesses, narrowing down possibilities, and converging on the correct word more efficiently than random play.
Note:
- No official Wordle™ code, data, or other resources are used.
- All training is done locally with custom word lists and environments.
Wordle is a trademark of The New York Times Company. This project is not affiliated with or endorsed by The New York Times Company.