ATTENTION · TRANSFORMER SIMULATION

▸ Splitting the sentence into tokens & loading the vocabulary…
▸ Looking up an embedding for each token
▸ Producing Query · Key · Value for self-attention
▸ Computing softmax(Q·Kᵀ/√d) weights · multiple heads
▸ Calibrating temperature & the next-token sampler…
▸ Ready — Online. ✅

Simulation room · Attention & Transformer

Attention & LLM
Online
token · softmax · word prediction
Model state
💬 Attention & word prediction
Tokens
Query token
Next guess
Top probability
Temperature T
Attention heads
Notes
An LLM such as ChatGPT reads a sentence as a sequence of tokens. Each token ‘attends’ to the other tokens to pick up context, and the model then predicts the next token by assigning a probability to each candidate. This is a simplified intuition for attention, not the full architecture.
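To make the ‘attends’ step concrete, here is a minimal sketch of the softmax(Q·Kᵀ/√d) weights for a single query token, in plain Python. The three-token sentence and all vector values are made up for illustration. A real model learns its Query, Key and Value vectors and runs many heads in parallel, so treat this as a toy under those assumptions rather than as how the simulation itself is implemented.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Weights softmax(q·k / sqrt(d)) of one query token over all key tokens."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    return softmax(scores)


def attend(query, keys, values):
    """Context vector: the attention-weighted average of the value vectors."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


# Hypothetical 2-dimensional vectors for a three-token sentence.
tokens = ["the", "cat", "sat"]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query = [0.0, 2.0]  # made-up query vector for the token "sat"

weights = attention_weights(query, keys)
context = attend(query, keys, values)

for token, weight in zip(tokens, weights):
    print(f"{token}: {weight:.3f}")
```

With these made-up vectors the query lines up with the keys for "cat" and "sat" and not with "the", so those two tokens receive equal, larger weights and contribute most to the context vector.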
Pick a ‘Scenario’ to change the view (attention links · matrix · multi-head · text generation · temperature · long-range context). Drag Temperature to see the next-token distribution sharpen (low T) or flatten (high T). Click a concept for details.
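The effect of the Temperature slider can be sketched in a few lines of Python: divide the raw scores (logits) by T before the softmax, then sample from the result. The candidate words and logit values below are hypothetical, and the function names are chosen for this example only. It illustrates the general technique, not necessarily the sampler this simulation uses.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Probabilities from logits scaled by 1/T; T must be positive."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(candidates, logits, temperature, rng):
    """Draw one candidate at random according to the tempered probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Hypothetical next-token candidates and their raw scores.
candidates = ["mat", "sofa", "moon"]
logits = [2.0, 1.0, 0.1]

sharp = softmax_with_temperature(logits, 0.5)    # low T: peaked distribution
neutral = softmax_with_temperature(logits, 1.0)  # T = 1: plain softmax
flat = softmax_with_temperature(logits, 2.0)     # high T: flatter distribution

rng = random.Random(0)  # fixed seed so the draw is repeatable
guess = sample_next_token(candidates, logits, 1.0, rng)

for name, probs in [("T=0.5", sharp), ("T=1.0", neutral), ("T=2.0", flat)]:
    print(name, [round(p, 3) for p in probs])
```

Lowering T pushes more probability onto the top candidate, which corresponds to the ‘Top probability’ readout rising; raising T spreads probability across the alternatives, so sampled text becomes more varied.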
Chart: top-token probability & temperature over time (series: top probability · temperature T)