Reinforcement learning: an agent learning by trial & error

REINFORCEMENT LEARNING SIMULATION

▸ Building the gridworld & placing goal / traps…

▸ Initializing the Q-value table to 0

▸ Loading the ε-greedy policy (explore ↔ exploit)

▸ Setting the Bellman update: Q ← Q + α[r + γ·maxQ′ − Q]

▸ Seeding the deterministic RNG (mulberry32)…

▸ Ready — Online. ✅
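
The setup steps above can be sketched in Python roughly as follows. This is a minimal sketch under assumptions, not the page's actual code: the action set and its order, the function names (`make_q_table`, `epsilon_greedy`, `q_update`), the `done` flag, and all parameter values are illustrative. `mulberry32` here is a Python port of the commonly circulated JavaScript routine of that name; the page's own implementation is not shown.

```python
ACTIONS = ["up", "right", "down", "left"]  # assumed action set and order


def mulberry32(seed):
    """Return a deterministic generator of floats in [0, 1) (port of the JS routine)."""
    state = seed & 0xFFFFFFFF

    def next_float():
        nonlocal state
        state = (state + 0x6D2B79F5) & 0xFFFFFFFF
        t = state
        t = ((t ^ (t >> 15)) * (t | 1)) & 0xFFFFFFFF
        t ^= (t + ((t ^ (t >> 7)) * (t | 61))) & 0xFFFFFFFF
        return ((t ^ (t >> 14)) & 0xFFFFFFFF) / 4294967296

    return next_float


def make_q_table(n_states, n_actions=len(ACTIONS)):
    """Q-value table initialized to 0: one row per state, one column per action."""
    return [[0.0] * n_actions for _ in range(n_states)]


def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon explore (random action), otherwise exploit (best action)."""
    if rng() < epsilon:
        return int(rng() * len(q_row))  # explore
    return q_row.index(max(q_row))      # exploit; ties go to the first best action


def q_update(q, s, a, r, s_next, alpha, gamma, done=False):
    """Q <- Q + alpha * [r + gamma * max Q' - Q].

    The `done` flag (no bootstrapping from a terminal state) is a common
    convention added here, not something stated on the page.
    """
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
```

With the same seed, `mulberry32` yields the same sequence on every run, which is what makes a "relearn from scratch" reproducible.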

Learning progress

🎮 Simple grid

Live stats: episode · episode reward · steps this episode · exploration ε · success rate · best value

Notes

Reinforcement learning: an agent acts in an environment, receives a reward, then adjusts its behavior to maximize cumulative reward. No one teaches it the right move; it learns by trial and error over many episodes. The same idea underlies the systems that learned to play Go (AlphaGo) and Atari games.
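
The act, get a reward, adjust loop described above can be illustrated with a very small tabular Q-learning run. This is a sketch under assumptions: the environment is a hypothetical six-cell corridor rather than the page's grid, the reward values (+1 at the goal, a small step cost elsewhere) and the constants `ALPHA`, `GAMMA`, `EPSILON` are illustrative, and Python's built-in `random.Random` stands in for the page's RNG.

```python
import random

N_CELLS = 6                 # corridor cells 0..5; cell 5 is the goal (assumed layout)
LEFT, RIGHT = 0, 1
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # illustrative values, not the page's


def step(state, action):
    """Environment: move one cell and return (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == LEFT else state + 1
    if nxt == N_CELLS - 1:
        return nxt, 1.0, True       # reached the goal
    return nxt, -0.01, False        # small step cost (assumed)


def train(episodes=500, max_steps=200, seed=0):
    """Run Q-learning; return the Q-table and the step count of each episode."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_CELLS)]
    steps_per_episode = []
    for _ in range(episodes):
        state = 0
        for t in range(1, max_steps + 1):
            # Act: explore with probability EPSILON, otherwise take the best known move.
            if rng.random() < EPSILON:
                action = rng.randrange(2)
            else:
                action = q[state].index(max(q[state]))
            # Get a reward from the environment.
            nxt, reward, done = step(state, action)
            # Adjust: move Q(state, action) toward reward + discounted best next value.
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
            if done:
                break
        steps_per_episode.append(t)
    return q, steps_per_episode
```

Nothing in `train` tells the agent that moving right is correct; that preference emerges only from the rewards it collects over many episodes.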

Pick a "Scenario" to change the environment (traps · slippery · cliff…) · 🔀 new grid · ↺ relearn from scratch · click a structure/concept for details · bright cell = high value, arrow = best move.
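
The "bright cell = high value, arrow = best move" display can be derived from a Q-table as sketched below. The details are assumptions for illustration: the arrow order in `ARROWS`, the min-max scaling used for brightness, and the helper names (`cell_values`, `brightness`, `best_arrows`, `render`) are hypothetical, and the page may compute its colors differently.

```python
ARROWS = ["↑", "→", "↓", "←"]  # assumed action order: up, right, down, left


def cell_values(q):
    """Value of a cell = its best Q-value over all actions."""
    return [max(row) for row in q]


def brightness(values):
    """Scale values to [0, 1] so the highest-value cell is brightest (assumed min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def best_arrows(q):
    """Arrow for each cell = the action with the highest Q-value (first one on ties)."""
    return [ARROWS[row.index(max(row))] for row in q]


def render(q, width):
    """Lay the arrows out as rows of `width` cells."""
    arrows = best_arrows(q)
    rows = [arrows[i:i + width] for i in range(0, len(arrows), width)]
    return "\n".join(" ".join(row) for row in rows)
```

For a 2×2 grid, `render(q, 2)` returns two lines of two arrows each, one arrow per cell.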

Chart: Reward & exploration ε per episode (series: reward (smoothed) · exploration ε)
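
The two chart series can be produced along these lines. The page does not say how the reward is smoothed or how ε changes between episodes, so both choices here are assumptions: an exponential moving average for smoothing and a multiplicative decay with a floor for ε. The names `smooth` and `epsilon_schedule` and all default values are hypothetical.

```python
def smooth(rewards, beta=0.9):
    """Exponential moving average of per-episode rewards (assumed smoothing method)."""
    out = []
    avg = None
    for r in rewards:
        avg = r if avg is None else beta * avg + (1 - beta) * r
        out.append(avg)
    return out


def epsilon_schedule(episodes, start=1.0, end=0.05, decay=0.99):
    """Exploration rate per episode: multiply by `decay` each episode, never below `end`."""
    eps = start
    out = []
    for _ in range(episodes):
        out.append(eps)
        eps = max(end, eps * decay)
    return out
```

A higher `beta` gives a flatter reward curve that reacts more slowly to recent episodes; a `decay` closer to 1 keeps the agent exploring for longer.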