▸ Building the gridworld & placing goal / traps…
▸ Initializing the Q-table (all state-action values set to 0)
▸ Loading the ε-greedy policy (explore ↔ exploit)
▸ Setting the Bellman update: Q(s,a) ← Q(s,a) + α[r + γ·max Q(s′,a′) − Q(s,a)]
▸ Seeding the deterministic RNG (mulberry32)…
▸ Ready — Online. ✅
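
The startup steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the demo's actual code: the 4×4 grid, the goal and trap positions, the reward values, and the hyperparameters are all assumptions, and `mulberry32` here is a Python port of the commonly circulated JavaScript form of that generator, with explicit 32-bit masking standing in for `Math.imul` and `>>>`.

```python
MASK32 = 0xFFFFFFFF

def mulberry32(seed):
    """Return a deterministic generator of floats in [0, 1) (Python port, assumed form)."""
    state = seed & MASK32
    def next_float():
        nonlocal state
        state = (state + 0x6D2B79F5) & MASK32
        t = state
        t = ((t ^ (t >> 15)) * (t | 1)) & MASK32          # emulates Math.imul
        t ^= (t + ((t ^ (t >> 7)) * (t | 61))) & MASK32
        return ((t ^ (t >> 14)) & MASK32) / 4294967296
    return next_float

# Hypothetical gridworld layout and hyperparameters (not taken from the source).
ROWS, COLS = 4, 4
START, GOAL = (0, 0), (3, 3)
TRAPS = {(1, 1), (2, 3)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """Move one cell (walls clamp the move); return (next_state, reward, done)."""
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), ROWS - 1)
    c = min(max(state[1] + dc, 0), COLS - 1)
    nxt = (r, c)
    if nxt == GOAL:
        return nxt, 1.0, True
    if nxt in TRAPS:
        return nxt, -1.0, True
    return nxt, -0.01, False  # small step cost (assumed)

def choose_action(q, state, rand):
    """Epsilon-greedy: explore with probability EPSILON, otherwise exploit."""
    if rand() < EPSILON:
        return int(rand() * len(ACTIONS))
    values = q[state]
    return max(range(len(ACTIONS)), key=lambda a: values[a])

def train(episodes=500, seed=42, max_steps=200):
    rand = mulberry32(seed)
    # Q-table initialized to 0 for every (state, action) pair.
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            action = choose_action(q, state, rand)
            nxt, reward, done = step(state, action)
            # Q <- Q + alpha * [r + gamma * max Q' - Q]; no bootstrap at terminal states.
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

if __name__ == "__main__":
    q_table = train()
    print("Q-values at start:", q_table[START])
```

Because every random draw comes from the seeded generator, two runs with the same seed produce identical Q-tables, which is presumably why the demo seeds a deterministic RNG before going online.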