“A game is a series of interesting choices.” Sid Meier
We build Play2Code, which turns game generation into a continual loop between a code-writing agent and a GUI playtester within PlaytestArena, where each game prompt is paired with rubrics describing expected behavior. Each build is played in the browser, evaluated by the GUI agent, and revised in the next round.
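The round structure described above can be sketched as a simple write, playtest, revise loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `refine`, `PlaytestReport`, the callback signatures, and the stop rule (stop once every rubric criterion passes or the round budget runs out) are hypothetical, and the toy stubs stand in for the code-writing agent and the GUI playtester.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class PlaytestReport:
    # Hypothetical shape of one playtest result: per-criterion verdicts plus a fix list.
    passed: dict[str, bool]
    fixes: list[str] = field(default_factory=list)

    @property
    def score(self) -> float:
        # Fraction of rubric criteria the playtester observed as satisfied.
        return sum(self.passed.values()) / len(self.passed) if self.passed else 0.0


WriteFn = Callable[[str, Optional[str], list[str]], str]
PlayFn = Callable[[str, list[str]], PlaytestReport]


def refine(prompt: str, rubrics: list[str], write_game: WriteFn,
           playtest: PlayFn, max_rounds: int = 5) -> tuple[str, list[float]]:
    html = write_game(prompt, None, [])  # initial build from the prompt alone
    history: list[float] = []
    for round_idx in range(max_rounds):
        report = playtest(html, rubrics)  # play the current build
        history.append(report.score)
        if all(report.passed.values()) or round_idx == max_rounds - 1:
            break  # every criterion met, or the round budget is spent
        html = write_game(prompt, html, report.fixes)  # patch using the fix list
    return html, history


# Toy stubs (hypothetical): the "game" is a string, and a criterion passes
# when its name appears in the HTML.
def toy_write_game(prompt: str, previous_html: Optional[str], fixes: list[str]) -> str:
    if previous_html is None:
        return "<html><!-- score counter --></html>"
    extra = "".join(f"<!-- {fix} -->" for fix in fixes)
    return previous_html.replace("</html>", extra + "</html>")


def toy_playtest(html: str, rubrics: list[str]) -> PlaytestReport:
    passed = {criterion: criterion in html for criterion in rubrics}
    return PlaytestReport(passed=passed, fixes=[c for c, ok in passed.items() if not ok])


html, history = refine("clicker game", ["score counter", "win screen"],
                       toy_write_game, toy_playtest)
print(history)  # [0.5, 1.0]
```

In this toy run the first build misses the win screen, the fix list asks for it, and the second round passes every criterion. Passing the fix list back into the writer, rather than only a score, mirrors the idea that each revision is driven by concrete observations from play.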
From Inspection to Interaction
Source code can compile while the game remains broken: a button may not respond, a sprite may fail to render, or a win condition may never trigger. Play2Code treats the browser surface as the place where game quality becomes visible.
The game agent writes or patches a self-contained HTML game in a shared runtime.
The GUI agent observes the rendered screen and interacts through mouse clicks and key presses.
Play trajectories are distilled into a summary and a concrete fix list for the next revision.
Episode, skill, and world memory accumulate experience across rounds and tasks.
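The last two steps, distilling a trajectory into a summary and fix list and then storing that experience, can be sketched as follows. Everything here is an illustrative assumption rather than the paper's design: the `Step` schema, the rule that a mismatch between expected and observed behavior becomes a fix item, and the tier semantics of `Memory` (episode memory covers the rounds of one task; skill and world memory persist across tasks, with recurring fix items standing in crudely for skills).

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Step:
    # One trajectory step (hypothetical schema): the GUI action taken, the
    # behavior the rubric expects, and what the screen actually showed.
    action: str
    expected: str
    observed: str


def summarize(trajectory: list[Step]) -> tuple[str, list[str]]:
    # Every step whose observation differs from the expectation becomes a fix item.
    fixes = [f"After '{s.action}': expected {s.expected}, saw {s.observed}"
             for s in trajectory if s.observed != s.expected]
    ok = len(trajectory) - len(fixes)
    return f"{ok}/{len(trajectory)} steps behaved as expected", fixes


@dataclass
class Memory:
    # Assumed tiers: episode = rounds of the current task; skills and world
    # persist across tasks.
    episode: list[dict] = field(default_factory=list)
    skills: Counter = field(default_factory=Counter)  # fix item -> times needed
    world: dict[str, str] = field(default_factory=dict)  # facts about the shared runtime

    def record_round(self, round_idx: int, summary: str, fixes: list[str]) -> None:
        self.episode.append({"round": round_idx, "summary": summary, "fixes": list(fixes)})
        self.skills.update(fixes)

    def end_task(self) -> None:
        # Only episode memory is cleared; skill and world memory carry over.
        self.episode.clear()


trajectory = [
    Step("click Start", "game screen appears", "game screen appears"),
    Step("press Space", "player jumps", "no movement"),
]
summary, fixes = summarize(trajectory)
memory = Memory()
memory.world["runtime"] = "single self-contained HTML file"
memory.record_round(0, summary, fixes)
memory.end_task()
print(summary)  # 1/2 steps behaved as expected
```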
PlaytestArena
PlaytestArena frames each game generation task as a playable evaluation setting: a prompt defines intent, rubrics define observable criteria, and the GUI agent evaluates the result through interaction.
Game Prompt: the intended game, mechanics, and player experience
Rubrics: criterion-level checks with expected in-game behavior
GUI as Evaluator: objective playtesting on the same screen surface a player uses
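A PlaytestArena task can be pictured as a prompt plus a set of criterion-level rubrics, each tied to an observable in-game behavior. The sketch below assumes a simple pass-rate score over those criteria; `Rubric`, `Task`, `score`, and the example Breakout prompt are hypothetical illustrations, not the benchmark's actual schema or metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rubric:
    criterion: str  # what is being checked
    expected: str   # observable in-game behavior that counts as a pass


@dataclass(frozen=True)
class Task:
    prompt: str
    rubrics: tuple[Rubric, ...]


def score(task: Task, verdicts: dict[str, bool]) -> float:
    # Pass rate over rubric criteria; criteria the GUI agent never
    # reached or judged count as failures.
    if not task.rubrics:
        return 0.0
    return sum(verdicts.get(r.criterion, False) for r in task.rubrics) / len(task.rubrics)


task = Task(
    prompt="A Breakout clone where the ball speeds up after every ten bricks.",
    rubrics=(
        Rubric("paddle control", "Left/Right arrow keys move the paddle"),
        Rubric("brick breaking", "a brick disappears when the ball hits it"),
        Rubric("speed-up", "ball speed increases after the tenth brick"),
        Rubric("game over", "losing the last ball shows a game-over screen"),
    ),
)
verdicts = {"paddle control": True, "brick breaking": True, "speed-up": False}
print(score(task, verdicts))  # 0.5
```

Treating an unjudged criterion as a failure is a deliberate choice in this sketch: a behavior the playtester could not reach through the screen is, from the player's side, not delivered.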
Play2Code
Adapted from the Play2Code overview: generation, playtesting, and memory form one refinement system rather than separate evaluation steps.
Playable Gallery
Each demo is embedded as a playable browser artifact, reflecting the paper’s premise that games should be evaluated through interaction.