“A game is a series of interesting choices.” Sid Meier

We build Play2Code, a system that turns game generation into a continual loop between a code-writing agent and a GUI playtester. It runs within PlaytestArena, where each game prompt is paired with rubrics that describe expected behavior. Each build is experienced through the browser, evaluated by the GUI agent, and revised in the next round.

From Inspection to Interaction

Quality lives in the moment of play.

Source code can compile while the game remains broken: a button may not respond, a sprite may fail to render, or a win condition may never trigger. Play2Code treats the browser surface as the place where game quality becomes visible.
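The gap between compiling and playing can be shown with a toy case. The sketch below uses Python rather than the HTML games Play2Code targets, and `GAME_SOURCE`, `compiles`, and `play` are hypothetical names invented for this illustration. The source passes a static compile check, yet its win condition can never fire, and only stepping through the game reveals that.

```python
# Hypothetical toy game: the score is capped at 10, but winning needs more than 10.
GAME_SOURCE = """
def step(state):
    state["score"] = min(state["score"] + 1, 10)
    state["won"] = state["score"] > 10  # bug: unreachable because of the cap
    return state
"""


def compiles(source: str) -> bool:
    """Static check: does the source compile at all?"""
    try:
        compile(source, "<game>", "exec")
        return True
    except SyntaxError:
        return False


def play(source: str, max_steps: int = 50) -> bool:
    """Dynamic check: run the game loop and report whether the player ever wins."""
    namespace = {}
    exec(source, namespace)  # load the game's step function
    state = {"score": 0, "won": False}
    for _ in range(max_steps):
        state = namespace["step"](state)
        if state["won"]:
            return True
    return False


static_ok = compiles(GAME_SOURCE)  # inspection alone sees nothing wrong
ever_won = play(GAME_SOURCE)       # interaction shows the game cannot be won
```

Here the static check succeeds while the play check fails, which is the kind of defect that only shows up through interaction.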

01 Generate

The game agent writes or patches a self-contained HTML game in a shared runtime.

02 Play

The GUI agent observes the rendered screen and acts through clicks and keys.

03 Report

Play trajectories become a summary and a concrete fix list for the next revision.

04 Remember

Episode, skill, and world memory accumulate experience across rounds and tasks.
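The four steps above can be sketched as a single loop. This is a minimal sketch under stated assumptions, not the Play2Code implementation: `Report`, `Memory`, `refine`, and the two stub agents are hypothetical names, the real agents would be model-driven, and the stopping rule (stop when the fix list is empty) is an assumption. How skill and world memory are populated is not described in the source, so only episode memory is written here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Report:
    summary: str      # what the playtester observed
    fixes: List[str]  # concrete fix list for the next revision


@dataclass
class Memory:
    episode: List[str] = field(default_factory=list)  # per-round experience
    skill: List[str] = field(default_factory=list)    # placeholder, unused here
    world: List[str] = field(default_factory=list)    # placeholder, unused here


def refine(
    prompt: str,
    generate: Callable[[str, str, List[str], Memory], str],
    playtest: Callable[[str], Report],
    memory: Memory,
    max_rounds: int = 3,
) -> str:
    """Run generate -> play -> report -> remember until no fixes remain."""
    build = ""
    fixes: List[str] = []
    for round_idx in range(max_rounds):
        build = generate(prompt, build, fixes, memory)                 # 01 Generate
        report = playtest(build)                                       # 02 Play, 03 Report
        memory.episode.append(f"round {round_idx}: {report.summary}")  # 04 Remember
        fixes = report.fixes
        if not fixes:  # assumed stopping rule
            break
    return build


def stub_generate(prompt: str, previous: str, fixes: List[str], memory: Memory) -> str:
    """Stand-in game agent: first writes a build, then patches it when given fixes."""
    if not previous:
        return "<html><button id='start'>Start</button></html>"
    if fixes:
        return previous.replace("<button id='start'>", "<button id='start' onclick='start()'>")
    return previous


def stub_playtest(build: str) -> Report:
    """Stand-in GUI agent: checks for a handler instead of actually clicking."""
    if "onclick" not in build:
        return Report("start button does not respond", ["wire a click handler to #start"])
    return Report("start button responds", [])


memory = Memory()
final_build = refine("a clicker game", stub_generate, stub_playtest, memory)
```

With these stubs the loop takes two rounds: the first build has an unresponsive button, the report carries one fix, and the second build is patched and passes.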

PlaytestArena

The environment makes quality observable.

PlaytestArena frames each game generation task as a playable evaluation setting: a prompt defines intent, rubrics define observable criteria, and the GUI agent evaluates the result through interaction.

Game Prompt: the intended game, mechanics, and player experience

Rubrics: criterion-level checks with expected in-game behavior

GUI as Evaluator: objective playtesting at the same screen surface as a player
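One way to picture a PlaytestArena task is as a prompt paired with a list of rubrics, which the GUI agent marks as observed or not during play. The sketch below is an assumption about the shape of that data, not the benchmark's actual schema: `Rubric`, `PlaytestTask`, and `pass_rate` are hypothetical names, and scoring as the fraction of satisfied criteria is an illustrative choice that the source does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rubric:
    criterion: str          # short identifier for the check
    expected_behavior: str  # what should be observable in the running game


@dataclass(frozen=True)
class PlaytestTask:
    prompt: str             # the intended game, mechanics, and player experience
    rubrics: List[Rubric]


def pass_rate(task: PlaytestTask, observed: Dict[str, bool]) -> float:
    """Fraction of rubric criteria observed as satisfied during play.

    Criteria missing from `observed` count as failures.
    """
    if not task.rubrics:
        return 0.0
    passed = sum(1 for r in task.rubrics if observed.get(r.criterion, False))
    return passed / len(task.rubrics)


task = PlaytestTask(
    prompt="A brick-breaker game with a paddle, a ball, and three lives.",
    rubrics=[
        Rubric("paddle_moves", "Arrow keys move the paddle left and right."),
        Rubric("win_triggers", "Clearing all bricks shows a win screen."),
    ],
)

# Hypothetical playtest outcome: the paddle works, the win screen never appears.
score = pass_rate(task, {"paddle_moves": True, "win_triggers": False})
```

Treating unobserved criteria as failures keeps the score tied to what the playtester actually saw on screen.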

Play2Code

A sustained dialogue between coding and playing.

Adapted from the Play2Code overview: generation, playtesting, and memory form one refinement system rather than separate evaluation steps.

Figure: Play2Code architecture diagram (continual game generation loop). A game agent and a GUI agent iterate through generate, play and report, and revise steps with episode, skill, and world memory. The game agent writes, debugs, and patches code; the HTML game is rendered as a playable build in a shared browser runtime; the GUI playtester observes, reasons, clicks, and types, then returns a summary and an actionable fix list for revision.