Quick review of Marathon Demo @ Google Next

Google recently open-sourced the code behind the multi-agent marathon simulation shown at the developer keynote at Google Cloud Next '26 last week (kudos to the team). The repository, Race Condition, is a deployable reference architecture for building autonomous systems using Gemini and the Google Agent Development Kit (ADK).

If you’re currently trying to wrangle multiple LLMs into a coordinated system, getting past the prototype phase is usually where the pain starts. Google built this repo to show how to handle state routing, frontend integration, and API costs in a more production-ready way.

Here is a technical breakdown of how the repository is structured:

Architecture

The Hub (Go): A WebSocket gateway that sits at the center of the architecture, handling state management and message routing between agents.

The Agents (Python): Built with the ADK and powered by Gemini, these handle the actual decision-making and environment processing.

The Frontend (Angular & Three.js): Consumes the WebSocket streams to visualize the agent interactions and simulation state in real-time.
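To make the hub's role concrete, here is a minimal Python sketch of its routing logic. The real hub is written in Go and speaks WebSockets; the message fields and class names below are assumptions for illustration, not the repo's actual API.

```python
import json
from dataclasses import dataclass, field

@dataclass
class Hub:
    # connection registry: agent/client id -> outbound queue (a list stands in
    # for a real WebSocket connection)
    connections: dict = field(default_factory=dict)

    def register(self, client_id: str):
        self.connections[client_id] = []

    def route(self, raw: str):
        """Deliver a message to its target, or broadcast when none is given."""
        msg = json.loads(raw)
        target = msg.get("target")
        if target is not None:
            self.connections[target].append(msg)
        else:
            for queue in self.connections.values():
                queue.append(msg)

hub = Hub()
hub.register("runner_1")
hub.register("frontend")
hub.route(json.dumps({"type": "state", "target": "frontend", "tick": 7}))
hub.route(json.dumps({"type": "tick"}))  # no target: broadcast to everyone
```

The point is that agents never talk to the frontend directly; everything flows through one stateful gateway.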

Communication Protocols

Handling unpredictable LLM outputs on a frontend is notoriously difficult. Google split the communication into two patterns:

A2A (Agent-to-Agent): The internal message-passing backbone for state sharing and coordination.

A2UI (Agent-to-UI): A server-driven UI approach. Instead of the frontend trying to parse raw text into a layout, the agents emit specific UI primitives (cards, routes, buttons) over the wire. The frontend just renders what it receives.
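The A2UI idea is easiest to see with a payload. Below is a hypothetical sketch of an agent emitting a UI primitive; the primitive names ("card", "button") and field names are assumptions, not the repo's actual schema.

```python
import json

def runner_status_card(runner_id: str, pace: str) -> str:
    """Emit a renderable UI primitive instead of raw LLM text.

    The frontend never parses prose; it just renders whatever typed
    primitives arrive over the WebSocket.
    """
    return json.dumps({
        "type": "card",
        "title": f"Runner {runner_id}",
        "body": f"Current pace: {pace}",
        "actions": [{"type": "button", "label": "Follow", "event": "follow"}],
    })

payload = json.loads(runner_status_card("42", "4:30/km"))
```

The design trade-off: the agent takes on layout decisions, but the frontend stays deterministic and never has to guess at the structure of free-form model output.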

Key Engineering Patterns

The Planner Ladder: Rather than using complex feature flags, the team built a literal progression of agents (planner, planner_with_eval, and planner_with_memory backed by AlloyDB). You can diff the code between them to see exactly how to implement evaluation gating and persistent memory.
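The kind of delta you'd see when diffing planner against planner_with_eval looks roughly like this. Everything here is a hypothetical sketch: in the repo both the planner and the evaluator are Gemini-backed via the ADK, and these function names and the threshold are assumptions.

```python
def plan(state: dict) -> dict:
    # base planner: in the repo this would be a Gemini call via the ADK
    return {"action": "maintain_pace", "confidence": 0.9}

def score(candidate: dict) -> float:
    # evaluator: in the repo this is itself model-backed
    return candidate.get("confidence", 0.0)

def plan_with_eval(state: dict, threshold: float = 0.5) -> dict:
    """Gate the planner's output: reject low-scoring plans before they act."""
    candidate = plan(state)
    if score(candidate) >= threshold:
        return candidate
    return {"action": "hold", "confidence": 1.0}  # safe fallback
```

The memory variant is the same pattern again: wrap the call, persist state (to AlloyDB in the repo), and feed it back in on the next tick.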

Thundering Herd Mitigation: The Hub implements message batching. If you have hundreds of runner agents trying to broadcast state on the exact same tick, this prevents the system from taking itself down.
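Per-tick batching can be sketched in a few lines. The real hub does this in Go; the class and method names below are assumptions, but the shape is the same: buffer everything arriving within a tick, then send one frame.

```python
class Batcher:
    """Absorb a thundering herd of per-tick broadcasts into one frame."""

    def __init__(self):
        self.pending = []

    def submit(self, msg: dict):
        # agents call this on every tick; nothing is sent yet
        self.pending.append(msg)

    def flush(self) -> list:
        # the hub drains the buffer once per tick and emits ONE batch frame,
        # instead of one WebSocket frame per agent
        batch, self.pending = self.pending, []
        return batch

batcher = Batcher()
for runner in range(300):
    batcher.submit({"runner": runner, "tick": 12})
frames = [batcher.flush()]  # one outbound frame for 300 inbound messages
```

With hundreds of runners broadcasting on the same tick, this turns O(agents) frontend frames into O(1) per tick.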

Replay & Deterministic Testing: Building with LLMs gets expensive fast. The repo includes the ability to replay recorded NDJSON streams (which the frontend treats as live traffic) and a runner_autopilot variant that mocks LLM decisions with zero API calls. This lets you iterate on the UI and run load tests without burning through API credits.
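NDJSON replay is simple enough to sketch end-to-end. The record format shown is an assumption, not the repo's actual schema; the key property is that the handler receiving replayed events is the same one that would receive live traffic.

```python
import io
import json

def replay(stream, emit):
    """Read newline-delimited JSON records and feed each to the live handler."""
    for line in stream:
        line = line.strip()
        if line:
            emit(json.loads(line))

# a recorded session stands in for a file of captured traffic
recorded = io.StringIO(
    '{"tick": 1, "runner": "a", "pos": [0, 0]}\n'
    '{"tick": 2, "runner": "a", "pos": [1, 0]}\n'
)
events = []
replay(recorded, events.append)
```

In practice you would pace the records against recorded timestamps so the frontend sees realistic timing, but the zero-API-call property is what makes UI iteration and load testing cheap.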

Deploying to GCP

Locally, the stack spins up via Docker Compose (bundling Redis, Postgres, and a Pub/Sub emulator).

When you’re ready to deploy to Google Cloud, the containers map cleanly to Cloud Run, and the agents can be hosted on the Gemini Enterprise Agent Platform.

Setup Hint: To get the map and location tools working, ensure you enable agentregistry.googleapis.com (for MCP server discovery), mapstools.googleapis.com, and places.googleapis.com in your GCP console.

Also, take a look at the AGENTS.md file in the repo—it’s written specifically as context for your IDE’s AI assistant (Copilot or Gemini Code Assist) to help you configure the local and cloud environments faster. I pointed Gemini CLI in Cloud Shell at AGENTS.md and deployed to the cloud with a few natural-language inputs (it takes about 30 minutes for all the Terraform to complete).

#GoogleCloud #MultiAgentSystems #SoftwareArchitecture #GenAI #Developers #OpenSource
