The Problem with Talkative AI Systems
Multi-agent AI systems, in which several language models work together on complex tasks, face a fundamental bottleneck: their agents communicate by generating text. Each agent must spell out its reasoning token by token before the next agent can begin processing. This sequential text generation adds latency, drives up token costs, and makes it difficult to train the entire system as a cohesive unit. Existing approaches either rely on prompt-based adaptation, which leaves each agent's underlying capabilities unchanged, or require the computationally expensive process of updating all parameters across multiple models.
How RecursiveMAS Rethinks Agent Communication
Researchers at the University of Illinois Urbana-Champaign and Stanford University developed RecursiveMAS, a framework that lets agents collaborate in embedding space rather than through text. Inspired by recursive language models, the architecture treats the multi-agent system as a single integrated unit. Each agent acts like a layer in a recursive model, passing continuous latent representations to the next agent instead of generating text. A specialized component called RecursiveLink transmits and refines these hidden states between agents, even when the agents use different model architectures with incompatible embedding dimensions.
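The sketch below illustrates the layer-like data flow described above, under stated assumptions. The agents are toy frozen functions, and HypotheticalRecursiveLink is an invented stand-in (a linear projection followed by a residual tanh refinement) that only shows how a small trainable adapter can bridge mismatched hidden sizes and close the loop so the system can run for several rounds. All names, sizes, and the adapter design are hypothetical; this is not the RecursiveMAS implementation.

```python
import math
import random

Vector = list[float]
Matrix = list[list[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Gaussian init scaled by 1/sqrt(cols) to keep activations bounded."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class FrozenAgent:
    """Toy stand-in for a frozen language model: a fixed map on a hidden
    state of size `dim`. Its weights are never updated."""

    def __init__(self, dim: int, seed: int) -> None:
        self.dim = dim
        self.weights = random_matrix(dim, dim, random.Random(seed))

    def __call__(self, h: Vector) -> Vector:
        return [math.tanh(x) for x in matvec(self.weights, h)]


class HypotheticalRecursiveLink:
    """Hypothetical adapter, NOT the authors' RecursiveLink: projects a
    hidden state from d_in to d_out, then adds a residual tanh refinement.
    In a setup like this, only these small modules would be trained."""

    def __init__(self, d_in: int, d_out: int, seed: int) -> None:
        rng = random.Random(seed)
        self.proj = random_matrix(d_out, d_in, rng)     # d_out x d_in
        self.refine = random_matrix(d_out, d_out, rng)  # d_out x d_out

    def __call__(self, h: Vector) -> Vector:
        z = matvec(self.proj, h)
        r = matvec(self.refine, z)
        return [zi + math.tanh(ri) for zi, ri in zip(z, r)]

    def num_params(self) -> int:
        return len(self.proj) * len(self.proj[0]) + len(self.refine) ** 2


def run_latent_pipeline(agents, links, h: Vector, rounds: int) -> Vector:
    """Pass a hidden state through every agent in turn, like layers of one
    model. links[i] maps agent i's size to agent i+1's size, and the last link
    maps back to agent 0's size, so the whole loop can repeat (recursion)."""
    for _ in range(rounds):
        for agent, link in zip(agents, links):
            h = link(agent(h))
    return h


# Three hypothetical agents with incompatible hidden sizes (8, 12, 6).
agents = [FrozenAgent(8, seed=1), FrozenAgent(12, seed=2), FrozenAgent(6, seed=3)]
links = [
    HypotheticalRecursiveLink(8, 12, seed=10),
    HypotheticalRecursiveLink(12, 6, seed=11),
    HypotheticalRecursiveLink(6, 8, seed=12),
]
final_state = run_latent_pipeline(agents, links, [0.1] * 8, rounds=2)
print(len(final_state), sum(link.num_params() for link in links))  # prints "8 460"
```

The design choice worth noting is that no text is ever produced between agents: each hand-off is a vector, and only the adapters carry parameters that would need training while the agents stay frozen.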
Measured Gains in Speed and Cost
In tests across nine benchmarks covering mathematics, code generation, and medical reasoning, RecursiveMAS achieved an average accuracy improvement of 8.3% over the strongest baselines. It delivered 1.2x to 2.4x faster inference and reduced token usage by as much as 75.6% compared with text-based multi-agent frameworks. Because only the lightweight RecursiveLink modules are trained (roughly 13 million parameters, or about 0.31% of the frozen models' parameters), the system cuts training costs by more than half compared with full fine-tuning. The researchers have released the code and model weights under the Apache 2.0 license.
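As a sanity check on the reported figures, the following arithmetic sketch works out what the percentages imply. It assumes the 0.31% figure is measured against the combined parameter count of the frozen agent models, and it uses hypothetical baselines (1,000 tokens, 10 seconds) purely for illustration; the function names are invented.

```python
def implied_frozen_params(trainable: float, fraction: float) -> float:
    """Backbone size implied if `trainable` is `fraction` of it (assumption)."""
    return trainable / fraction


def tokens_after_reduction(baseline_tokens: float, reduction: float) -> float:
    """Tokens remaining after removing `reduction` (0..1) of a baseline count."""
    return baseline_tokens * (1.0 - reduction)


def latency_after_speedup(baseline_seconds: float, speedup: float) -> float:
    """Wall-clock time after an N-times speedup."""
    return baseline_seconds / speedup


# Reported figures; the 1,000-token and 10-second baselines are hypothetical.
print(f"implied frozen parameters: {implied_frozen_params(13e6, 0.0031):.3g}")
print(f"tokens at the max 75.6% cut: {tokens_after_reduction(1000, 0.756):.0f}")
for speedup in (1.2, 2.4):
    print(f"{speedup}x faster: {latency_after_speedup(10.0, speedup):.2f} s")
```

Under that assumption, 13 million trainable parameters at 0.31% implies frozen models totaling roughly 4.2 billion parameters. Both source figures are rounded, so treat this as an estimate rather than a reported number.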
Source: VentureBeat