Research note Local ICML 2026 source May 4, 2026

MAS-Architect: Designing Agents That Redesign Themselves

MAS-Architect: Declarative Multi-Agent System Design via Separation of Concerns

Jing Huang, Lidong Zhang, Mutian Bao, Yadong Li, Xingzhong Xu, Jinjian Zhang, Jie Liu, Ming Kong, Qiang Zhu

MAS-Architect reframes Auto-MAS as architectural generation rather than template selection. Its key move is a code-based declarative MAS paradigm: topology planning says what the collaboration graph is, node implementation says how each agent executes, and a shared State Schema keeps the two layers coupled only through typed state. With Distill-then-Explore training, the Meta-Agent learns to synthesize query-specific multi-agent systems from scratch, reaching 78.7% average accuracy while establishing a stronger efficiency-performance frontier.

Comparison of graph-based, imperative code-based, and declarative MAS representations — Representation comparison: declarative MAS keeps graph clarity while recovering code-level expressiveness through separated layers.

In Brief

1. The paper's central design choice is Separation of Concerns: topology, node behavior, and runtime state are made explicit instead of being entangled in one imperative code stream.
2. MAS-Architect generates task-adaptive MAS from scratch, deciding node count, connections, conditional routing, roles, tools, and reasoning patterns at query level.
3. Distill-then-Explore gives the Meta-Agent a warm start from validated teacher architectures, then lets RL discover architectures beyond imitation.
4. The strongest Lifelong Agents signal is emergent organization: parallel search streams, recursive audit loops, and role-specialized agents appear without fixed templates.

Problem

Existing Auto-MAS methods fall into two unsatisfying extremes. Graph-based methods are readable and verifiable, but are often restricted to DAGs, predefined roles, operator pools, and static routing. Imperative MAS-as-code methods are expressive, but topology, control flow, prompts, tools, and state passing become tangled inside procedural code. The paper argues that this coupling makes architectures hard to search, inspect, optimize, and adapt per query.

MAS-Architect framework with Distill-then-Explore training and query-specific generation — Framework view: a Meta-Agent plans topology and implements nodes, trained first by distilled validated architectures and then by RL exploration.

Declarative MAS

MAS-Architect introduces a declarative representation with two layers. The Topology Layer declares what the architecture is: the graph, dynamic branches, loops, and routing rules. The Implementation Layer realizes how each node behaves: how an agent reasons, acts, calls tools, and updates state. A standardized State Schema is the interface between them, carrying task context, intermediate results, and trajectories without hiding topology inside execution code.
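To make the two-layer split concrete, here is a minimal sketch in Python. It is not the paper's code or specification format: the `State` fields, the `solver` and `critic` nodes, and the `run` loop are all hypothetical names chosen for illustration, and the LLM calls are replaced with stand-ins. The point it tries to show is that the routing table never inspects node internals, and nodes never decide where control goes next; both touch only the shared state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical State Schema: the only interface shared by the two layers.
@dataclass
class State:
    question: str
    draft: str = ""
    approved: bool = False
    trajectory: List[str] = field(default_factory=list)

END = "END"

# Implementation Layer: how each node acts. Nodes read and update State only.
def solver(state: State) -> State:
    state.draft = f"answer to: {state.question}"  # stand-in for an LLM call
    state.trajectory.append("solver")
    return state

def critic(state: State) -> State:
    state.approved = bool(state.draft)  # stand-in for a real critique
    state.trajectory.append("critic")
    return state

NODES: Dict[str, Callable[[State], State]] = {"solver": solver, "critic": critic}

# Topology Layer: what the graph is. Routing rules are functions of State only.
TOPOLOGY: Dict[str, Callable[[State], str]] = {
    "solver": lambda s: "critic",
    "critic": lambda s: END if s.approved else "solver",  # loop back on rejection
}

def run(entry: str, state: State, max_steps: int = 10) -> State:
    """Walk the declared graph, bounded by max_steps so loops cannot run forever."""
    node = entry
    for _ in range(max_steps):
        if node == END:
            break
        state = NODES[node](state)
        node = TOPOLOGY[node](state)
    return state

result = run("solver", State(question="2 + 2"))
```

In this sketch, swapping the critic's behavior changes only `NODES`, while adding a branch or loop changes only `TOPOLOGY`, which is the kind of independence the separation is meant to provide.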

Efficiency analysis on GSM8K comparing accuracy and token cost — Efficiency plot: MAS-Architect reaches the strongest GSM8K accuracy while sitting at the low-token end of the frontier.

Training

The training pipeline first distills architectural patterns from a large teacher model. Candidate MAS specifications are compiled and executed, and only successful, correct architectures survive execution-based rejection sampling for SFT. The second stage uses RL with verified rewards: invalid or shortcut-like architectures get zero reward, valid but incorrect attempts receive a small base reward, and correct valid executions receive full reward. This makes exploration target architecture quality rather than surface-form imitation.
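The two training signals described above can be sketched as follows. This is an illustrative assumption, not the paper's implementation: the `Rollout` fields, the exact-match correctness check, and the numeric values of `BASE_REWARD` and `FULL_REWARD` are hypothetical, since the note only says the base reward is "small" and the correct case gets "full" reward.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Rollout:
    compiled: bool     # the MAS specification compiled and executed successfully
    is_shortcut: bool  # flagged as a shortcut-like architecture
    answer: str        # final answer produced by the generated MAS
    gold: str          # reference answer

BASE_REWARD = 0.1  # assumed value; the source only says "small"
FULL_REWARD = 1.0  # assumed value for "full reward"

def is_correct(r: Rollout) -> bool:
    # Exact string match is a simplifying assumption for this sketch.
    return r.answer.strip() == r.gold.strip()

def reward(r: Rollout) -> float:
    """Stage 2 (RL): tiered reward over validity and correctness."""
    if not r.compiled or r.is_shortcut:
        return 0.0          # invalid or shortcut-like architecture
    if not is_correct(r):
        return BASE_REWARD  # valid but incorrect
    return FULL_REWARD      # valid and correct

def rejection_sample(candidates: List[Rollout]) -> List[Rollout]:
    """Stage 1 (SFT data): keep only candidates that executed and were correct."""
    return [c for c in candidates if c.compiled and is_correct(c)]
```

The nonzero base reward separates "produced a working architecture with a wrong answer" from "produced nothing usable", so the policy is still pushed toward valid architectures on queries it cannot yet solve.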

Cross-model comparison on MATH and GPQA — Cross-model evidence: the generated MAS helps Qwen3, Qwen2.5, and Llama model families, suggesting architecture search is not tied to one driver model.

Evidence

On GSM8K, GSM-Hard, MATH, MMLU, and GPQA, MAS-Architect with Qwen3-4B reaches 78.7% average accuracy, 6.4 points above vanilla and 3.0 points above the second-best method in the table. The GSM8K efficiency plot shows 94.4% accuracy with 2,533 prompt tokens per query, and the appendix reports 23.8% fewer total tokens than MAS-GPT on GSM8K. Ablations show that SFT gives small, stable gains, while RL accounts for most of the improvement, especially on HotpotQA, where Qwen3-4B rises by 16.0 points.

Query-specific architectures for HotpotQA and MMLU — Emergent architectures: HotpotQA induces parallel ReAct search streams, while MMLU induces hierarchical dual-loop refinement.

Why Lifelong Agents

The Lifelong Agents angle is not simply that agents collaborate. It is that the Meta-Agent learns an architectural policy: given a new task, it can create a new organization, assign roles, route information, audit failures, and specialize agents. That is a step toward agents whose competence grows at the level of system design, not only at the level of single-model answers.

Recursive audit chain with critic and router feedback loop — Recursive Audit Chain: the architecture routes solver output through critique before final submission, creating a self-correction loop.

Conditional parallelism with decomposer, researchers, and synthesizer — Conditional Parallelism: independent evidence paths are fanned out and synthesized, a useful pattern for multi-hop retrieval.
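The fan-out pattern in this caption can be sketched with Python's standard library. The `decompose`, `research`, and `synthesize` functions are hypothetical stand-ins for the decomposer, researcher, and synthesizer agents, and splitting on " and " is only a toy rule to keep the example runnable; the generated architectures would use model calls and retrieval here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

def decompose(question: str) -> List[str]:
    # Toy stand-in for a decomposer agent: split a compound question on " and ".
    return [p.strip() for p in question.split(" and ") if p.strip()]

def research(sub_question: str) -> str:
    # Stand-in for a researcher agent that would run retrieval or tool calls.
    return f"evidence for '{sub_question}'"

def synthesize(evidence: List[str]) -> str:
    # Stand-in for a synthesizer agent that merges the evidence paths.
    return " | ".join(evidence)

def answer(question: str) -> str:
    subs = decompose(question)
    if len(subs) <= 1:
        # Conditional part: a single-hop question skips the fan-out.
        return synthesize([research(question)])
    with ThreadPoolExecutor(max_workers=len(subs)) as pool:
        # Executor.map returns results in input order, not completion order.
        evidence = list(pool.map(research, subs))
    return synthesize(evidence)
```

Threads suit this sketch because researcher calls would be I/O-bound (model or search requests); the branch on `len(subs)` is what makes the parallelism conditional rather than fixed.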

Domain-specific role adaptation in generated MAS — Domain-Specific Adaptation: the model creates semantic roles such as Location Investigator and Year Verifier instead of generic workers.