Research note · Local ICML 2026 source · May 4, 2026

MAS-Architect: Designing Agents That Redesign Themselves

MAS-Architect: Declarative Multi-Agent System Design via Separation of Concerns

Jing Huang, Lidong Zhang, Mutian Bao, Yadong Li, Xingzhong Xu, Jinjian Zhang, Jie Liu, Ming Kong, Qiang Zhu

MAS-Architect reframes Auto-MAS as architectural generation rather than template selection or adaptation. Its key move is a code-based declarative MAS paradigm. Topology planning specifies what the collaboration graph is. Node implementation specifies how each agent executes. A shared State Schema couples the two layers only through typed state. With Distill-then-Explore training, the Meta-Agent learns to synthesize query-specific multi-agent systems from scratch. It reaches 78.7% average accuracy across five benchmarks and establishes a stronger efficiency-performance frontier.

Figure: Comparison of graph-based, imperative code-based, and declarative MAS representations. Declarative MAS keeps the clarity of graph structure while recovering code-level expressiveness through separated layers.

In Brief

  1. The paper's central design choice is Separation of Concerns. Topology, node behavior, and runtime state are made explicit instead of being entangled in one imperative code stream.

  2. MAS-Architect generates task-adaptive MAS from scratch. For each query it decides node count, connections, conditional routing, roles, tools, and reasoning patterns.

  3. Distill-then-Explore first warm-starts the Meta-Agent on execution-validated teacher architectures. It then uses RL to discover architectures beyond imitation.

  4. The strongest Lifelong Agents signal is emergent organization. Parallel search streams, recursive audit loops, and role-specialized agents all appear without being imposed by fixed templates.

Problem

Existing Auto-MAS methods sit between two unsatisfying extremes.

Graph-based methods are readable and verifiable. However, they are often restricted to DAGs, predefined roles, operator pools, and static routing.

Imperative MAS-as-code methods are expressive. However, topology, control flow, prompts, tools, and state passing become tangled inside procedural code.

The paper argues that this coupling makes architectures hard to search, inspect, and optimize. It also makes them hard to adapt to each query.

Figure: MAS-Architect framework with Distill-then-Explore training and query-specific generation. A Meta-Agent first plans the topology and then implements the nodes. It is trained first on distilled, execution-validated architectures and then with RL exploration.

Declarative MAS

MAS-Architect introduces a two-layer declarative representation.

The Topology Layer declares the graph, dynamic branches, loops, and routing rules. In other words, it specifies what the architecture is.

The Implementation Layer realizes each node. It specifies how an agent reasons, acts, calls tools, and updates state.

A standardized State Schema is the interface between the two layers. It carries the task context, intermediate results, and interaction trajectories, so topology is never hidden inside execution code.
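The split between the two layers can be made concrete with a small interpreter. The sketch below is illustrative only. It assumes a dataclass as the State Schema, plain functions as nodes, and a dictionary as the declared topology. All names (`State`, `TOPOLOGY`, `NODES`, `run`, the three node functions) are hypothetical and do not come from the paper, and the LLM calls are replaced by string stubs. The point it shows is that the node functions never reference each other, and the routing logic never touches node internals. Both sides meet only through `State`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class State:
    """Hypothetical State Schema: the only object both layers see."""
    query: str
    route: Optional[str] = None
    answer: Optional[str] = None
    trajectory: List[str] = field(default_factory=list)

NodeFn = Callable[[State], State]

# ---- Implementation layer: how each node behaves (LLM calls stubbed out) ----
def classifier(s: State) -> State:
    s.route = "math" if any(ch.isdigit() for ch in s.query) else "general"
    s.trajectory.append("classifier")
    return s

def math_solver(s: State) -> State:
    s.answer = f"[math] {s.query}"  # stand-in for a reasoning/tool-using agent
    s.trajectory.append("math_solver")
    return s

def generalist(s: State) -> State:
    s.answer = f"[general] {s.query}"
    s.trajectory.append("generalist")
    return s

# ---- Topology layer: what the graph is (pure data, no execution logic) ----
END = "END"
TOPOLOGY = {
    "entry": "classifier",
    "edges": {"math_solver": END, "generalist": END},  # unconditional edges
    "routers": {  # conditional edges: read State, return the next node name
        "classifier": lambda s: "math_solver" if s.route == "math" else "generalist",
    },
}
NODES: Dict[str, NodeFn] = {
    "classifier": classifier,
    "math_solver": math_solver,
    "generalist": generalist,
}

def run(topology: dict, nodes: Dict[str, NodeFn], state: State, max_steps: int = 16) -> State:
    """Generic interpreter: walks the declared graph, threading State through nodes."""
    current = topology["entry"]
    for _ in range(max_steps):
        if current == END:
            return state
        state = nodes[current](state)
        router = topology["routers"].get(current)
        current = router(state) if router else topology["edges"].get(current, END)
    raise RuntimeError("step budget exhausted (possible unbounded loop)")

print(run(TOPOLOGY, NODES, State("What is 12*7?")).trajectory)  # ['classifier', 'math_solver']
```

Under this separation, a generator only has to emit the `TOPOLOGY` data and the node bodies. The interpreter stays fixed, which is what makes the declared graph inspectable and checkable before anything runs.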

Figure: Efficiency analysis on GSM8K, accuracy versus token cost. MAS-Architect reaches the highest GSM8K accuracy while sitting at the low-token end of the frontier.

Training

Training proceeds in two stages.

The first stage distills architectural patterns from a large teacher model. Candidate MAS specifications are compiled and executed. Only architectures that both run successfully and answer correctly survive this execution-based rejection sampling and enter the SFT set.

The second stage uses RL with verified rewards:
- Invalid or shortcut-like architectures receive zero reward.
- Valid but incorrect attempts receive a small base reward.
- Correct, valid executions receive full reward.

Exploration therefore optimizes architecture quality rather than surface-form imitation of the teacher's code.
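The reward tiers and the SFT filter can be written down directly. This is a minimal sketch, not the paper's implementation, and it rests on the following assumptions:
- The `ExecResult` fields are hypothetical.
- How shortcuts are detected is not specified here, so `shortcut` is simply taken as a precomputed flag.
- Exact string match stands in for answer checking.
- `BASE_REWARD = 0.1` is a placeholder, since the note only says "small base reward".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class ExecResult:
    compiled: bool           # did the MAS specification compile?
    ran: bool                # did it execute without error?
    shortcut: bool           # hypothetical flag: architecture judged shortcut-like
    answer: Optional[str]    # final answer produced by the executed MAS

BASE_REWARD = 0.1  # hypothetical value; the source only says "small base reward"

def reward(result: ExecResult, gold: str) -> float:
    """Three-tier verified reward: 0 (invalid/shortcut), small base (valid, wrong), 1.0 (valid, correct)."""
    if not (result.compiled and result.ran) or result.shortcut:
        return 0.0
    if result.answer is not None and result.answer.strip() == gold.strip():
        return 1.0
    return BASE_REWARD

def rejection_sample(candidates: List[str], execute: Callable[[str], ExecResult], gold: str) -> List[str]:
    """Stage 1 filter: keep only teacher specs that run and answer correctly (full reward)."""
    return [spec for spec in candidates if reward(execute(spec), gold) == 1.0]

# Toy executor standing in for "compile and run the MAS spec" (hypothetical).
def fake_execute(spec: str) -> ExecResult:
    if spec == "broken":
        return ExecResult(compiled=False, ran=False, shortcut=False, answer=None)
    if spec == "shortcut":
        return ExecResult(compiled=True, ran=True, shortcut=True, answer="4")
    return ExecResult(compiled=True, ran=True, shortcut=False, answer="4" if spec == "good" else "5")

print(rejection_sample(["broken", "shortcut", "good", "wrong"], fake_execute, gold="4"))  # ['good']
```

Giving valid-but-wrong attempts a nonzero reward separates "built a working MAS" from "built nothing usable". A plausible reading is that this keeps the RL signal informative even on hard queries the policy cannot yet solve.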

Figure: Cross-model comparison on MATH and GPQA. The generated MAS helps the Qwen3, Qwen2.5, and Llama model families, suggesting that architecture search is not tied to one driver model.

Evidence

The main evaluation covers GSM8K, GSM-Hard, MATH, MMLU, and GPQA. With Qwen3-4B as the driver model, MAS-Architect reaches 78.7% average accuracy. That is 6.4 points above vanilla and 3.0 points above the second-best method in the table.

The GSM8K efficiency plot is especially telling. MAS-Architect reaches 94.4% accuracy with 2,533 prompt tokens per query. The appendix also reports that it uses 23.8% fewer total tokens than MAS-GPT on GSM8K.

Ablations show that SFT gives small but stable gains, while RL is the main source of improvement. The effect is largest on HotpotQA, where RL lifts Qwen3-4B by 16.0 points.
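The accuracy gaps above are absolute percentage points on the five-benchmark average. The token figure, by contrast, is a relative reduction. The implied baseline averages and the token ratio follow directly from the reported numbers. They are derived here for orientation and are not quoted from the paper's tables.

```latex
\begin{aligned}
\bar{a}_{\text{vanilla}} &= 78.7 - 6.4 = 72.3\%, &\quad \tfrac{6.4}{72.3} &\approx 8.9\%\ \text{relative gain} \\
\bar{a}_{\text{second-best}} &= 78.7 - 3.0 = 75.7\%, &\quad \tfrac{3.0}{75.7} &\approx 4.0\%\ \text{relative gain} \\
T_{\text{MAS-Architect}} &= (1 - 0.238)\,T_{\text{MAS-GPT}} = 0.762\,T_{\text{MAS-GPT}} &\;\Rightarrow\; T_{\text{MAS-GPT}} &\approx 1.31\,T_{\text{MAS-Architect}}
\end{aligned}
```

In other words, on GSM8K, MAS-GPT spends roughly 31% more total tokens than MAS-Architect.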

Figure: Query-specific architectures for HotpotQA and MMLU. HotpotQA induces parallel ReAct search streams, while MMLU induces hierarchical dual-loop refinement.

Why Lifelong Agents

The Lifelong Agents angle is not simply that agents collaborate. It is that the Meta-Agent learns an architectural policy. Given a new task, it can create a new organization, assign roles, route information, audit failures, and specialize agents. That is a step toward agents whose competence grows at the level of system design, not only at the level of single-model answers.

Figure: Recursive audit chain with a critic and router feedback loop. The architecture routes Solver output through a Critic before final submission, creating a self-correction loop.
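The control flow of such an audit chain fits in a few lines. This is a hedged sketch of the pattern, not the generated architecture itself. It assumes the following:
- The Critic returns `None` to approve, or a feedback string to request a revision.
- The Router submits either on approval or once a revision budget (`max_rounds`, a hypothetical parameter) is exhausted.
- `AuditState`, `toy_solve`, and `toy_critique` are illustrative stand-ins for LLM-backed agents.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class AuditState:
    question: str
    draft: Optional[str] = None
    feedback: Optional[str] = None   # None means the critic approved the draft
    approved: bool = False
    rounds: int = 0
    log: List[str] = field(default_factory=list)

def audit_chain(state: AuditState,
                solve: Callable[[AuditState], str],
                critique: Callable[[AuditState], Optional[str]],
                max_rounds: int = 3) -> AuditState:
    """Solver -> Critic -> Router loop; the router submits on approval or when the budget runs out."""
    while True:
        state.draft = solve(state)                 # Solver node (sees prior feedback via state)
        state.log.append(f"solve#{state.rounds}")
        state.feedback = critique(state)           # Critic node
        state.rounds += 1
        state.approved = state.feedback is None
        state.log.append("approve" if state.approved else "revise")
        if state.approved or state.rounds >= max_rounds:  # Router node
            state.log.append("submit")
            return state

def toy_solve(s: AuditState) -> str:
    # Hypothetical solver: makes an arithmetic slip until it receives feedback.
    return "2+2=4" if s.feedback else "2+2=5"

def toy_critique(s: AuditState) -> Optional[str]:
    # Hypothetical critic: approves only drafts ending in the correct value.
    return None if s.draft.endswith("4") else "recheck the sum"

final = audit_chain(AuditState("2+2?"), toy_solve, toy_critique)
print(final.draft, final.log)  # 2+2=4 ['solve#0', 'revise', 'solve#1', 'approve', 'submit']
```

The explicit round budget is a design choice of this sketch. A declarative topology with a loop needs some termination guarantee, and a cap in the router is the simplest way to provide one.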
Figure: Conditional parallelism with a decomposer, parallel researchers, and a synthesizer. Independent evidence paths are fanned out, researched in parallel, and then synthesized. This is a useful pattern for multi-hop retrieval.
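The fan-out/fan-in shape can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor`. The sketch makes these assumptions:
- "Conditional" means the decomposer decides at run time whether there are independent sub-questions at all. If it finds one or none, a single researcher runs.
- The decomposer, researcher, and synthesizer callables are hypothetical stand-ins for agents.
- The toy fact table replaces real retrieval.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

def conditional_parallel(question: str,
                         decompose: Callable[[str], List[str]],
                         research: Callable[[str], str],
                         synthesize: Callable[[str, List[str]], str],
                         max_workers: int = 4) -> str:
    sub_questions = decompose(question)
    if len(sub_questions) <= 1:
        # Condition not met: no independent paths, so a single researcher handles the question.
        evidence = [research(question)]
    else:
        # Fan out: one researcher per independent sub-question, run concurrently.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            evidence = list(pool.map(research, sub_questions))  # map preserves input order
    # Fan in: a single synthesizer combines all evidence.
    return synthesize(question, evidence)

def toy_decompose(q: str) -> List[str]:
    # Hypothetical decomposer: splits a two-hop question on " and ".
    return [part.strip() for part in q.split(" and ")]

def toy_research(sub_q: str) -> str:
    # Hypothetical researcher backed by a fixed fact table instead of retrieval.
    facts = {"where was X born": "X was born in Paris",
             "when was Y founded": "Y was founded in 1901"}
    return facts.get(sub_q, "unknown")

def toy_synthesize(q: str, evidence: List[str]) -> str:
    return " | ".join(evidence)

print(conditional_parallel("where was X born and when was Y founded",
                           toy_decompose, toy_research, toy_synthesize))
# X was born in Paris | Y was founded in 1901
```

Threads suit this pattern because real researcher agents would spend most of their time waiting on model or search calls.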
Figure: Domain-specific role adaptation in the generated MAS. The model creates semantic roles such as Location Investigator and Year Verifier instead of generic workers.