All posts

ICML 2026 Review: Five Accepted Papers from Our Team on Building Better Agents

Bill Xu

• Published on May 5, 2026 • 39min read

We are excited to share that our team has five papers accepted to ICML 2026.

The International Conference on Machine Learning (ICML) is one of the most influential international conferences in machine learning, alongside NeurIPS and ICLR. It brings together leading research in machine learning, artificial intelligence, optimization, representation learning, reinforcement learning, and agent systems.

According to the official acceptance information, ICML 2026 received 23,918 submissions for review and accepted 6,352 papers, with an overall acceptance rate of 26.6%. The conference will be held in Seoul, South Korea, from July 6 to July 11, 2026.

This year, our team has five accepted papers at ICML 2026. These works study different but closely related problems in building better agents: user-centric interaction, research ideation, sub-agent orchestration, ambiguous-query search evaluation, and verifiable web-agent environments.

The accepted works were completed by members of our research team in collaboration with academic and industry partners, including [INSTITUTION / UNIVERSITY / COMPANY NAMES TO BE CONFIRMED].

Together, these five papers reflect a shared research direction: agents should not be fixed prompt chains. They should be adaptive systems that can ask useful questions, reason through open-ended problems, create specialized sub-agents, handle ambiguous user intent, and operate in environments where success can be verified.

1. AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration

Paper: AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration
Authors: Jianhao Ruan, Zhihao Xu, Yiran Peng, Fashen Ren, Zhaoyang Yu, Xinbing Liang, Jinyu Xiang, Yongru Chen, Bang Liu, Chenglin Wu, Yuyu Luo, Jiayi Zhang
Institutions / Companies: DeepWisdom, The Hong Kong University of Science and Technology (Guangzhou), Renmin University of China, East China Normal University, Université de Montréal & Mila – Quebec AI Institute

AOrchestra focuses on agent orchestration. Complex, long-horizon tasks often cannot be solved well by a single monolithic agent. They require decomposition, specialization, tool use, and coordination.

Existing multi-agent systems often rely on manually predefined roles or static sub-agents. A developer decides in advance what agents exist, what tools each agent can use, and how they should interact. This limits adaptability because real tasks vary widely.

AOrchestra creates task-specific sub-agents on demand by specifying instruction, context, tools, and model for each delegated task.

AOrchestra introduces a unified abstraction for agents. It models any agent as a four-tuple:

Instruction;
Context;
Tools;
Model.

This four-tuple acts as a compositional recipe for creating specialized sub-agents. Instead of using a fixed set of roles, AOrchestra allows a central orchestrator to create task-specific sub-agents on demand. For each subtask, the orchestrator decides what instruction to give, what context to pass, which tools to expose, and which model to use.

This design separates orchestration from execution. The orchestrator does not directly solve every task; it creates the right executor for each step.

Across GAIA, Terminal-Bench 2.0, and SWE-Bench-Verified, AOrchestra improves over representative agentic baselines. The paper also shows that orchestration can be learned through supervised fine-tuning and cost-aware in-context optimization.

In short: AOrchestra studies how agent systems can dynamically create specialized sub-agents instead of relying on fixed roles.

2. AutoWebWorld: Synthesizing Infinite Verifiable Web Environments via Finite State Machines

Paper: AutoWebWorld: Synthesizing Infinite Verifiable Web Environments via Finite State Machines
Authors: Yifan Wu, Yiran Peng, Yiyu Chen, Jianhao Ruan, Zijie Zhuang, Cheng Yang, Jiayi Zhang, Man Chen, Yenchi Tseng, Zhaoyang Yu, Liang Chen, Yuyao Zhai, Bang Liu, Chenglin Wu, Yuyu Luo
Institutions / Companies: The Hong Kong University of Science and Technology (Guangzhou); DeepWisdom; Peking University; Université de Montréal & Mila

AutoWebWorld addresses a key bottleneck in training and evaluating web GUI agents: high-quality web interaction trajectories are expensive to collect and difficult to verify.

When agents interact with real websites, the true internal state of the website is usually hidden. The agent sees only rendered UI feedback, such as screenshots. This makes it difficult to know whether each step is correct. Existing pipelines often depend on human annotators or LLM judges, which are costly and inconsistent.

AutoWebWorld proposes a framework for synthesizing controllable and verifiable web environments using finite state machines. Each environment explicitly defines states, actions, transition rules, and goal conditions. Because the state transitions are known, task success can be verified programmatically rather than inferred from surface observations.

AutoWebWorld generates finite-state web environments, translates them into executable websites, and collects verified GUI trajectories through search and replay.

The framework uses coding agents to translate FSM specifications into interactive websites, then performs breadth-first search over the known transition graph to generate candidate trajectories. These trajectories are replayed and filtered through execution, producing verified GUI trajectories.

AutoWebWorld generates 29 diverse web environments and over 11,663 verified trajectories at a cost of only $0.04 per trajectory. Training on this synthetic data improves real-world web-agent performance. The paper also observes a scaling trend: as the amount of synthetic data increases, performance on WebVoyager and Online-Mind2Web improves consistently.

In short: AutoWebWorld creates scalable, controllable, and intrinsically verifiable web environments for training and evaluating web agents.

3. InfoPO: Information-Driven Policy Optimization for User-Centric Agents

Paper: InfoPO: Information-Driven Policy Optimization for User-Centric Agents
Authors: Fanqi Kong, Jiayi Zhang, Mingyi Deng, Chenglin Wu, Yuyu Luo, Bang Liu
Institutions / Companies: Peking University, DeepWisdom, The Hong Kong University of Science and Technology (Guangzhou), Université de Montréal & Mila

InfoPO studies a fundamental problem in user-facing agents: real user requests are often underspecified. A user may provide an incomplete goal, omit key constraints, or express a preference only implicitly. In these cases, the agent should not immediately act on incomplete information. It should ask questions that reveal useful information for future decisions.

Most multi-turn reinforcement learning methods optimize agents using trajectory-level rewards. This creates a credit-assignment problem: if an agent eventually succeeds or fails, it is difficult to know which earlier interaction turns were actually useful.

InfoPO addresses this by introducing a turn-level counterfactual information-gain reward. The method measures how much a user’s feedback changes the agent’s next action distribution compared with a masked-feedback counterfactual. In other words, it asks whether a specific interaction turn produced information that meaningfully changed what the agent would do next.

InfoPO computes turn-level information gain by comparing factual feedback with a masked-feedback counterfactual, then combines this signal with outcome rewards for multi-turn policy optimization.

The paper further combines this information-gain signal with outcome-based rewards through an adaptive variance-gated fusion mechanism. This allows the agent to learn from useful intermediate interactions while still optimizing for final task success.

Across benchmarks including UserGym, ColBench, and τ²-Bench, InfoPO improves performance over prompting and multi-turn RL baselines. It also shows robustness under user-simulator shifts and generalization to environment-interactive tasks.

In short: InfoPO helps user-centric agents learn when and how to ask useful questions, rather than optimizing only for final answers.

4. InteractComp: Evaluating Search Agents With Ambiguous Queries

Paper: InteractComp: Evaluating Search Agents With Ambiguous Queries
Authors: Mingyi Deng, Lijun Huang, Yani Fan, Jiayi Zhang, Fashen Ren, Jinyi Bai, Fuzhen Yang, Dayi Miao, Zhaoyang Yu, Yifan Wu, Yanfei Zhang, Fengwei Teng, Yingjia Wan, Song Hu, Yude Li, Xin Jin, Conghao Hu, Haoyu Li, Qirui Fu, Tai Zhong, Xinyu Wang, Xiangru Tang, Nan Tang, Chenglin Wu, Yuyu Luo
Institutions / Companies: DeepWisdom; The Hong Kong University of Science and Technology (Guangzhou); Renmin University of China; University of California, Los Angeles; Agent Universe; McGill University; Yale University

InteractComp studies a practical weakness in current search agents: they often assume that user queries are complete and unambiguous. In real use, this assumption is frequently false.

Users often begin with short, incomplete, or ambiguous queries. A search agent may need to ask a clarification question before it can retrieve the right information. However, most search benchmarks evaluate agents as if the user’s intent were already fully specified.

InteractComp introduces a benchmark for evaluating whether search agents can recognize ambiguity and interact to resolve it. The benchmark contains 210 expert-curated questions across 9 domains, constructed through a target-distractor methodology. Each question is designed so that ambiguity cannot be resolved reliably without interaction.

InteractComp constructs ambiguous search queries through target-distractor pairs and evaluates whether agents interact to disambiguate user intent.

The evaluation reveals a sharp gap. The best model achieves only 13.73% accuracy under ambiguous queries, even though it reaches 71.50% accuracy when given complete context. This suggests that the problem is not simply lack of knowledge or reasoning ability. Instead, many agents are overconfident and fail to ask for clarification when they should.

The paper also shows that forced interaction can dramatically improve performance, indicating that current models may have latent interactive capability that their default strategies fail to use.

In short: InteractComp evaluates whether search agents can detect ambiguous user intent and ask useful clarification questions during search.

5. MindFlow: Mind Supernet Powered Thinking Flows for Research Idea Innovation

Paper: MindFlow: Mind Supernet Powered Thinking Flows for Research Idea Innovation
Authors: Mengdi Liu, Wenjue Chen, Wenyue Chen, Cheng Yang, Fanqi Kong, Zhangyang Gao, Xiaoxue Cheng, Yiheng Li, Yujian Yuan, Keliang Li, Hong Chang, Shiguang Shan, Chenglin Wu
Institutions / Companies: Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, DeepWisdom, Peking University, Shanghai Artificial Intelligence Laboratory, Renmin University of China

MindFlow studies research idea generation, a task that is both open-ended and multi-objective. A good research idea must identify a meaningful problem, propose a plausible method, and balance novelty, significance, effectiveness, and feasibility.

Existing LLM-based ideation systems often rely on fixed prompting pipelines or manually designed agent workflows. These pipelines can work in narrow settings, but they are not flexible enough for different topics, domains, or evaluation objectives.

MindFlow proposes a different view: the thinking process itself should be explicit, controllable, and optimizable. It formulates research ideation as a graph-structured thinking flow composed of modular thinking operators, including divergent thinking, convergent thinking, critical thinking, analogical thinking, counterfactual thinking, and constraint-driven thinking.

MindFlow samples topic-specific thinking flows from a probabilistic mind supernet and optimizes them through tournament-based relative ranking.

These thinking flows are modeled by a probabilistic mind supernet. Given a research topic, a controller dynamically samples a topic-specific flow and executes it to generate candidate research ideas. The controller is optimized through tournament-based relative ranking, which provides comparative feedback for open-ended idea generation.

The paper also introduces an evaluation protocol that jointly assesses problem finding and problem solving. Experiments show that MindFlow improves idea quality across multiple dimensions, including novelty, diversity, effectiveness, and feasibility.

In short: MindFlow turns research ideation from a fixed prompt pipeline into an optimizable process over structured thinking flows.

A Shared Research Direction

Although these five papers study different problems, they are connected by a common goal: building better agent systems.

Each paper addresses a different layer of the agent stack:

Paper	Main Problem	Agent Capability
InfoPO	How should agents learn from user interaction?	User-centric interaction
MindFlow	How should agents generate open-ended ideas?	Structured reasoning and ideation
AOrchestra	How should agents coordinate complex tasks?	Dynamic sub-agent orchestration
InteractComp	How should search agents handle ambiguity?	Interactive search evaluation
AutoWebWorld	How should web agents be trained and evaluated?	Verifiable web environments

Together, these works suggest that future agents need more than stronger language models. They need better interaction policies, better reasoning processes, better orchestration mechanisms, better evaluation benchmarks, and better environments for training and verification.

In this view, an agent is not just a model that responds to prompts. It is a system that can manage uncertainty, gather information, decompose work, select tools, verify progress, and adapt its behavior to the task.

Looking Ahead

Having five papers accepted to ICML 2026 is an important milestone for our team.

More importantly, these papers reflect the direction we are pursuing: building agents that are adaptive, interactive, modular, and evaluatable.

We are grateful to all researchers, collaborators, institutions, universities, and industry partners involved in these works. We look forward to sharing more details at ICML 2026 in Seoul.

Contents