AI Agent Papers 2026
Proposes a multi-agent observe-analyze-repair loop that uses runtime traces to find and fix bugs in LLM-generated code.
Explores constraining LLM generation with executable schemas and multi-agent roles to produce structurally valid yet creative outputs.
Tests how context format (YAML, JSON, Markdown) affects agent accuracy across 9,649 experiments in file-native agentic systems.
Explores training agents to think ahead by distilling environment search into causal reasoning chains in interactive environments.
Investigates teaching agents to ask themselves the right questions before acting to adapt to new situations autonomously.
Surveys the connection between agentic architectures and spatial tasks like robotics and navigation, covering memory, planning, and world models in embodied agents.
Argues for using world models as a bridge between agents and high-cost real-world environments to provide richer learning signals across domains like robotics and ML engineering.
Presents a reference architecture for production AI agents integrating Clean Architecture, event-driven design, per-agent MLOps lifecycles, and human-in-the-loop governance.
Proposes a meta-agent framework that builds, runs, and keeps refining data processing pipelines through hierarchical agent orchestration.
Proposes a multi-agent framework for automatically building executable test environments across ten programming languages using planning-execution-verification with environment reu…
Proposes an adaptive data generation framework for training mobile GUI agents that matches task difficulty to the agent's current capability level.
Proposes extracting dual-form reusable expertise from agent execution histories (specialized subagents for procedural tasks and skill patterns for static knowledge) with continuo…
Proposes modeling GUI agent operations as sequences of learnable tool tokens with semantic anchoring and curriculum-based training instead of coordinate-based visual grounding.
Proposes a framework combining a self-evolving multi-agent data engine with verifier-based reinforcement learning to train multi-turn interactive tool-using agents.
Investigates why step-wise reasoning struggles with long-horizon planning in LLM agents and proposes future-aware lookahead with reward estimation to let early actions account for …
Proposes a test-time scaling method for software engineering agents that recycles prior trajectories and branches at critical intermediate steps instead of resampling from scratch.
Proposes bundling recurring sequences of agent tool calls into deterministic meta-tools to skip unnecessary intermediate LLM reasoning steps and cut failures.
Explores integrating LLM capabilities into the ASTRA agent programming language to study how traditional agent toolkits and modern LLM-based agentic platforms can inform each other…
Introduces a bi-level framework where a meta-agent evolves context engineering skills via agentic crossover while a base agent executes them to optimize context as files and code.
Proposes a multi-agent framework and benchmark for cross-modal data analysis that coordinates specialized sub-agents via a divide-and-conquer workflow across structured and unstruc…
Explores agentic AI for Android app testing that uses code inspection and dynamic instrumentation to reach activities that standard GUI fuzzers cannot access.
Introduces a large-scale computer-using agent skill library with parameterized execution, composition graphs, dynamic retrieval, and memory-aware failure recovery for desktop appli…
Explores local equilibrium propagation for optimizing deep compound AI systems that avoids signal degradation in long-horizon agentic workflows by replacing global textual backprop…
Investigates counterfactual reasoning in agentic LLM control scenarios using structural causal models and conformal prediction for formal reliability guarantees.
Introduces a hierarchical multi-agent system with out-of-domain detection and BERT-based agent routing for delivering personalized data insights at production scale.
Introduces a system-theoretic framework that decomposes agentic AI into five functional subsystems and derives 12 reusable design patterns for building robust agent architectures.
Explores a pragmatic framework for transitioning organizational processes to agentic AI, covering domain-driven use case identification, task delegation, and human-in-the-loop oper…
Proposes a training-free continual learning framework for LLM agents that retrieves relevant past experiences and modulates output logits at test time without gradient updates.
Proposes embedding explicit reasoning at both function and parameter levels during agent tool calls, with dynamic complexity scoring to trigger granular justification for critical …
Investigates which RL training environment properties and modeling choices most influence cross-domain generalization for LLM agents deployed beyond their training domains.
Proposes disaggregating LLM investigation into bounded local evidence mining with deterministic graph traversal and belief propagation for reliable open-ended agent reasoning.
Presents a LangGraph-based AI agent framework combining GraphRAG, multi-stage retrieval, and RL-inspired adaptive feedback for reverse-engineering legacy scientific code.
Proposes a continuous vulnerability repair system that orchestrates a diverse LLM agent ensemble with two-phase deduplication for integration with continuous fuzzing pipelines.
Introduces a declarative architectural layer for agentic workflows with formalized capabilities, declarative discovery protocol, and deterministic task graph construction.
Presents a task-aware context pruning framework for coding agents that trains a lightweight neural skimmer to selectively retain relevant code lines based on explicit goals.
Proposes a multi-agent prompt optimization framework guided by requirements engineering principles for system and user prompts in agent-based software development.
Introduces a self-evolving multi-agent framework for automated environment configuration with expert diagnosis and dynamic error-fixing priority adjustment.
Proposes a pipeline-aware caching architecture for agentic systems that elevates structured intermediate reasoning representations to first-class cacheable artifacts to reduce redu…
Investigates imposing explicit dynamical structure on an external affective state to induce temporal coherence and controlled recovery in multi-turn dialogue agents.
Proposes a Dual-Process framework that transforms verbalized uncertainty into bi-directional control signals for agent memory and reflection to prevent cascading hallucination erro…
Presents a Unified Agent Lifecycle Management blueprint with five control-plane layers for governing agent fleets including identity registry, orchestration, and runtime policy enf…
Introduces a neuro-symbolic architecture that integrates LLM agents with predicate-logic programming and knowledge graphs to orchestrate end-to-end business initiatives through tas…
Proposes a software engineering framework for capturing and embedding codified human domain knowledge into LLM-based agents through request classification, RAG, and expert rule int…
Defines the agent:// URI scheme that decouples agent identity from network location through trust roots, hierarchical capability paths, and cryptographic attestation for multi-agen…
Surveys efficiency in agent systems across memory, tool learning, and planning, comparing approaches under fixed cost budgets and analyzing the Pareto frontier between effectivenes…
Proposes self-coding information systems that use agentic AI to dynamically generate, test, and redeploy their own source code at runtime to reduce feature delivery time.
Presents a lightweight open-source Python framework for building LLM-driven agents with composable skill abstractions, a unified LLM backend interface, and declarative YAML-based c…
Introduces a multi-agent reward model system for GUI agents that combines domain-specific and general-purpose reward models with automated data reflux for self-evolving agent train…
Investigates three deployment architectures for integrating LLM-based agentic AI with edge computing in UAV swarms, covering standalone, edge-enabled, and edge-cloud hybrid configu…
Proposes a unified taxonomy decomposing AI agents into Perception, Brain, Planning, Action, Tool Use, and Collaboration subsystems, covering MCP, native computer use, and evaluatio…
Surveys agentic reasoning across foundational, self-evolving, and collective multi-agent dimensions, distinguishing in-context reasoning from post-training approaches across planni…
Introduces a governed orchestration framework that treats agentic automation as typed plan synthesis with DAG-based planning, rubric-guided selection, validator-gated execution, an…
Explores how the Unix 'everything is a file' principle informs agentic AI design through file-like abstractions and code-based specifications for composable, auditable agent interf…
Introduces a hierarchical self-evolving multi-agent framework that integrates curriculum learning, reward-based learning, and genetic algorithm evolution for continuous autonomous …
Proposes a self-evolving agent framework that evolves an explicit finite state machine instead of free-form code rewriting, constraining flow and skill optimization to a structured…
Identifies and studies a conflict type where a tool-augmented LLM's internal knowledge contradicts external tool outputs, evaluating whether existing resolution techniques like pro…
Uses lookahead planning to estimate the value of tool usage at each step and selects stable, high-value reasoning paths, with a convergence mechanism that halts rollouts once consi…
Trains history-aware routers for large-scale MCP tool ecosystems using dependency graphs and multi-turn trajectory synthesis to generalize across multi-agent collaboration and mass…
Proposes iterative query planning for tool retrieval that decomposes instructions into sub-tasks and dynamically generates queries, trained via synthetic trajectories and reinforce…
Introduces a Computer-Using Agent framework with milestone-driven long-term memory for trajectory-level self-correction and a multimodal searcher that synthesizes live, visually al…
Presents a conversational AI interface for dynamic tool discovery and execution via the OPACA framework, comparing multiple task-solving strategies across different agent setups an…
Proposes test-time tool evolution where agents synthesize, verify, and evolve executable tools during inference instead of relying on static pre-defined tool libraries.
Introduces a large-scale distributed orchestration system that decouples agent training into independent Model, Agent, and Environment services for scheduling tens of thousands of …
Proposes an evaluation-judge-optimization pipeline that assigns block-level responsibility scores to failing logic blocks in agentic workflows, focusing modifications on the most p…
Introduces a reproducibility-constrained framework for Large Action Models with structured action schemas, deterministic execution policies, and provenance tracking to ensure audit…
Proposes a composable RL infrastructure for LLM agents that separates algorithm design, execution, and agent-environment interaction with a centralized scheduler for managing share…
Introduces activation-guided, role-conditioned neuron transplantation for training-free merging of environment-specific LLM agent experts into a single generalist model.
Proposes a dynamics-aware framework grounded in Schema Theory that routes agent training data to SFT or RL based on gradient concentration, using cognitive conflict as the allocati…
Introduces a training framework for calibrating agent tool-use behavior through a self-evolving data flywheel and two-phase behavior calibration to reduce redundant and insufficien…
Proposes a co-evolutionary framework that jointly optimizes the agent policy and its natural-language critic through synchronized GRPO updates, preventing the critic from becoming …
Introduces context engineering techniques for agentic workflows including structured DS-specific prompting, separate plan and code agents, and smart history rendering for fault tol…
Proposes a reinforcement learning paradigm that replaces pointwise scalar scoring with intra-group relative ranking via tournament-based schemes to address discrimination collapse …
Introduces a conceptual framework with six capabilities (Contextualize, Harmonize, Anticipate, Negotiate, Generate, Evolve) for architecting AgentOps platforms that manage the life…
Proposes internalizing execution priors to predict agent outcomes before physical execution, using a Predict-then-Verify loop to accelerate ML agent workflows without running expen…
Proposes an automated framework for generating scalable tool-interaction environments via programmatic synthesis, constructing diverse environment skeletons and task scenarios for …
Proposes a multi-agent framework for localizing integration defects in LLM-integrated software using code knowledge graphs enriched with LLM-aware annotations and counterfactual re…
Proposes a unified framework for multi-turn agentic RL that uses a turn-level tree structure for entropy-guided exploration, turn-wise credit assignment, and turn-based policy opti…
Proposes a framework that decouples agentic search into Search Behavior Agents and Knowledge Management Agents with turn-level rewards for multi-hop QA.
Reframes agent self-improvement as a release engineering pipeline with implementation-blind quality signals, symptom-level diagnosis, and flip-centered regression gating.
Proposes an attribution-driven requirements engineering methodology for specifying what domain knowledge LLM agents need at design time, organized along four causal dimensions.
Proposes a structured generation engine for agentic LLMs with dynamic tag dispatching, JIT compilation, and cross-grammar caching for tool calling and conditional structured genera…
Formalizes transitive expert error in AI routing architectures including MoE, multi-model orchestration, and tool-using agents, proposing boundary-aware calibration and coverage ga…
Introduces a multi-agent workflow for synthesizing research-grade training data with a two-stage SFT plus agentic RL strategy for open-source deep research models.
Proposes design patterns for architecting agentic communities derived from enterprise distributed systems standards, covering coordination, governance, and formal collaboration agr…
Proposes a skill-conditioned RL framework for tool-using agents that grounds reward modeling in a library of skill prototypes for mid-level credit assignment.
Proposes a Context-Aware MCP architecture with a Shared Context Store that enables MCP servers to coordinate autonomously by reading from and writing to shared context memory.
Proposes a multi-agentic workflow that decouples optimization of primary task descriptions from constraint optimization using quantitative feedback for iterative prompt refinement.
Proposes a general-purpose agent framework that keeps reasoning context bounded regardless of task duration by externalizing persistent state into a file-centric state abstraction.
Surveys agentic AI architectures covering planning, memory, tool use, and iterative reasoning with a critical assessment of safety, alignment, and reliability challenges.
Proposes an agentic-memory-enhanced recursive reasoning framework for root cause localization with cross-alert memory reuse and multi-agent recursive refinement.
Introduces a lightweight Python framework providing a unified, type-safe interface for building LLM agents across multiple providers with tool calling, memory management, and MCP i…
Surveys AI agent architectures spanning reasoning, planning, tool calling, orchestration patterns, and deployment settings with a unified taxonomy of agent components and design tr…
Proposes a dual-stream architecture that elevates the persistent Python runtime as the central locus of agent state, with stateful runtime management and skill injection for long-h…
Proposes an active feedback model where AI agents proactively interact with the environment to discover and verify feedback without relying on predefined measurements.
Proposes an asynchronous architecture for million-agent scaling that reduces memory complexity via singleton weight sharing and topological synapse-inspired KV-cache sparsification…
Anthropic engineering blog. A warm-up resource that builds vocabulary and intuition from a production practitioner's perspective, covering workflows vs. agents, orchestration patterns, and more. Read this first, before the papers.
Agent Tooling blog/anthropic/build notes → Tier 0: not a paper. A warm-up before reading agent-related papers. Short and figure-heavy. Practical …