Reveals that AI agents produce harmful content (toxic text, exploits, dangerous data) as a side effect of completing normal professional tasks, with no adversarial prompting needed. At…
Trains an LLM to generate RAG poison that survives real-world content processing and query variation for stress-testing RAG defenses.
Analyzes 98K agent skills from community registries to study the prevalence and nature of malicious third-party agent plugins.
Investigates whether attackers can reconstruct knowledge graphs from Graph RAG outputs through multi-turn probing.
Proposes consume-once mandate semantics for AI agent payment protocols to prevent replay and redirect attacks in autonomous transactions.
Explores using an LLM agent to identify attack techniques in stripped malware binaries through incremental context retrieval.
Maps attack paths in agent-to-agent communication protocols for automotive LLM assistants, from driver distraction to unauthorized vehicle control.
Explores using reinforcement learning to auto-generate prompt injection attacks that transfer across multiple frontier LLMs.
Proposes an LLM agent with dual feedback loops for strategy and code to automate vulnerability reproduction from CVE descriptions.
Organizes agentic security risks into four layers (Core, Connection, Cognition, Compliance) to address trust and governance issues beyond prompt injection.
Proposes a co-evolving RL game between an attacker and defender agent to stress-test safety alignment against novel attack patterns.
Introduces an LLM agentic system that reconstructs blockchain exploit lifecycles from limited evidence and generates runnable proof-of-concept reproductions.
Argues that AI-agent-driven cyber attacks are inevitable and proposes building frontier offensive AI capabilities responsibly as essential defensive infrastructure.
Proposes protocol-level security improvements for the Model Context Protocol including unified identity management, mutual authentication, and fine-grained policy enforcement.
Investigates how user persuasion during conversation can carry over and change how autonomous AI agents perform later tasks.
Explores how collective false memories form in LLM-based multi-agent systems and proposes defenses including cognitive anchoring and alignment-based approaches.
Proposes a black-box attack method that generates transferable adversarial tokens to manipulate LLM-based retrieval systems without needing access to the target's queries or model.
Introduces CacheAttack, a black-box framework that exploits the trade-off between locality and collision resistance in semantic caching to hijack LLM responses and manipulate agent…
Proposes a verify-then-pay infrastructure for agent transactions that locks funds in escrow, requires cryptographic proof of task execution, and releases payment only after verific…
Red-teams Google's Agent Payments Protocol via prompt injection attacks that manipulate product ranking and extract sensitive user data in agent-led purchase flows.
Introduces a benchmark for evaluating when agent violations are detected during execution rather than just whether, with temporal metrics for early intervention and tokens saved.
Argues that static compliance-based governance is insufficient for agentic AI at machine speed and proposes runtime governance to preserve human relevance in agent-driven decision-…
Introduces an adversarial attack that poisons retrieval contexts in RAG-based code generation to force longer outputs, increasing GPU latency and energy consumption.
Surveys security threats targeting AI agents in cyber-physical systems, covering deepfake attacks, MCP-mediated vulnerabilities, and defense-in-depth architectures.
Explores AutoGen-based multi-agent coordination with specialized agents for static, dynamic, and network-level ransomware family classification using confidence-aware decisions.
Introduces a multi-agent auto-healing defense framework with semantic similarity retrieval, pattern matching, and an evolving knowledgebase for defending LLMs against resource exha…
Explores agentic AI for pre-commit secure code review that uses autonomous decision-making, tool invocation, and security-focused semantic memories to detect immature vulnerabiliti…
Introduces a three-dimensional taxonomy for agentic risks and a diagnostic guardrail framework that monitors agent trajectories with fine-grained root cause analysis beyond binary …
Examines how benign personal memories in personalized agents can bias intent inference and cause models to legitimize harmful queries through a previously unexplored safety vector.
Proposes a multi-agent collaborative framework with specialized LLM-enhanced agents for intelligent data processing and adaptive intrusion classification in aerial IoT networks.
Introduces a protocol-agnostic execution control plane for autonomous agents that enforces authorization boundaries with canonical action representation and deterministic policy ev…
Examines privacy risks in multimodal RAG pipelines through inclusion inference and metadata leakage attacks during standard model prompting.
Presents the first security analysis of the Model Context Protocol specification, identifying three protocol-level vulnerabilities and proposing backward-compatible security extens…
Surveys 78 studies to systematize prompt injection attacks on agentic coding assistants with a three-dimensional taxonomy across delivery vectors, modalities, and propagation.
Introduces RAGCrawler, a knowledge graph-guided attack that adaptively steals RAG corpus content through targeted queries to maximize coverage under a query budget.
Presents a multi-tenant chatbot deployment platform with container-based isolation and platform-level defenses against prompt injection attacks in RAG-based systems.
Interoperable Architecture for Digital Identity Delegation for AI Agents with Blockchain Integration
Introduces delegation grants and a canonical verification context for bounded, auditable identity delegation across human users and AI agents in heterogeneous identity ecosystems.
Proposes an infection-aware defense framework for multi-agent systems that distinguishes infected agents from attackers and applies topological constraints to halt malicious propag…
Proposes AGEA, an agentic framework using novelty-guided exploration and graph memory to steal latent entity-relation graphs from GraphRAG systems under strict query budgets.
Introduces activation-space guardrails that detect privacy-violating intent in LLM agents through linear separation of internal representations, including drift detection across mu…
Proposes a three-agent sandbox simulation framework with 40 crime tasks across 13 objectives to evaluate the criminal capabilities of LLM agents in realistic scenarios.
Introduces an adaptive prompt injection framework targeting navigation agents under black-box, long-context, and action-executable constraints across indoor and outdoor environment…
Explores a multi-agent defense pipeline combining semantic similarity caching, nested learning, and observability-aware evaluation to mitigate prompt injection attacks while reduci…
Introduces an overthinking attack framework for RAG systems with reasoning models, using multi-agent-constructed poisoning samples that cause excessive reasoning token consumption …
Introduces a framework for detecting and mitigating tool-driven agency risks through offline interface verification and runtime per-step least-privilege tool access with adaptive f…
Proposes a privacy-preserving RAG framework using conditional approximate distance-comparison-preserving encryption that enables similarity computation on encrypted embeddings in u…
Taming Various Privilege Escalation in LLM-Based Agent Systems: A Mandatory Access Control Framework
Proposes a mandatory access control framework for LLM agent systems that monitors agent-tool interactions via information flow graphs and enforces attribute-based policies against …
Introduces governance graphs as public, immutable manifests with enforceable sanctions and restorative paths to govern multi-agent LLM coordination and prevent harmful collusion.
Proposes a prompt-injection-resilient RAG framework that decouples security enforcement from generation by applying sanitization and policy-aware disclosure controls during the ret…
Introduces a stealthy multi-turn economic DoS attack exploiting the agent-tool communication loop through MCP-compatible tool server modifications that inflate costs by up to 658x.
Introduces a benchmark and harness for evaluating web-facing RAG systems under indirect prompt injection and retrieval poisoning attacks with standardized end-to-end evaluation fro…
Introduces a neuro-symbolic containment architecture that decouples normative reasoning from instrumental decision-making through a Moral Module, Decision-Making Module, and compli…
Presents a security framework that learns context-aware access-control policies from monitored execution traces to govern AI agent operations and detect malicious inputs while pres…
Analyzes 42,447 agent skills from two major marketplaces to study the prevalence and types of security vulnerabilities spanning prompt injection, data exfiltration, privilege escal…
Proposes single-shot planning for Computer Use Agents that provides provable control flow integrity against prompt injection while preserving agent capability.
Tests open-source function-calling LLMs against multiple attack types with various defenses to study the readiness of current models and mitigations for production deployment.
Examines how commercial planning and web-use agents handle user-mediated attacks where the user themselves provides adversarial instructions without explicit safety requests.
Formalizes how propositions gain unwarranted trust by crossing architecturally trusted interfaces in agent systems, studying whether circular epistemic justification is inevitable …
Proposes applying System-Theoretic Process Analysis to identify hazards in agent tool-use workflows, deriving formal safety specifications enforced through a capability-enhanced Mo…
Introduces an automated framework for implicit tool poisoning in MCP where a poisoned tool remains uninvoked but its metadata manipulates the agent into performing malicious operat…
Proposes a black-box attack that decomposes indirect prompt injection into trigger and attack fragments to study end-to-end IPI exploits under natural queries across RAG and agenti…
Proposes a hardware-backed zero-trust architecture for AI memory systems that applies TEE protection across five functional layers with a cross-application sharing protocol for age…
Introduces a benchmark for evaluating safety alignment of AI agents performing professional-level tasks across diverse domains, uncovering new unsafe behaviors in complex professio…
Demonstrates that off-the-shelf LLM agents with web search can re-identify participants in anonymized qualitative datasets using only natural-language prompts, lowering the technic…
Proposes a conceptual and operational framework for safe AI agent development grounded in transparency, accountability, and trustworthiness, with progressive validation analogous t…
Proposes a verify-before-commit protocol for defending LLM agents against tool stream injection, using speculative hypothesis generation and intent-grounded verification to balance…
Evaluates memory poisoning attacks on memory-augmented LLM agents and proposes two defense mechanisms: input/output moderation with composite trust scoring and memory sanitization …
Proposes a secure transpiler and executor for LLM-generated code that detects vulnerabilities and safely executes code snippets in autonomous production AI systems without relying …
Investigates conformity bias in AI agents under social pressure using adapted visual experiments from social psychology, studying sensitivity to group size, unanimity, task difficu…
Proposes a tool result parsing method for defending LLM agents against indirect prompt injection by providing precise data while filtering out injected malicious code.
Surveys agent-blockchain interoperability patterns and threat models for agent-driven transaction pipelines, covering custody models, policy enforcement, and multi-agent workflows.
Proposes a stage-aware framework for analyzing backdoor attacks across planning, memory, and tool-use stages of LLM agent workflows with cross-stage trigger propagation.
Proposes a deceptive defense framework using collaborative defender agents to counter multi-turn jailbreak attacks by strategically wasting attacker resources.
Systematizes privacy risks, mitigation techniques, and evaluation strategies in RAG systems through a comprehensive literature review with a taxonomy and process diagram.
Proposes a behavioral watermarking framework that embeds multi-bit identifiers into agent planning decisions for IP protection and regulatory provenance while preserving utility.
Proposes structural tokenization that encodes execution-flow patterns instead of conversational content to improve cross-attack generalization in AI agent threat detection.
Lying with Truths: Open-Channel Multi-Agent Collusion for Belief Manipulation via Generative Montage
Introduces a cognitive collusion attack where colluding agents steer victim beliefs using only truthful evidence fragments distributed through public channels without covert commun…
Proposes a lightweight framework that safely executes untrusted MCP tools inside a WebAssembly sandbox and produces auditable reports of external-to-sink exposures.
Analyzes toxicity adoption dynamics among LLM-driven agents on a fully AI-driven social platform, studying how cumulative toxic exposure affects the probability of toxic responses.
Proposes a Siamese Recurrent Autoencoder with hybrid contrastive-reconstruction loss for real-time anomaly detection in agent action trajectories.
Maps human anti-collusion mechanisms including sanctions, leniency, monitoring, and market design to potential interventions for multi-agent AI systems.
Proposes a data adulteration framework that pre-emptively injects plausible but false entries into knowledge graphs to make stolen GraphRAG KGs unusable to adversaries.
Examines intergroup bias in LLM agents under minimal group cues and formalizes a Belief Poisoning Attack that manipulates agent identity beliefs to induce outgroup bias toward huma…
AI agents -- systems that plan, reason, and act using large language models -- produce non-deterministic, path-dependent behavior that cannot be fully governed at design time, wher…
Security in LLM agents is inherently contextual. For example, the same action taken by an agent may represent legitimate behavior or a security violation depending on whose instruc…