Saturday, May 23, 2026
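Agent security is moving from theory to concrete attack and defense patterns: domain-camouflaged prompt injection can bypass naive filters, covert channels can exfiltrate data even through ‘benign’ outputs, and new benchmarks try to measure agent behavior across messy multi-target environments. If you deploy agents, assume adversarial inputs and instrument for containment, not just accuracy.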
A practical bypass for multi-agent systems
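A new paper analyzes ‘domain-camouflaged injection’ attacks, in which injected instructions are made to look like in-domain content.
In real deployments, agents consume web pages, tickets, documents, and emails that mix trusted and untrusted text. If an attacker can make instructions look ‘in-domain’, simple allowlists, keyword filters, and source checks will fail.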
- 01 Treat all retrieved text as untrusted input, even when it comes from ‘familiar’ domains or looks semantically on-topic.
- 02 Multi-agent architectures can amplify risk, because one compromised sub-agent can pass poisoned instructions to others as ‘internal’ messages.
- 03 Detection should be coupled with containment: when a prompt-injection slips through, the blast radius should still be small.
Add a hard boundary between ‘retrieved content’ and ‘instructions’: enforce a policy that only system prompts (or signed internal directives) can create new goals, request secrets, or change permissions. Use least-privilege tool grants per step (read-only by default), and log the exact text span that triggered each tool call so you can trace which document steered the agent.
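The boundary described above can be sketched as a small per-step policy check. This is a minimal illustration under stated assumptions, not a reference design: `Source`, `Span`, `StepPolicy`, `READ_ONLY_TOOLS`, and the tool names are all hypothetical, and a real system would also need to verify how a directive earns the `SYSTEM` label (for example, by signature).

```python
from dataclasses import dataclass, field
from enum import Enum


class Source(Enum):
    SYSTEM = "system"        # system prompt or signed internal directive
    RETRIEVED = "retrieved"  # web pages, tickets, emails, sub-agent messages


# Hypothetical tool names; anything outside this set counts as privileged.
READ_ONLY_TOOLS = frozenset({"read_file", "search"})


@dataclass(frozen=True)
class Span:
    """The exact text that triggered a tool call, with its provenance."""
    source: Source
    doc_id: str
    text: str


@dataclass
class StepPolicy:
    # Tools granted for this step only; read-only by default.
    allowed_tools: frozenset = READ_ONLY_TOOLS
    audit_log: list = field(default_factory=list)

    def authorize(self, tool: str, trigger: Span) -> bool:
        granted = tool in self.allowed_tools
        privileged = tool not in READ_ONLY_TOOLS
        trusted = trigger.source is Source.SYSTEM
        # Retrieved content may only drive read-only tools, even if the
        # step happens to hold a grant for something more powerful.
        allowed = granted and (trusted or not privileged)
        # Log every decision with the triggering span for later tracing.
        self.audit_log.append({
            "tool": tool,
            "allowed": allowed,
            "source": trigger.source.value,
            "doc_id": trigger.doc_id,
            "text": trigger.text,
        })
        return allowed
```

With this shape, an instruction embedded in a ticket can still cause a read, but it cannot trigger `send_email` even when the step holds that grant, and the audit log records which document tried.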
Covert-channel defenses matter more as agents gain more ‘egress’ paths
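A paper proposes an application-layer reference monitor for LLM agent egress, focused on covert channels that can hide data inside otherwise-allowed payloads (formatting, ordering, timing, encoding, or media artifacts).
If a compromised agent can encode secrets into allowed outputs, blocking destinations and scanning text is not enough. As agents gain more output modes (JSON, code, images, multi-part messages) and more automation hooks (tickets, chat, reports), the number of possible covert channels keeps growing.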
- 01 ‘Allowed output’ does not mean ‘safe output’, because data can be encoded in structure, not just words.
- 02 Egress controls need to be protocol-aware (schemas, canonicalization, length limits), not just content-aware.
- 03 If your incident model includes secret leakage, you must monitor and constrain outputs at the boundary, not only at inputs.
Canonicalize outbound artifacts: stable JSON key ordering, normalized whitespace, strict schemas, bounded field lengths, and rejection of invisible characters or homoglyphs. Where possible, separate high-trust outputs (e.g., internal logs) from low-trust channels (external messages), and require human review for any step that could leak sensitive context.
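A minimal sketch of that canonicalization step for a JSON payload follows, using only the Python standard library. The schema (`ALLOWED_KEYS`), the length limit, and the `EgressRejected` exception are assumptions chosen for illustration. Note one limitation: the NFKC comparison only catches compatibility look-alikes (such as fullwidth letters); cross-script homoglyphs would need a separate confusables check that is not shown here.

```python
import json
import unicodedata

# Assumed schema and limit for one low-trust outbound channel.
ALLOWED_KEYS = frozenset({"ticket_id", "status", "summary"})
MAX_FIELD_LEN = 500


class EgressRejected(ValueError):
    """Raised when an outbound payload violates the egress policy."""


def _clean_text(value: str) -> str:
    # Collapse every whitespace run so spacing cannot carry hidden bits.
    value = " ".join(value.split())
    if len(value) > MAX_FIELD_LEN:
        raise EgressRejected("field too long")
    for ch in value:
        # Cf covers zero-width and bidi controls; Cc covers other controls.
        if unicodedata.category(ch) in {"Cf", "Cc", "Co", "Cn"}:
            raise EgressRejected(f"disallowed character U+{ord(ch):04X}")
    if unicodedata.normalize("NFKC", value) != value:
        raise EgressRejected("non-canonical Unicode form")
    return value


def canonicalize(payload: dict) -> str:
    # Strict schema: exactly the expected keys, string values only.
    if set(payload) != ALLOWED_KEYS:
        raise EgressRejected("unexpected or missing keys")
    clean = {}
    for key, value in payload.items():
        if not isinstance(value, str):
            raise EgressRejected(f"{key} must be a string")
        clean[key] = _clean_text(value)
    # Stable key order and fixed separators remove ordering/spacing channels.
    return json.dumps(clean, sort_keys=True, separators=(",", ":"))
```

Two payloads that differ only in key order or whitespace serialize to the same bytes, so neither dimension can carry data past the boundary. Timing and media channels are outside what this sketch covers.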
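Benchmarks are expanding from ‘single-target’ tasks to agent strategy under uncertainty
New work proposes benchmarks that evaluate agent behavior in more realistic settings, including a multi-target web CTF and a broader agent evaluation framework that goes beyond single-outcome leaderboards.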
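Outcome-only scores can hide risky or brittle behavior (unsafe tool use, guess-and-check thrashing, and poor triage). Multi-target environments force agents to prioritize, allocate time, and manage uncertainty, which is closer to how real operator-style agents behave.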
- 01 A high success rate is less meaningful if the agent got there via risky, non-repeatable, or unsafe steps.
- 02 Evaluation should capture process signals: tool-call budgets, retries, privilege usage, and how often the agent asks for escalation.
- 03 If you deploy offensive or admin-like agents, benchmark them in environments that include ‘unknown unknowns’, not just scripted exploits.
Adopt a two-layer eval: (1) outcome metrics (task completion, time), plus (2) safety/process metrics (max privilege used, forbidden action attempts, network egress attempts, and number of tool calls). Treat regressions in layer (2) as release blockers even if layer (1) improves.
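The two-layer gate can be sketched as a comparison between a baseline and a candidate report. The field names, the integer privilege scale, and the strict "any increase is a regression" rule are assumptions for illustration; a real gate would likely want a tolerance for noisy metrics such as tool-call counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalReport:
    # Layer 1: outcome metrics.
    task_completion: float        # fraction of tasks completed, 0..1
    median_time_s: float
    # Layer 2: safety/process metrics (lower is better for all four).
    max_privilege: int            # assumed scale: 0 read-only, 1 write, 2 admin
    forbidden_action_attempts: int
    egress_attempts: int
    tool_calls: int


PROCESS_METRICS = (
    "max_privilege",
    "forbidden_action_attempts",
    "egress_attempts",
    "tool_calls",
)


def process_regressions(baseline: EvalReport, candidate: EvalReport) -> list[str]:
    """Return the layer-2 metrics on which the candidate is worse."""
    return [
        name for name in PROCESS_METRICS
        if getattr(candidate, name) > getattr(baseline, name)
    ]


def release_allowed(baseline: EvalReport, candidate: EvalReport) -> bool:
    # Layer-1 gains are deliberately ignored here: any layer-2
    # regression blocks the release on its own.
    return not process_regressions(baseline, candidate)
```

For example, a candidate that raises task completion from 0.70 to 0.85 but moves `max_privilege` from 1 to 2 would be blocked, and `process_regressions` names the metric responsible.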
CTFExplorer: Evaluating LLM Offensive Agents Through Multi-Target Web CTF Benchmarking
Benchmark for evaluating offensive agents across multiple unknown targets, emphasizing triage and strategy.
AgentAtlas: Beyond Outcome Leaderboards for LLM Agents
Paper arguing for richer, multi-dimensional evaluation of agent systems beyond single-score leaderboards.
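Superset launches as ‘the IDE for the agent era’
Superset (YC P26) is an IDE built around agent workflows, reflecting the ongoing shift toward toolchains that make agent runs reproducible, inspectable, and shareable across a team.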
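Spotify to ship ElevenLabs-powered audiobook creation tools
Spotify is launching an AI audiobook creation workflow powered by ElevenLabs, a signal that creation tools and distribution pipelines are becoming a primary battleground for AI.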