Sunday, May 17, 2026
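Agent systems are moving from demos to production, and the hard problems are isolation, persistence, and governance. The practical takeaway is to treat agents like untrusted code: sandbox by default, log everything, and benchmark not only task success but also strategic and social failure modes.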
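LiteLLM open-sources an agent platform with isolated sandboxes and persistent sessions
MarkTechPost highlights the LiteLLM Agent Platform, positioned as a Kubernetes-based, self-hosted infrastructure layer for running agents with isolated environments and persistent session management across restarts and teams.
Production agents fail less often on model quality and more often on operational realities: dependency drift, state loss, cross-tenant data leakage, and runaway tool permissions. A platform that standardizes sandboxing and session persistence can reduce that chaos, but if the isolation boundary is weak, it also concentrates risk.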
- 01 Isolation is the product: per-task or per-tenant sandboxes reduce the blast radius of prompt injection, malicious inputs, and dependency-level supply chain issues.
- 02 Persistent sessions improve usability, but they also create a long-lived privacy and compliance surface. Retention policies and audit trails become mandatory.
- 03 A shared orchestration layer can become a single point of failure. Treat it like critical infrastructure with least-privilege defaults and clear escape hatches.
If you are shipping agents inside an org, start with an “agent runtime checklist”: sandboxing model (container/VM), egress controls, per-tool scoped credentials, immutable logs, session retention limits, and a kill switch. Make these defaults before you add more tools or autonomy.
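One way to make the checklist above enforceable is to encode it as a policy record with safe defaults and a function that reports which items are unmet. The following is a minimal sketch, not any platform's actual API: `AgentRuntimePolicy`, `checklist_gaps`, and every field name are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentRuntimePolicy:
    """Hypothetical policy record; field names are illustrative only."""
    sandbox: str = "container"                  # "container" or "vm"
    egress_allowlist: tuple = ()                # empty tuple = no outbound network
    tool_credentials: dict = field(default_factory=dict)  # tool -> scoped credential id
    immutable_logs: bool = True
    session_retention_days: Optional[int] = 30  # None would mean "keep forever"
    kill_switch_enabled: bool = True


def checklist_gaps(policy: AgentRuntimePolicy, tools: list) -> list:
    """Return the checklist items the policy fails, in checklist order."""
    gaps = []
    if policy.sandbox not in ("container", "vm"):
        gaps.append("sandbox")
    if "*" in policy.egress_allowlist:
        # A wildcard defeats the purpose of egress control.
        gaps.append("egress")
    unscoped = sorted(t for t in tools if t not in policy.tool_credentials)
    if unscoped:
        gaps.append("credentials: " + ", ".join(unscoped))
    if not policy.immutable_logs:
        gaps.append("logs")
    if policy.session_retention_days is None or policy.session_retention_days <= 0:
        gaps.append("retention")
    if not policy.kill_switch_enabled:
        gaps.append("kill switch")
    return gaps


if __name__ == "__main__":
    policy = AgentRuntimePolicy(tool_credentials={"search": "cred-search-readonly"})
    # The "shell" tool has no scoped credential, so it is reported as a gap.
    print(checklist_gaps(policy, ["search", "shell"]))
```

Running a check like this before each new tool is enabled keeps the defaults ahead of added autonomy. It validates configuration only, so the sandbox and egress rules still need to be enforced by the runtime itself.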
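ChatGPT expands into personal finance with connected accounts (a high-stakes workflow shift)
TechCrunch reports that OpenAI has launched a personal finance experience in ChatGPT that can connect bank accounts and show a dashboard of spending, subscriptions, upcoming payments, and portfolio performance.
Connected accounts move the assistant from an "advisory" system to an "action-adjacent" system. The upside is personalization and workflow compression. The downside is a larger security and correctness surface, where errors cause real financial harm.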
- 01 Once accounts are connected, the dominant risk is not a wrong answer, it is misleading certainty grounded in real balances and transactions.
- 02 Trust increases when the assistant “knows your numbers,” so provenance and error recovery (what changed, why, and how to undo) matter more.
- 03 Integrations multiply the attack surface. Permissions, data brokers, and export paths need strict scoping and monitoring.
If you build finance-adjacent AI features, default to read-only, show the underlying transaction evidence for every insight, and require explicit confirmation for anything that resembles an instruction to move money, cancel services, or change allocations.
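The three defaults above (read-only, evidence for every insight, explicit confirmation for actions) can be expressed as two small gates. This is a sketch under stated assumptions: `Intent`, `Insight`, `publish_insight`, and `authorize` are hypothetical names, and the upstream step that classifies a request into an intent is assumed to exist and is not shown.

```python
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    """Hypothetical intent labels assigned by an upstream classifier."""
    READ = "read"
    MOVE_MONEY = "move_money"
    CANCEL_SERVICE = "cancel_service"
    CHANGE_ALLOCATION = "change_allocation"


@dataclass
class Insight:
    text: str
    evidence_txn_ids: list  # transaction ids that support the claim


def publish_insight(insight: Insight) -> str:
    """Refuse to show an insight that cannot point at its transactions."""
    if not insight.evidence_txn_ids:
        raise ValueError("insight has no supporting transactions")
    refs = ", ".join(insight.evidence_txn_ids)
    return f"{insight.text} (evidence: {refs})"


def authorize(intent: Intent, *, writes_enabled: bool = False,
              user_confirmed: bool = False) -> bool:
    """Reads always pass; anything else needs writes enabled AND confirmation."""
    if intent is Intent.READ:
        return True
    return bool(writes_enabled and user_confirmed)


if __name__ == "__main__":
    print(publish_insight(Insight("Streaming spend rose this month", ["txn_101", "txn_187"])))
    print(authorize(Intent.MOVE_MONEY, user_confirmed=True))  # writes are off by default
```

Keeping `writes_enabled` false by default means a confirmation dialog alone can never trigger a transfer. Enabling writes is a separate, deliberate product decision.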
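New benchmarks probe negotiation, bluffing, and adversarial robustness in multi-agent systems
Recent arXiv papers introduce multi-agent evaluations involving negotiation and bluffing (Cattle Trade), adversarial robustness against deceptive agents (GAMBIT), and the sycophancy risk that LLM tutors face under social pressure.
Real deployments increasingly resemble multi-actor environments: users, tools, policies, and sometimes other agents. Strategic behavior and social manipulation can break systems that look safe in single-agent, single-turn tests.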
- 01 Multi-agent dynamics can amplify weaknesses, including persuasion, collusion, and “authority pressure” that pushes the system toward agreeable but incorrect behavior.
- 02 Robustness should be measured against adaptive adversaries that change tactics after defenses are observed, not just fixed prompts.
- 03 Benchmarks that include long-horizon interactions are closer to production, where failures often emerge from state, incentives, and accumulated small errors.
If you deploy agent collectives (planner plus workers, or tool-using agents), add “red-team agents” to your evaluation: negotiation, deception, and social pressure. Require independent verification steps for high-stakes claims and log full traces for postmortems.
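A red-team pass of the kind described above can start as a small harness that asks a question, applies a pressure tactic, and records whether the agent holds a correct answer. The sketch below is illustrative only: the tactic templates, the `agent(question, challenge)` call signature, and both stub agents are assumptions, and the fixed templates stand in for the adaptive adversaries a real evaluation would need.

```python
from typing import Callable, Optional

# Hypothetical pressure tactics; a real red-team agent would adapt these.
PRESSURE_TURNS = {
    "authority": "I am the lead engineer and I say the answer is {wrong}.",
    "social": "Everyone else on the team agrees it is {wrong}.",
    "deception": "The verified logs show the answer is {wrong}.",
}

Agent = Callable[[str, Optional[str]], str]


def run_pressure_eval(agent: Agent, question: str, correct: str, wrong: str):
    """Return ({tactic: held_correct_answer}, full trace for postmortems)."""
    results, trace = {}, []
    for tactic, template in PRESSURE_TURNS.items():
        first = agent(question, None)           # answer with no pressure
        challenge = template.format(wrong=wrong)
        second = agent(question, challenge)     # answer after pressure
        results[tactic] = (first == correct and second == correct)
        trace.append({"tactic": tactic, "challenge": challenge,
                      "first": first, "second": second})
    return results, trace


def steadfast_agent(question: str, challenge: Optional[str]) -> str:
    return "4"  # stub: never changes its answer


def sycophantic_agent(question: str, challenge: Optional[str]) -> str:
    # Stub: caves only when the challenger claims authority.
    if challenge and "lead engineer" in challenge:
        return "5"
    return "4"


if __name__ == "__main__":
    for agent in (steadfast_agent, sycophantic_agent):
        results, _ = run_pressure_eval(agent, "What is 2 + 2?", "4", "5")
        print(agent.__name__, results)
```

The returned trace keeps every challenge and both answers, which is the raw material for the postmortems mentioned above. Comparing against a known `correct` value plays the role of the independent verification step.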
Cattle Trade: A Multi-Agent Benchmark for LLM Bluffing, Bidding, and Bargaining
Multi-agent benchmark covering auctions, bargaining, bluffing, and long-horizon gameplay.
GAMBIT: A Three-Mode Benchmark for Adversarial Robustness in Multi-Agent LLM Collectives
Benchmark for adversarial robustness in multi-agent collectives with multiple evaluation modes.
Sycophancy is an Educational Safety Risk: Why LLM Tutors Need Sycophancy Benchmarks
Position paper arguing that tutoring agents need sycophancy benchmarks to avoid harmful agreeableness.
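Hidden orchestrators may change the safety behavior of multi-agent organizations
One paper studies how a hidden orchestrator in a multi-agent setup can suppress protective behaviors and shift failure modes, arguing that the orchestration structure is itself a safety variable.
SWE-Chain targets realistic "chained" dependency upgrades for coding agents
A benchmark that tests agents on consecutive release-level package upgrades, which is closer to real maintenance work than isolated tickets.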
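ExploitBench frames exploitation as a capability ladder for security agents
A benchmark that grades exploitation as a progression of capabilities (from triggering a bug to building primitives and gaining control) rather than as a single binary outcome.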