April 12, 2026 (Sunday)
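AI teams are racing to make agents and multimodal retrieval more measurable and production-ready, while regulators and courts are raising the cost of failure. The common thread is operational discipline: benchmarks, evaluation tooling, and governance documentation are becoming part of shipping, not an after-the-fact cleanup.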
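Berkeley researchers detail how they reached top results on AI agent benchmarks, and which benchmarks are still missing
A Berkeley RDI blog post breaks down the methodology.
Agent performance is increasingly used as a proxy for real-world capability, but benchmark-chasing can hide brittleness. Better, more transparent evaluation helps teams decide when to trust agents in production, and where a "benchmark win" may not translate into reliability.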
- 01 Benchmark gains are most useful when paired with ablations that show which components actually drive improvements.
- 02 Agent evaluations can over-reward tool-call “success” while under-testing safety, long-horizon robustness, and failure recovery.
- 03 If you depend on agents, you need your own task suite that reflects your tools, permissions, and risk boundaries.
Build a small internal “agent reliability pack”: 20 to 50 tasks that mirror your real workflows, with pass/fail criteria and budget limits (time, tool calls, dollars). Run it on every model or prompt change, and track regressions like a CI test.
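A minimal sketch of such a pack, assuming each task wraps one call to your agent and reports whether it succeeded, how many tool calls it made, and what it cost. `Budget`, `Outcome`, `Task`, `run_pack`, and `regressions` are hypothetical names, not part of any library; a task counts as passing only if it succeeds within every budget limit.

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class Budget:
    max_seconds: float
    max_tool_calls: int
    max_dollars: float

@dataclass
class Outcome:
    passed: bool      # did the agent meet the task's pass/fail criteria?
    tool_calls: int
    dollars: float

@dataclass
class Task:
    name: str
    run: Callable[[], Outcome]  # hypothetical wrapper around one agent invocation
    budget: Budget

def run_pack(tasks):
    """Run every task; a task passes only if it succeeds within all budget limits."""
    results = {}
    for task in tasks:
        start = time.perf_counter()
        outcome = task.run()
        elapsed = time.perf_counter() - start
        within_budget = (
            elapsed <= task.budget.max_seconds
            and outcome.tool_calls <= task.budget.max_tool_calls
            and outcome.dollars <= task.budget.max_dollars
        )
        results[task.name] = outcome.passed and within_budget
    return results

def regressions(baseline, current):
    """Tasks that passed in the baseline run but fail now, like a failing CI test."""
    return sorted(name for name, ok in baseline.items() if ok and not current.get(name, False))

if __name__ == "__main__":
    budget = Budget(max_seconds=5.0, max_tool_calls=10, max_dollars=0.05)
    tasks = [
        Task("refund_lookup", lambda: Outcome(True, 4, 0.01), budget),
        Task("bulk_export", lambda: Outcome(True, 14, 0.02), budget),  # over tool-call budget
    ]
    current = run_pack(tasks)
    print(regressions({"refund_lookup": True, "bulk_export": True}, current))
```

Storing each run's results next to the model and prompt version makes it easy to diff the baseline against the latest change, the same way a CI system compares test runs.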
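VimRAG proposes a memory-graph approach to large-scale multimodal retrieval
Alibaba's Tongyi Lab introduced VimRAG, a multimodal RAG framework that uses a memory graph to navigate large visual contexts (images and video) more efficiently.
Multimodal RAG tends to blow up context windows and costs. If retrieval can prioritize the right visual evidence and preserve provenance, teams can build assistants that search and cite visual archives with lower latency and fewer hallucinations, but only if the retrieval layer is auditable.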
- 01 Multimodal retrieval is shifting from “stuff everything into context” toward structured memory and navigation.
- 02 Graph-based memory can improve recall for multi-step visual questions, but it adds new failure modes (wrong edges, stale memory, leakage across sessions).
- 03 The most valuable RAG systems will expose evidence trails so humans can verify what the model actually used.
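To make the memory-graph idea concrete, here is a generic best-first traversal, not VimRAG's published algorithm. It assumes nodes are visual items (frames or images), edges link related items (for example adjacent frames), and per-node relevance scores are already computed. The node budget caps how much enters the context, and recording each node's parent gives every selected item an evidence trail back to a seed. `navigate` and `trail` are hypothetical names.

```python
import heapq
import itertools

def navigate(graph, scores, seeds, budget):
    """Best-first walk over a memory graph.

    graph:  dict node -> list of neighbour nodes
    scores: dict node -> relevance to the query (higher is better), assumed precomputed
    seeds:  entry nodes, e.g. top hits from a vector index
    budget: maximum number of nodes to return (caps context size)
    Returns (node, parent) pairs; parent is None for seeds.
    """
    counter = itertools.count()  # tie-breaker so heap entries never compare nodes/parents
    frontier = [(-scores.get(s, 0.0), next(counter), s, None) for s in seeds]
    heapq.heapify(frontier)
    visited, selected = set(), []
    while frontier and len(selected) < budget:
        _neg_score, _tie, node, parent = heapq.heappop(frontier)
        if node in visited:
            continue
        visited.add(node)
        selected.append((node, parent))
        for nb in graph.get(node, []):
            if nb not in visited:
                heapq.heappush(frontier, (-scores.get(nb, 0.0), next(counter), nb, node))
    return selected

def trail(selected, node):
    """Reconstruct the path from a seed to `node`: the evidence trail a human can check."""
    parents = dict(selected)
    path = [node]
    while parents[path[-1]] is not None:
        path.append(parents[path[-1]])
    return list(reversed(path))
```

The bullets' failure modes show up directly here: a wrong edge pulls in an irrelevant frame, and a graph shared across sessions can leak nodes, which is why the trail is worth exposing.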
If you are building multimodal RAG, log retrieval traces by default (which frames/images were selected, why, and what was ignored). Treat traceability as a feature; it is the fastest path to debugging and reducing hallucinations.
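A minimal sketch of such a trace, assuming retrieval produces a score per candidate item and keeps those above a threshold up to an item budget. The threshold rule, the ID format, and the names `Candidate`, `RetrievalTrace`, `select_with_trace`, and `to_jsonl` are illustrative assumptions; the point is that every kept and dropped item is logged with a reason.

```python
import json
import time
from dataclasses import dataclass, field, asdict

@dataclass
class Candidate:
    item_id: str   # e.g. "video42/frame_0310" (hypothetical ID scheme)
    score: float
    reason: str    # why it was kept or dropped

@dataclass
class RetrievalTrace:
    query: str
    selected: list = field(default_factory=list)
    ignored: list = field(default_factory=list)
    timestamp: float = field(default_factory=time.time)

def select_with_trace(query, scored, threshold, max_items):
    """Keep the top-scoring items above `threshold`, recording every decision."""
    trace = RetrievalTrace(query=query)
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    for item_id, score in ranked:
        if score < threshold:
            trace.ignored.append(Candidate(item_id, score, "below threshold"))
        elif len(trace.selected) >= max_items:
            trace.ignored.append(Candidate(item_id, score, "over item budget"))
        else:
            trace.selected.append(Candidate(item_id, score, "above threshold"))
    return trace

def to_jsonl(trace):
    """One JSON object per line; append to a log file or send to your log pipeline."""
    return json.dumps(asdict(trace))
```

Logging the ignored items matters as much as the selected ones: when an answer is wrong, the trace shows whether the right frame was never retrieved or was retrieved and then cut by the budget.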
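Florida opens an investigation into OpenAI, adding platform and compliance risk
Florida's Attorney General announced an investigation into OpenAI, citing public-safety and national-security concerns.
Even before any new law is passed, an investigation creates practical pressure: document requests, customer due diligence, and reputational risk. For companies building on third-party models, it raises the value of vendor diversity, clear data-handling documentation, and incident-response paths.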
- 01 Regulatory scrutiny is expanding into faster-moving state actions, not just federal or EU processes.
- 02 Enterprises will increasingly ask for data-flow clarity, retention policies, and abuse-handling procedures for AI features.
- 03 Platform concentration becomes a business risk when a single vendor is under active investigation.
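One practical form of vendor diversity is keeping model calls behind a thin interface so a second provider can take over. The sketch below assumes each provider is wrapped as a plain function from prompt to text; the provider names and `complete_with_fallback` are hypothetical, and real SDKs have their own error types you would catch instead of `Exception`.

```python
from __future__ import annotations

from typing import Callable, Sequence

class AllProvidersFailed(Exception):
    """Raised with a dict of provider name -> error when every provider fails."""

def complete_with_fallback(
    prompt: str, providers: Sequence[tuple[str, Callable[[str], str]]]
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (provider_name, text) from the first success."""
    errors = {}
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # in production, catch each SDK's specific error types
            errors[name] = repr(exc)
    raise AllProvidersFailed(errors)
```

Returning the provider name alongside the text lets you log which vendor actually served each request, which is the kind of data-flow detail enterprise reviewers ask for.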
Write a one-page "AI feature factsheet" for each product area: data sent to vendors, what you store, retention, who can access outputs, and how users can report harm. Keep it updated; it speeds up security reviews and crisis response.
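Keeping the factsheet as structured data makes "keep it updated" checkable. This sketch mirrors the fields listed above and flags empty entries or a stale review date; the 90-day threshold and the names `AIFeatureFactsheet` and `review_issues` are assumptions to adjust to your own review cycle.

```python
from dataclasses import dataclass, fields
from datetime import date, timedelta

@dataclass
class AIFeatureFactsheet:
    product_area: str
    data_sent_to_vendors: str
    data_stored: str
    retention: str
    output_access: str      # who can access outputs
    harm_reporting: str     # how users can report harm
    last_reviewed: date

def review_issues(sheet, today, max_age_days=90):
    """Return a list of problems: empty text fields or a review older than max_age_days."""
    issues = []
    for f in fields(sheet):
        value = getattr(sheet, f.name)
        if isinstance(value, str) and not value.strip():
            issues.append(f"missing: {f.name}")
    if today - sheet.last_reviewed > timedelta(days=max_age_days):
        issues.append(f"stale: last reviewed more than {max_age_days} days ago")
    return issues
```

Running this check in CI, or before a security review, turns the factsheet from a one-off document into something that fails loudly when it drifts.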
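NVIDIA releases AITune: an open-source inference toolkit that automatically finds the fastest inference backend for any PyTorch model
NVIDIA's open-source AITune aims to automate inference-backend selection and tuning for PyTorch deployments.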
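The core idea of automated backend selection can be sketched generically; this is not AITune's API, which the summary does not describe. The sketch assumes each candidate backend is a callable (for example eager, compiled, or exported variants of the same model), rejects any whose output does not match a reference, and picks the lowest median latency. `pick_fastest` is a hypothetical name. For GPU models you would also need to synchronize the device before reading the timer, which this plain-Python version does not do.

```python
import statistics
import time

def pick_fastest(backends, sample_input, reference, close, warmup=2, runs=10):
    """Return (name, median_seconds) of the fastest backend whose output matches `reference`.

    backends: dict name -> callable taking sample_input
    close:    function(output, reference) -> bool; rejects backends that change results
    """
    timings = {}
    for name, fn in backends.items():
        try:
            for _ in range(warmup):        # let caches / JIT compilation settle
                fn(sample_input)
            if not close(fn(sample_input), reference):
                continue                    # fast but wrong is not acceptable
            samples = []
            for _ in range(runs):
                start = time.perf_counter()
                fn(sample_input)
                samples.append(time.perf_counter() - start)
            timings[name] = statistics.median(samples)
        except Exception:
            continue                        # backend not supported for this model
    if not timings:
        raise RuntimeError("no backend produced correct output")
    best = min(timings, key=timings.get)
    return best, timings[best]
```

The correctness check comes before timing on purpose: an aggressive backend that silently changes numerics should never win on speed alone.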
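Researchers from MIT, NVIDIA, and Zhejiang University propose TritWenty: a KV-cache compression method claiming full-attention quality at 2.5× higher throughput
TritWenty proposes compressing the KV cache to raise throughput while trying to preserve full-attention quality.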
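For readers new to KV-cache compression, here is a generic score-based eviction policy, similar in spirit to published heavy-hitter approaches; it is not TritWenty's method, which the summary does not describe. It assumes each cached token has an accumulated attention score, always keeps a window of recent tokens, and fills the rest of a fixed budget with the highest-scoring older tokens. `compress_kv` is a hypothetical name, and plain lists stand in for tensors.

```python
def compress_kv(keys, values, attn_scores, budget, recent):
    """Keep at most `budget` cache entries: the last `recent` tokens plus the
    highest-scoring older tokens by accumulated attention.

    keys, values: per-token cache entries (vectors in practice)
    attn_scores:  accumulated attention each token has received so far
    Returns (kept_keys, kept_values, kept_indices) in original token order.
    """
    n = len(keys)
    if n <= budget:
        return list(keys), list(values), list(range(n))
    recent = min(recent, budget)
    recent_idx = list(range(n - recent, n))
    older = range(n - recent)
    top_old = sorted(older, key=lambda i: attn_scores[i], reverse=True)[: budget - recent]
    keep = sorted(top_old + recent_idx)  # preserve positional order for attention
    return [keys[i] for i in keep], [values[i] for i in keep], keep
```

Memory per sequence drops from `n` entries to `budget`, which is where throughput gains come from (more sequences fit in a batch); the quality question is whether the evicted tokens were ever needed again.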
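Stalking victim sues OpenAI, claiming ChatGPT fueled her abuser's delusions and ignored her warnings
The lawsuit alleges that ChatGPT reinforced the stalker's delusions and that OpenAI failed to act on her warnings.
Anthropic temporarily bans OpenClaw's creator from accessing Claude
TechCrunch reports that Anthropic temporarily blocked OpenClaw's creator from accessing Claude following a pricing change.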