2026年3月11日 (周三)
OpenAI和Google推动更具互动性,工作流程-内在的AI体验,而研究人员和构建者则专注于代理可靠性(指令层级,代码审查)和代理基础设施(地铁代理,上下文检索).
OpenAI和Google推动更具互动性,工作流程-内在的AI体验,而研究人员和构建者则专注于代理可靠性(指令层级,代码审查)和代理基础设施(地铁代理,上下文检索).
OpenAI 发布指令等级制挑战,以硬化模型,防止迅速注射
OpenAI发布了指令等级挑战(IH-Challenge),旨在培训和评估前沿模式是否正确优先处理可信指令而不是未信任或冲突指令.
随着模型成为工具使用剂,指令跟踪失败变成了真正的安全事件(即时注射,数据过滤,未经授权的行动). 更好的指导层级提高了可引导性,降低了企业部署中的操作风险.
- 01 Instruction hierarchy is shifting from a research topic to a practical security control for agentic systems.
- 02 Teams deploying tool-using LLMs should treat prompt injection like a first-class threat model and test for it continuously.
- 03 Even without new model training, product mitigations (trusted tool routing, allowlists, policy gates) remain essential because evaluation gains do not eliminate adversarial inputs.
If you ship an agent that browses or runs tools, add a regression suite of adversarial prompts (hidden instructions, conflicting system/user content, malicious webpages) and require explicit tool authorization for high-impact actions. Track failures as security bugs, not UX issues.
ChatGPT 为数学和科学解释添加交互式视觉
ChatGPT现在可以生成交互式的视觉解释,这样学习者就可以操纵变量,探索概念而不是依赖静态图.
交互式表述可以减少认知负荷,并产生早期可见的概念错误. 对于AI产品来说,这也标志着从只使用文本的答案向嵌入式,可扩展的UI输出的转变,增加了参与和学习成果.
- 01 Expect more AI outputs to become interactive artifacts (widgets, simulations, manipulatives) rather than paragraphs of text.
- 02 For education and documentation, interactivity can improve comprehension but also increases the need for correctness and guardrails.
- 03 Product teams should plan for evaluation beyond text: UI behavior, numerical fidelity, and edge-case handling matter.
If you build learning or analytics features, prototype a small set of interactive components (sliders, plots, step-by-step state) and set up validation tests for numerical accuracy and boundary conditions. Add clear citations or assumptions for generated visuals.
New ways to learn math and science in ChatGPT
ChatGPT introduces interactive visual explanations for math and science, helping students explore formulas, variables, and concepts in real time.
ChatGPT can now create interactive visuals to help you understand math and science concepts
Users can engage directly with interactive visuals instead of only reading explanations or viewing static diagrams.
Google Sheets中的双子座增加了β特性,并声称是最新性能
Google在β中宣布了新的双子座Sheets能力,以帮助用户创建,组织和编辑电子表格,并通过自然语言请求进行更复杂的数据分析.
电子表格是商业用户的高杠杆表面积. 提高AI-in-Sheets的质量可以通过将AI嵌入已经发生工作的地方来加速采用,它提高了企业分析中的准确性、透明度和可审计性。
- 01 Workflow-native AI (inside Sheets) is competing with standalone chat tools for daily business usage.
- 02 The biggest risk is silent analytical error; spreadsheet AI needs stronger provenance, explainability, and reproducibility.
- 03 Beta rollouts suggest rapid iteration—teams should watch for admin controls, data-handling policies, and compliance posture.
If you rely on AI-assisted spreadsheet analysis, require a repeatable trail: keep raw data snapshots, save generated formulas/queries, and add peer review for any decision-making dashboards. For vendors, expose a 'show work' mode and deterministic re-run options.
Gemini in Google Sheets just achieved state-of-the-art performance
Google announces new beta features for Gemini in Sheets to help create, organize, and analyze spreadsheets via natural language.
Google rolls out new Gemini capabilities to Docs, Sheets, Slides, and Drive
New features aim to make Workspace apps more personal and capable to help users get things done faster.
NVIDIA 引入 Nemotron-Terminal, 终端代理的数据工程管道
一份涵盖NVIDIA的Nemotron-Terminal努力的书写工作侧重于为LLM终端代理商系统生成和处理培训数据,解决代理商能力规模上的一个主要瓶颈。
Amazon在其应用程序和网站上推出一个保健AI助手
Amazon推出一名健康助理, 能够回答问题、解释记录、管理处方续订,
Tilde Open LLM:为34种欧洲语言培训开放的30B模式
一份arXiv文件介绍了一个30B开放量模型,其重点是利用抽样和基于课程的培训,以缩小低资源语言的绩效差距,实现公平欧洲语言覆盖。