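March 12, 2026 (Thursday)
Updates on models and agent infrastructure, plus notable market moves across stocks and crypto.
NVIDIA is pushing a narrative around open models and agent-training infrastructure (Nemotron 3 Super and a terminal-agent data pipeline), while product news centers on bringing generative video (Sora) into workflow surfaces such as ChatGPT. Research continues to probe agent reliability, evaluation, and regulation-oriented benchmarks.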
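NVIDIA touts Nemotron 3 Super: a 120B open hybrid MoE model targeting agentic workloads
Coverage reports that NVIDIA released Nemotron 3 Super, described as a 120B-parameter open hybrid Mamba-attention MoE model positioned for higher throughput and for multi-agent and tool-use scenarios.
Open, high-capacity models optimized for throughput can change the economics of agent systems (lower latency and lower cost per action), especially for multi-agent orchestration, where inference volume climbs quickly. If the performance claims hold up, the release strengthens the "open weights are catching up" narrative for enterprise and research deployments.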
- 01 Throughput-focused architecture choices (hybrid + MoE) matter as much as raw quality once agents become always-on services.
- 02 Open-weight, large models can shift build-versus-buy decisions for teams that need customization, on-prem options, or tighter data control.
- 03 For production agents, model choice is increasingly a systems decision: batching, tool-call patterns, and context length drive real cost more than benchmark scores.
If you are evaluating open models for agents, run a workload-specific bake-off: measure tool-call latency, token throughput, and failure modes (hallucinated commands, unsafe actions) on your real tasks. Track $/successful task, not just $/1M tokens.
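A minimal sketch of the "$/successful task" metric, assuming you log token counts, a pass/fail verdict, and tool-call latencies per run. The `RunResult` fields, the `summarize` helper, and the per-million-token prices are hypothetical placeholders, not any vendor's pricing or API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RunResult:
    """One agent attempt at one task (field names are illustrative)."""
    task_id: str
    succeeded: bool
    input_tokens: int
    output_tokens: int
    tool_call_latencies_ms: list[float]


def run_cost_usd(run: RunResult, usd_per_1m_in: float, usd_per_1m_out: float) -> float:
    """Token cost of a single run, using per-million-token prices."""
    return (run.input_tokens * usd_per_1m_in + run.output_tokens * usd_per_1m_out) / 1_000_000


def summarize(runs: list[RunResult], usd_per_1m_in: float, usd_per_1m_out: float) -> dict:
    """Aggregate a bake-off: success rate, $/successful task, median tool latency."""
    if not runs:
        raise ValueError("no runs to summarize")
    # Failed runs still consume tokens, so their cost stays in the total.
    total = sum(run_cost_usd(r, usd_per_1m_in, usd_per_1m_out) for r in runs)
    successes = sum(r.succeeded for r in runs)
    latencies = sorted(ms for r in runs for ms in r.tool_call_latencies_ms)
    return {
        "success_rate": successes / len(runs),
        "total_cost_usd": total,
        "usd_per_successful_task": total / successes if successes else float("inf"),
        # Upper median for even counts; NaN if no tool calls were recorded.
        "median_tool_latency_ms": latencies[len(latencies) // 2] if latencies else float("nan"),
    }


# Hypothetical prices and runs, for illustration only.
runs = [
    RunResult("t1", True, 200_000, 50_000, [120.0, 300.0]),
    RunResult("t2", False, 100_000, 20_000, [500.0]),
]
print(summarize(runs, usd_per_1m_in=0.5, usd_per_1m_out=2.0))
```

With these made-up numbers the successful run alone costs $0.20, but charging the failed run to the one success puts the real figure at about $0.29 per successful task, a gap a plain $/1M-token comparison would hide.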
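NVIDIA highlights Nemotron-Terminal, a data pipeline for scaling terminal agents
A write-up describes Nemotron-Terminal, framed as a systematic data-engineering pipeline designed to generate and curate training data for terminal-based LLM agents.
Terminal agents are only as good as the data that teaches them realistic command sequences, error recovery, and safe operating behavior. Making the data pipeline explicit and repeatable can speed up gains in agent capability while also improving reproducibility and safety testing.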
- 01 Agent progress is increasingly gated by data quality and coverage, not just model size.
- 02 Terminal environments are high-risk: data must encode safe defaults, permission boundaries, and robust failure handling.
- 03 Transparent pipelines make it easier to audit what an agent was trained to do, which matters for enterprise adoption and compliance.
If you train or fine-tune terminal agents, create a task taxonomy (setup, build, deploy, incident response) and ensure you have examples that include failures (missing dependencies, permission errors, conflicting configs). Add automatic checks that block destructive commands unless explicitly authorized in the eval harness.
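One way to sketch the "block destructive commands unless explicitly authorized" check is a parser-based tripwire in the eval harness. The deny-lists, `check_command`, and the exact-string authorization scheme below are hypothetical choices for illustration, not part of Nemotron-Terminal; the standard-library `shlex` lexer is used only to split a shell line at `;`, `&&`, `||`, `|`, and `&`.

```python
from __future__ import annotations

import os
import shlex

# Hypothetical policy for an eval harness. It errs toward blocking, and it is
# only a tripwire: real harnesses should also sandbox the agent.
DESTRUCTIVE_PROGRAMS = {"rm", "rmdir", "mkfs", "dd", "shred", "shutdown", "reboot"}
DESTRUCTIVE_GIT_FLAGS = {"--force", "-f", "--hard"}
SEPARATORS = {";", "&&", "||", "|", "&", "(", ")"}


def split_segments(cmd: str) -> list[list[str]]:
    """Split a shell line into simple commands at ;, &&, ||, |, & and parentheses."""
    lexer = shlex.shlex(cmd, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    segments: list[list[str]] = []
    current: list[str] = []
    for tok in lexer:
        if tok in SEPARATORS:
            if current:
                segments.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        segments.append(current)
    return segments


def is_destructive(argv: list[str]) -> bool:
    """Flag one simple command; looks through a bare `sudo` prefix (sudo flags are not parsed)."""
    while argv and argv[0] == "sudo":
        argv = argv[1:]
    if not argv:
        return False
    prog = os.path.basename(argv[0])
    if prog in DESTRUCTIVE_PROGRAMS:
        return True
    return prog == "git" and bool(DESTRUCTIVE_GIT_FLAGS & set(argv[1:]))


def check_command(cmd: str, authorized: frozenset[str] = frozenset()) -> None:
    """Raise PermissionError unless cmd is harmless or explicitly authorized verbatim."""
    if cmd in authorized:
        return
    for argv in split_segments(cmd):
        if is_destructive(argv):
            raise PermissionError("blocked destructive command: " + " ".join(argv))
```

Authorizing by exact command string keeps approvals narrow: allowing `rm -rf /tmp/x` in a test case does not allow `rm -rf /`.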
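Report: OpenAI's Sora may be integrated directly into ChatGPT
The Verge reports that Sora, OpenAI's video-generation product, is expected to become accessible inside ChatGPT rather than only through its separate website and app.
Moving video generation into the dominant chat surface changes product distribution and usage patterns: it lowers friction, encourages more iterative prompting, and enables multimodal workflows (text to storyboard to video) within a single context. It also raises new safety and policy concerns around synthetic media.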
- 01 Multimodal creation is shifting from 'specialty tools' to default chat workflows, which can dramatically increase adoption.
- 02 Video generation inside a general assistant will pressure teams to improve provenance, watermarking, and abuse detection for synthetic media.
- 03 For creators and marketers, the competitive edge will increasingly come from workflow design (templates, brand controls, review loops) rather than raw model access.
If you plan to use AI video in production, define a review pipeline now: human approval for public releases, a policy for likeness and copyrighted content, and a storage strategy that keeps prompts, versions, and source assets for auditability.
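The review pipeline above can be sketched as an audit record plus a publish gate. The schema (`VideoAsset`, its consent and clearance flags) and the `publish_blockers` rules are hypothetical and would need to match your own likeness and copyright policy; nothing here calls Sora or any OpenAI API.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class VideoAsset:
    """Audit record for one generated clip (schema is an illustrative assumption)."""
    asset_id: str
    prompt: str
    version: int
    source_assets: list[str] = field(default_factory=list)  # reference images, scripts, etc.
    uses_real_likeness: bool = False
    likeness_consent_on_file: bool = False
    includes_third_party_ip: bool = False
    ip_cleared: bool = False
    approved_by: Optional[str] = None  # human reviewer; None until approved


def publish_blockers(asset: VideoAsset) -> list[str]:
    """Return the reasons this asset may not be released publicly (empty list = OK)."""
    reasons = []
    if not asset.prompt:
        reasons.append("prompt not recorded")
    if asset.uses_real_likeness and not asset.likeness_consent_on_file:
        reasons.append("real likeness without consent on file")
    if asset.includes_third_party_ip and not asset.ip_cleared:
        reasons.append("third-party content not cleared")
    if asset.approved_by is None:
        reasons.append("missing human approval")
    return reasons


def audit_entry(asset: VideoAsset) -> str:
    """Serialize the full record (prompt, version, sources, approvals) for an append-only log."""
    return json.dumps(asdict(asset), sort_keys=True)
```

Returning a list of reasons, rather than a single boolean, gives reviewers an actionable message and makes each policy rule easy to test on its own.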
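Google introduces Gemini Embedding 2 for multimodal retrieval
Google announced Gemini Embedding 2, a multimodal embedding model designed to place text, images, audio, video, and documents in a shared embedding space for retrieval and RAG-style applications.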
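What a shared embedding space buys you is that one similarity search covers every modality. A minimal sketch, assuming the vectors have already been produced by a single multimodal model: the 3-dimensional vectors, item IDs, and `top_k` helper are made-up placeholders, and no Gemini API call is shown or implied.

```python
from __future__ import annotations

import math

# Toy 3-dimensional vectors standing in for real model outputs. In practice every
# item (page, clip, image) and the query would be embedded by the same model.
INDEX = [
    {"id": "manual.pdf#page=3", "modality": "document", "vec": [0.9, 0.1, 0.0]},
    {"id": "demo.mp4#t=42", "modality": "video", "vec": [0.8, 0.3, 0.1]},
    {"id": "logo.png", "modality": "image", "vec": [0.0, 0.2, 0.95]},
]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: list[dict], k: int = 2) -> list[tuple[str, float]]:
    """Rank items of any modality by similarity to the query, highest first."""
    scored = [(item["id"], cosine(query_vec, item["vec"])) for item in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# A text query, hypothetically embedded into the same space.
for item_id, score in top_k([1.0, 0.2, 0.0], INDEX):
    print(f"{item_id}: {score:.3f}")
```

Because a text query can land next to a video segment or a PDF page, a RAG system can retrieve mixed-modality context without maintaining a separate index per media type; at scale, the linear scan here would be replaced by an approximate nearest-neighbor index.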
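GateLens proposes a reasoning-enhanced agent for automotive software release analysis
An arXiv paper introduces an LLM-agent approach to analyzing large tabular datasets in safety- and compliance-relevant settings, with a focus on ambiguity resolution and structured reasoning.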
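AI Act evaluation benchmark targets reproducible compliance evaluation of NLP and RAG systems
An arXiv dataset proposal aims to enable transparent, reproducible evaluation of NLP and RAG systems through a regulatory-compliance lens.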