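March 13, 2026 (Friday)
An agent-building startup raised a large round while continuing to push the "AI for every employee" message, and research and open-source work leaned toward local-first, on-device personal agents. Large consumer platforms are also pushing assistants deeper into workflows through task automation and richer output.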
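Gumloop raises $50 million to open agent building to non-engineers
TechCrunch reports that Gumloop raised $50 million in a round led by Bridge, positioning its product as an intuitive way for everyday employees to build AI agents for work tasks.
If agent creation becomes a no-code or low-code capability, adoption shifts from centralized AI teams to individual functions (sales operations, finance, support). That can speed up experimentation, but it also multiplies the governance surface: data access, prompt/tool permissions, and auditability need to scale with the number of builders.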
- 01 The next wave of 'agent adoption' is likely a distribution problem (who can build) as much as a model-quality problem.
- 02 Empowering non-engineers increases the risk of shadow automation touching sensitive systems unless permissions and logging are designed-in.
- 03 Agent ROI will be judged on throughput and reliability: how often automations complete end-to-end without human cleanup.
Before rolling out an agent builder broadly, define a permission model (what tools and datasets each role can access), require per-agent owners, and mandate run logs for any workflow that touches customer data, financial systems, or production infrastructure.
Track a simple KPI: successful runs / total runs for the top 10 automations, plus time saved net of exception handling.
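The KPI above can be sketched in a few lines. This is a minimal illustration, not a prescribed schema: the `AutomationStats` fields, the per-success time estimate, and the example numbers are all assumptions made up for the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AutomationStats:
    """Hypothetical per-automation counters; field names are illustrative."""
    name: str
    total_runs: int
    successful_runs: int              # runs that finished end-to-end without human cleanup
    minutes_saved_per_success: float  # assumed estimate supplied by the workflow owner
    exception_minutes: float          # total human time spent handling failed runs


def success_rate(s: AutomationStats) -> float:
    """Successful runs / total runs, with 0.0 for automations that never ran."""
    return s.successful_runs / s.total_runs if s.total_runs else 0.0


def net_minutes_saved(s: AutomationStats) -> float:
    """Time saved by successful runs, net of exception handling."""
    return s.successful_runs * s.minutes_saved_per_success - s.exception_minutes


def top_automations(stats: List[AutomationStats], n: int = 10) -> List[AutomationStats]:
    """Pick the n most-used automations to report on."""
    return sorted(stats, key=lambda s: s.total_runs, reverse=True)[:n]


invoice_sync = AutomationStats("invoice-sync", 200, 170, 6.0, 300.0)
print(success_rate(invoice_sync), net_minutes_saved(invoice_sync))
```

Ranking by run volume is one reasonable reading of "top 10"; ranking by business impact would be equally defensible.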
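Stanford researchers release OpenJarvis for local-first, on-device personal agents
MarkTechPost highlights OpenJarvis, an open-source framework from Stanford designed to support personal AI agents that run on-device with tools, memory, and learning.
Local-first agents change the privacy and availability trade-offs: more tasks can be completed without sending data to third-party APIs, and the agent can remain useful offline. The harder part is the software stack: tool execution, memory management, and safe learning loops need to work within mobile/edge constraints.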
- 01 On-device agent stacks are maturing from 'run a model locally' into full systems (tools + memory + learning).
- 02 Privacy gains are real, but reliability and device-resource constraints (latency, battery, storage) become first-class product requirements.
- 03 Local agents still need strong safety boundaries because tools can have real-world side effects even without cloud connectivity.
If you are prototyping on-device agents, start with a narrow toolset and strict allowlists. Measure energy cost per task and set timeouts for long-running tool calls.
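As a rough sketch of the allowlist-plus-timeout idea, assuming a hypothetical in-process tool registry (this is not OpenJarvis's API, and it does not measure energy cost):

```python
import concurrent.futures as cf
from typing import Any, Callable, Dict, Set


class ToolNotAllowed(Exception):
    """Raised when a tool is requested that is not on the allowlist."""


class ToolTimedOut(Exception):
    """Raised when a tool call exceeds its time budget."""


def run_tool(
    name: str,
    registry: Dict[str, Callable[..., Any]],  # hypothetical name -> function map
    allowlist: Set[str],
    timeout_s: float,
    *args: Any,
) -> Any:
    # Deny by default: anything not explicitly allowlisted is refused.
    if name not in allowlist:
        raise ToolNotAllowed(name)
    executor = cf.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(registry[name], *args)
    try:
        return future.result(timeout=timeout_s)
    except cf.TimeoutError:
        raise ToolTimedOut(name) from None
    finally:
        # Stop waiting on the worker. Note the thread itself is not killed;
        # a real runtime would need process isolation to cancel the work.
        executor.shutdown(wait=False)
```

The timeout here only bounds how long the agent waits. Hard cancellation of a runaway tool generally needs a separate process or OS-level limits.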
Design memory with retention rules: what is stored, for how long, and how users can inspect and delete it.
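One way to make retention rules concrete is to attach a time-to-live to each memory category and expose inspect and delete operations. The categories, durations, and class names below are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class MemoryItem:
    key: str
    value: str
    category: str
    stored_at: datetime


# Assumed retention policy: what is stored and for how long.
RETENTION: Dict[str, timedelta] = {
    "preferences": timedelta(days=365),
    "conversation": timedelta(days=30),
}


class MemoryStore:
    def __init__(self, retention: Dict[str, timedelta]) -> None:
        self._retention = retention
        self._items: Dict[str, MemoryItem] = {}

    def remember(self, item: MemoryItem) -> None:
        # Refuse to store anything that has no declared retention rule.
        if item.category not in self._retention:
            raise ValueError(f"no retention rule for category {item.category!r}")
        self._items[item.key] = item

    def purge_expired(self, now: datetime) -> int:
        """Delete items older than their category's limit; return the count removed."""
        expired = [
            key for key, item in self._items.items()
            if now - item.stored_at > self._retention[item.category]
        ]
        for key in expired:
            del self._items[key]
        return len(expired)

    def inspect(self) -> List[MemoryItem]:
        """User-facing view of everything currently stored."""
        return sorted(self._items.values(), key=lambda item: item.key)

    def forget(self, key: str) -> bool:
        """User-initiated deletion; returns True if something was removed."""
        return self._items.pop(key, None) is not None
```

Requiring a retention rule before anything can be stored keeps "what is stored" an explicit decision rather than a default.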
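Assistants move deeper into workflows: task automation and rich visual output
The Verge reports that Google is rolling out Gemini task automation on new devices for tasks such as ordering food or booking rides, and notes that Anthropic has updated Claude to generate inline charts and diagrams when useful.
The assistant battleground is shifting from chat quality to workflow completion: models that can safely operate apps and present decisions in formats people can verify quickly. Visual artifacts (charts, diagrams) can reduce misunderstanding and speed up review, but they also add new failure modes (misleading visuals, incorrect scales, omitted caveats).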
- 01 Automation features will be evaluated on trust and reversibility: users need clear previews, confirmations, and undo paths.
- 02 Inline visuals can improve comprehension, but teams must test for 'confidently wrong' charts that look plausible.
- 03 As assistants gain app control, access control and scoped permissions become as important as model alignment.
If you deploy assistant-driven automations, require a review step for high-impact actions (purchases, messages, calendar changes). Log every tool action and show a user-visible activity trail.
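A review gate and activity trail can be as small as the sketch below. The action categories and the `approve` callback are hypothetical stand-ins for whatever confirmation UI a product actually provides.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumed set of high-impact action kinds that always need user review.
HIGH_IMPACT = {"purchase", "send_message", "calendar_change"}


@dataclass
class ActivityTrail:
    """Append-only log intended to be shown to the user."""
    entries: List[str] = field(default_factory=list)

    def record(self, description: str, status: str) -> None:
        self.entries.append(f"{description}: {status}")


def execute_action(
    kind: str,
    description: str,
    perform: Callable[[], None],
    approve: Callable[[str], bool],  # e.g. a confirmation dialog
    trail: ActivityTrail,
) -> bool:
    # High-impact actions run only after explicit approval.
    if kind in HIGH_IMPACT and not approve(description):
        trail.record(description, "declined by user")
        return False
    perform()
    trail.record(description, "executed")
    return True
```

Declined actions are logged as well, so the trail shows what the assistant attempted and not only what it completed.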
If your product renders AI-generated charts, validate axes/units and annotate uncertainty (data source, assumptions) to prevent polished misinformation.
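A pre-render check along these lines might look like the following. The `ChartSpec` shape is an assumption, not any vendor's chart format, and the checks catch only structural problems, not wrong data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChartSpec:
    """Hypothetical structured chart description produced by a model."""
    title: str
    x_label: str
    y_label: str
    y_unit: Optional[str]
    y_min: float
    y_max: float
    values: List[float]
    data_source: Optional[str]


def validate_chart(spec: ChartSpec) -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems: List[str] = []
    if not spec.x_label.strip() or not spec.y_label.strip():
        problems.append("missing axis label")
    if not spec.y_unit:
        problems.append("missing y-axis unit")
    if spec.y_min >= spec.y_max:
        problems.append("invalid y-axis range")
    elif any(v < spec.y_min or v > spec.y_max for v in spec.values):
        problems.append("values fall outside the y-axis range")
    if not spec.data_source:
        problems.append("missing data source annotation")
    return problems
```

A chart that fails any check could be held back or rendered with a visible warning, depending on how much risk the product tolerates.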
Gemini’s task automation is here and it’s wild
Coverage of Gemini task automation using apps on a user's behalf.
Anthropic's Claude AI can respond with charts, diagrams, and other visuals now
Update enabling Claude to generate inline visualizations such as charts and diagrams.
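IonRouter pitches high-throughput, low-cost inference routing
A Launch HN thread points to IonRouter, framing it as an inference routing product optimized for throughput and cost.
AMD Instinct LLM inference benchmarks highlight architecture-specific tuning
An arXiv paper benchmarks large-model inference on AMD Instinct GPUs and argues that architecture-aware settings (such as attention variants and KV cache behavior) drive real deployment results.
New jailbreak angle targets thinking-mode LLMs through interleaved concurrent tasks
An arXiv preprint proposes a multi-stream perturbation attack that attempts to break safety alignment by interleaving tasks during chain-of-thought-style processing.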