Today’s thread: benchmarks and business plumbing. Research continues to professionalize how we test agent reliability (especially evidence grounding), while mainstream productivity and consumer platforms race to turn everyday workflows into agent-ready surfaces.