Daily Briefing

June 9, 2026 (Tue)

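Today's signal is that AI is moving deeper into products and markets. Google and Apple are exposing more agent infrastructure, investors are repricing AI-linked equities, and crypto is testing whether institutional flows can offset macro pressure and security incidents.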

AI details →

TL;DR

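AI product news is converging on agents that can search, verify, and act inside larger workflows. The practical challenge is shifting from raw model quality to governance: evidence sufficiency, source discovery, privacy leakage, and compute boundaries now matter as much as smoother interfaces.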

01 Deep Dive

Google adds agentic RAG to Gemini Enterprise with up to 34% higher factuality

What Happened

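Google Research described an agentic RAG framework for the Gemini Enterprise Agent Platform, built around a Sufficient Context Agent. The agent keeps searching multiple sources until it has enough grounded context to answer a multi-hop question, with a reported factuality gain of up to 34% over standard RAG.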
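The retrieve-until-sufficient pattern described above can be sketched in a few lines. This is only an illustration of the general idea, not Google's implementation: `retrieve_until_sufficient`, the `search` and `is_sufficient` callbacks, the `max_rounds` budget, and the toy corpus are all hypothetical names and data chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RetrievalResult:
    answerable: bool   # did the loop judge the gathered context sufficient?
    rounds: int        # retrieval rounds actually used
    passages: List[str] = field(default_factory=list)


def retrieve_until_sufficient(
    question: str,
    search: Callable[[str, int], List[str]],
    is_sufficient: Callable[[str, List[str]], bool],
    max_rounds: int = 4,
) -> RetrievalResult:
    """Keep retrieving until the context is judged sufficient or the budget runs out."""
    passages: List[str] = []
    for round_no in range(1, max_rounds + 1):
        # Add only passages we have not already collected.
        for passage in search(question, round_no):
            if passage not in passages:
                passages.append(passage)
        if is_sufficient(question, passages):
            return RetrievalResult(True, round_no, passages)
    # Budget exhausted: report the failure instead of forcing an answer.
    return RetrievalResult(False, max_rounds, passages)


# Hypothetical two-hop toy corpus: each round surfaces one more fact.
corpus = {
    1: ["Acme acquired Beta in 2021."],
    2: ["Beta's CEO is Dana Lee."],
}


def toy_search(question: str, round_no: int) -> List[str]:
    return corpus.get(round_no, [])


def toy_sufficient(question: str, passages: List[str]) -> bool:
    # Stand-in for a real sufficiency judge (for example, a model call).
    text = " ".join(passages)
    return "acquired" in text and "CEO" in text


result = retrieve_until_sufficient(
    "Who runs the company Acme acquired?", toy_search, toy_sufficient
)
print(result.answerable, result.rounds)  # prints: True 2
```

The explicit `max_rounds` budget and the `answerable` flag reflect the trade-off the briefing raises: extra rounds can improve grounding, but each one adds latency and cost, and a failed search should be surfaced rather than hidden.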

Why It Matters

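Enterprise AI is moving from simple retrieval snippets toward workflows that can judge whether the evidence is sufficient. That matters for legal, research, support, and analytics teams, because wrong answers often come from stopping early or trusting weak sources.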

Key Takeaways

01 A reported 34% factuality lift shows that search policy and stopping criteria can be as important as the base model.
02 Multi-hop queries are becoming the default enterprise test because they reveal whether an agent can connect scattered evidence.
03 The Sufficient Context Agent gives teams a concrete pattern for deciding when retrieval should continue instead of forcing a premature answer.
04 The risk is latency and cost: repeated searches can improve grounding while making each answer slower and more expensive.

Practical Points

AI platform teams: measure answer quality alongside retrieval rounds, source count, latency, and cost per completed task.

Enterprise buyers: ask vendors how they determine evidence sufficiency and how failed searches are surfaced to users.

Compliance teams: require source trails for high-impact outputs rather than accepting a polished final answer alone.

Next action: benchmark agentic RAG on your hardest multi-document questions before expanding it to production workflows.
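One way to track the measures listed above side by side is a small per-task record plus a summary function. This is a sketch under assumptions: the `TaskRun` fields, the `summarize` helper, and the sample numbers are invented for illustration and do not come from any vendor tool or from the Google report.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class TaskRun:
    correct: bool          # graded answer quality, simplified to pass/fail
    retrieval_rounds: int
    source_count: int
    latency_s: float
    cost_usd: float


def summarize(runs: List[TaskRun]) -> Dict[str, float]:
    """Report quality next to the retrieval effort that produced it."""
    if not runs:
        raise ValueError("no runs to summarize")
    completed = [r for r in runs if r.correct]
    total_cost = sum(r.cost_usd for r in runs)
    return {
        "accuracy": len(completed) / len(runs),
        "mean_rounds": mean(r.retrieval_rounds for r in runs),
        "mean_sources": mean(r.source_count for r in runs),
        "mean_latency_s": mean(r.latency_s for r in runs),
        # Failed runs still cost money, so divide total spend by successes.
        "cost_per_completed_task": (
            total_cost / len(completed) if completed else float("inf")
        ),
    }


# Made-up sample runs, for illustration only.
runs = [
    TaskRun(True, 1, 2, 1.0, 0.01),
    TaskRun(True, 3, 6, 4.0, 0.05),
    TaskRun(False, 4, 8, 6.0, 0.06),
]
summary = summarize(runs)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Dividing total spend by completed tasks, rather than by all attempts, keeps the cost of failed multi-round searches visible in the headline number.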

Sources

Google Research Adds Agentic RAG to Gemini Enterprise Agent Platform with a Sufficient Context Agent for multi-hop queries

Google Research details an agentic RAG framework in Gemini Enterprise Agent Platform with a Sufficient Context Agent for multi-hop, multi-source queries.

marktechpost.com →

02 Deep Dive

Research-agent benchmarks test frontier models across the full scientific lifecycle

What Happened

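A new arXiv paper introduced a suite of benchmarks for evaluating frontier LLMs and agentic harnesses across research lifecycle tasks. The abstract argues that autonomous research agents show limits in field sensitivity, research ethics, and nuanced scientific judgment.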

Why It Matters

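Researchers are starting to hand long-running workflows to agents, but scientific work depends on judgment, ethics, and context that are hard to score by simple task completion. Better lifecycle benchmarks can show where agents are useful assistants and where human review is essential.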

Key Takeaways

01 The benchmark focus is moving beyond coding or tool use into hypothesis work, experiment planning, ethics, and interpretation.
02 Agent harnesses can improve execution while still failing on discipline-specific judgment, which is a key deployment risk.
03 Research institutions need evaluation suites that test process quality, not only final answers or leaderboard scores.
04 The near-term opportunity is assisted research acceleration; the near-term risk is over-delegating review-sensitive decisions.

Practical Points

Research leads: separate tasks agents can execute from judgments that require accountable human sign-off.

AI evaluators: include ethics, citation quality, and field-specific assumptions in agent test sets.

Product teams: expose uncertainty and decision history when marketing research-agent features to expert users.

Next action: run a small internal eval using real past research tasks and grade both outcome and reasoning trail.

Sources

Act As a Real Researcher: A Suite of Benchmarks Evaluating Frontier LLMs and Agentic Harnesses in Research Lifecycle

arXiv paper on benchmarking frontier LLMs and agentic harnesses across research lifecycle tasks.

arxiv.org →

03 Deep Dive

Amazon and NotebookLM push AI into everyday creation and study workflows

What Happened

Amazon launched AI-generated custom merchandise through Alexa for Shopping, letting users prompt designs for items such as T-shirts, bottles, and hoodies. Google is upgrading NotebookLM with Gemini 3.5, a cloud computer, and source-finding help.

Why It Matters

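Consumer AI is becoming less about chat windows and more about embedded actions: creating products, finding sources, and managing research materials. The winning products will pair convenience with clear ownership, safety, and source controls.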

Key Takeaways

01 Amazon's merch feature turns prompt-to-product into a retail workflow, which tests demand for personalized AI commerce.
02 NotebookLM's Gemini 3.5 upgrade signals that source-grounded assistants are becoming mainstream study and knowledge tools.
03 Both releases reduce friction, but they also raise questions about IP, source quality, and user expectations for accuracy.
04 The common pattern is AI as an interface layer that directly triggers downstream economic or research actions.

Practical Points

Commerce teams: define IP review and moderation gates before allowing AI-generated designs to reach checkout.

Students and analysts: use NotebookLM-style tools to find and compare sources, but keep citation review manual.

Product managers: watch prompt-to-action completion rates, not only prompt volume or novelty.

Next action: audit where AI outputs can become external artifacts such as products, reports, or shared links.

Sources

Amazon is launching AI-generated custom merch

Amazon is expanding print-on-demand features to AI-generated product designs created with Alexa for Shopping.

theverge.com →

NotebookLM's Gemini 3.5 upgrade adds a cloud computer and help finding sources

Google is rolling out upgrades to NotebookLM, including Gemini 3.5, cloud-computer capabilities, and source-finding help.

theverge.com →

04.

Apple reveals AI architecture built around Gemini models

The Apple AI architecture news keeps Google and Nvidia at the center of the device AI supply chain, even as Apple tries to own the user experience.

Apple reveals new AI architecture built around Google Gemini models →

05.

OpenSkill explores self-evolving agents after deployment

The paper is a useful reminder that deployed agents may need to adapt without clean verifier signals, which is much harder than a benchmark learning loop.

OpenSkill: Open-World Self-Evolution for LLM Agents →

06.

MacArena benchmarks computer-use agents on online macOS tasks

GUI-agent benchmarks are becoming more realistic, which should help teams separate demo-ready automation from reliable desktop work.

MacArena: Benchmarking Computer Use Agents on an Online macOS Environment →

Keywords

#agentic RAG #Gemini Enterprise #Sufficient Context Agent #research agents #NotebookLM #Alexa Shopping

Stocks

Stocks details →

TL;DR

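Markets are treating AI as both a product catalyst and a valuation risk. Apple, Nvidia, OpenAI, Tesla, and SpaceX are all part of the same investor conversation now, while household stress and inflation expectations keep the macro backdrop uneasy.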

01 Deep Dive

Apple leans on Google and Nvidia as WWDC puts AI execution under the microscope

What Happened

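CNBC reported that Apple is partnering with Google and Nvidia for its most advanced AI model strategy. Separately, market coverage said Apple shares fell after the company unveiled AI Siri and Apple Intelligence updates at WWDC, suggesting investors are waiting for clearer evidence of an AI-driven device cycle.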

Why It Matters

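Apple's AI strategy matters because it affects devices, cloud computing, and chip demand. If Apple relies on external AI infrastructure while customers see only the features, investors may question both the margin upside and Apple's strategic control.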

Key Takeaways

01 Google and Nvidia exposure gives Apple speed and model capability, but it also highlights dependence on outside AI infrastructure.
02 The stock reaction suggests investors want revenue catalysts, not just architecture details or feature demos.
03 Nvidia benefits from being positioned as a required supplier even for companies with strong internal silicon ambitions.
04 The risk for Apple is an expectations gap between WWDC announcements and consumer willingness to upgrade devices.

Practical Points

Apple investors: track whether AI features translate into iPhone upgrade intent, services usage, and developer adoption.

Semiconductor investors: watch Nvidia's role in Apple-related AI workloads as a validation signal for broader demand.

Product teams: treat AI partnerships as speed advantages, but keep user-facing differentiation measurable.

Next action: compare analyst estimate revisions after WWDC with actual preorder and services metrics later in the cycle.

Sources

Apple partnering with Google and Nvidia for most advanced AI model

CNBC report on Apple's AI strategy, including Google models and Nvidia chips.

cnbc.com →

Stock Market Today, June 8: Apple Falls After Unveiling AI Siri and Apple Intelligence at WWDC

Market coverage of Apple's stock reaction after WWDC AI announcements.

fool.com →

02 Deep Dive

OpenAI IPO filing and SpaceX attention accelerate the AI public-market race

What Happened

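Bloomberg reported that OpenAI filed confidentially as AI rivals race toward public markets. CNBC also reported OpenAI's confidential filing, while Bloomberg's SpaceX coverage said investors must evaluate Elon Musk's increasingly interconnected business empire.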

Why It Matters

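AI companies need public-market capital to fund infrastructure, but public investors will demand clearer unit economics and governance. The SpaceX and OpenAI stories also test whether investor appetite for scarce growth can withstand concerns about concentration, cross-company exposure, and profitability.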

Key Takeaways

01 A confidential OpenAI filing would make AI infrastructure spend, revenue quality, and model margins central public-market questions.
02 SpaceX's investor narrative now overlaps with Tesla, xAI, capital flows, talent, and infrastructure across Musk-linked companies.
03 AI IPO demand can become a sentiment gauge for the whole growth complex, not just one issuer.
04 The risk is that public listings force a faster repricing of private AI valuations if disclosures disappoint.

Practical Points

Growth investors: separate strategic scarcity from financial visibility when evaluating AI IPO exposure.

Private companies: prepare for investor questions on compute obligations, customer concentration, and governance before filing.

Tesla holders: watch whether SpaceX demand creates short-term portfolio rotation or broader Musk-ecosystem enthusiasm.

Next action: monitor filing disclosures for gross margin, capex commitments, and related-party dependencies.

Sources

OpenAI Filed Confidentially for IPO as Rivals Race to Market

Bloomberg report on OpenAI's confidential IPO filing and the AI public-market race.

bloomberg.com →

SpaceX IPO Forces Investors to Bet on Musk's Entangled AI Empire

Bloomberg feature on SpaceX IPO implications and the intertwined Musk company ecosystem.

bloomberg.com →

03 Deep Dive

AI selloff is contained, but household financial worries hit highest level since July 2022

What Happened

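Yahoo Finance coverage said Friday's brutal AI selloff was proving short-lived, based on Monday's chipmaker trading. CNBC separately reported that household worries about finances reached their highest level since July 2022 in a New York Fed survey.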

Why It Matters

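Equity investors may be willing to buy the AI dip, but consumer stress limits how far risk appetite can run without better macro data. If household finances worsen, the assumptions behind consumer-facing companies and credit-sensitive sectors come under more scrutiny.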

Key Takeaways

01 Chipmaker resilience suggests investors still see AI infrastructure demand as durable after sharp selloffs.
02 Household financial concern at the highest level since July 2022 is a warning that macro pressure is not just a bond-market issue.
03 Stable inflation expectations help, but deteriorating perceived conditions can still pressure spending and credit quality.
04 The risk is a split market where AI leaders recover while broader consumer and small-cap exposure weakens.

Practical Points

Portfolio managers: avoid assuming AI strength automatically confirms broad-market health.

Consumer companies: stress-test demand and financing assumptions against weaker household sentiment.

Traders: watch whether semiconductors continue to lead after macro data or only bounce from oversold levels.

Next action: pair AI exposure analysis with consumer credit, real wage, and confidence indicators this week.

Sources

Micron, Intel, Tesla, Apple, Lilly, and More Stocks That Explain Today's Market

Market coverage discussing chipmaker trading after a sharp AI selloff.

finance.yahoo.com →

Household worries over finances hit highest level since July 2022, New York Fed survey shows

CNBC report on New York Fed survey results showing elevated household financial worries.

cnbc.com →

04.

Nvidia CEO declines Senate testimony on AI, China, and export controls

The item keeps policy risk visible for AI chip investors, especially around China exposure and export-control scrutiny.

Nvidia CEO Jensen Huang declines Senate testimony on AI, China and exports →

05.

Tesla rises ahead of SpaceX's Friday IPO

Tesla's move shows how Musk-linked assets can trade together when investors expect portfolio rotation or ecosystem enthusiasm.

Tesla Stock Rises Ahead of SpaceX's Friday IPO →

06.

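Penn Station rehab eyes billions in federal funding

Infrastructure funding remains a separate market theme, with federal decisions affecting contractors, municipal priorities, and regional development.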

New York's Penn Station Rehab Eyes Billions in Federal Funding →

Keywords

#Apple #Nvidia #OpenAI IPO #SpaceX IPO #Tesla #AI selloff #New York Fed

Crypto

Crypto details →

TL;DR

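Crypto is balancing the institutional accumulation story against outflows, macro pressure, and DeFi security stress. Bitcoin remains highly sensitive to ETF flows and inflation expectations, while NFT and lending incidents show that operational risk is still part of the asset class.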

01 Deep Dive

Yuga Labs rescues 68 NFTs worth more than $500,000 after Flooring Protocol exploit

What Happened

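Yuga Labs, using the GrailsOTC trading desk, rescued 68 blue-chip NFTs worth more than $500,000 from a vulnerable Flooring Protocol pool. Decrypt also reported that the Bored Ape Yacht Club creator is holding more than 60 rescued Ethereum NFTs in custody while it works to return them.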

Why It Matters

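The rescue limits immediate user losses, but it also shows that NFT market safety still depends on fast intervention by trusted teams. That creates a governance tension: white-hat rescues are useful, but they expose centralized points of response in a supposedly decentralized market.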

Key Takeaways

01 The 68-NFT rescue above $500,000 prevented a larger exploit outcome, but it did not remove the underlying protocol-risk lesson.
02 Blue-chip NFT liquidity can still be exposed through third-party financialization layers such as lending, pooling, or floor protocols.
03 Yuga's custody role may reassure holders in the short term while raising questions about rescue procedures and return verification.
04 The risk is copycat exploitation if vulnerable protocols are not patched before attackers inspect the same failure pattern.

Practical Points

NFT holders: review approvals and exposure to pooling or lending protocols, not just wallet custody.

Protocol teams: publish a clear incident timeline, patch status, and user-claim process after white-hat rescues.

Marketplaces: flag assets tied to active exploit recovery so buyers understand custody and return status.

Next action: revoke unused NFT approvals and monitor official Yuga and Flooring Protocol recovery instructions.

Sources

Yuga Labs Executes White-Hat Rescue of 68 NFTs After Flooring Protocol Exploit

The Defiant report on Yuga Labs rescuing 68 NFTs valued at more than $500,000 after a Flooring Protocol exploit.

thedefiant.io →

Bored Ape Maker Yuga Labs Rescues Dozens of Ethereum NFTs From Exploit

Decrypt report on Yuga Labs holding rescued Ethereum NFTs in custody while working to return them to owners.

decrypt.co →

02 Deep Dive

Spot Bitcoin ETFs lose $1.7B as BTC fights to hold the $60,000 area

What Happened

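Cointelegraph reported that spot Bitcoin ETFs saw $1.7 billion of outflows as the outflow streak reached four weeks. Other market coverage said Bitcoin's $60,000 support was not yet secure, with Coindesk linking the weakness to macro headwinds from mounting inflation concerns rather than to Strategy-related selling.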

Why It Matters

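ETF flows have become one of the clearest gauges of institutional demand for Bitcoin. Sustained outflows make the $60,000 area more fragile, because macro investors can cut exposure quickly when inflation or rate expectations move against risk assets.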

Key Takeaways

01 $1.7 billion of spot Bitcoin ETF outflows over a four-week streak points to sustained institutional de-risking.
02 The $60,000 support zone matters psychologically because a clean break would challenge the post-ETF demand narrative.
03 Inflation and CPI expectations can now dominate crypto-specific explanations for Bitcoin weakness.
04 The risk is forced narrative rotation: bullish treasury purchases may not offset broad ETF selling if macro pressure persists.

Practical Points

Bitcoin investors: track ETF net flows and real yields together instead of reading price action in isolation.

Traders: define invalidation levels around the $60,000 area before CPI-related volatility arrives.

Treasury buyers: keep liquidity reserves because institutional outflows can widen drawdowns even when the long thesis is intact.

Next action: watch whether ETF flows stabilize before adding leverage to Bitcoin rebound trades.

Sources

Spot Bitcoin ETFs bleed $1.7B as outflow streak hits four weeks

Cointelegraph report on spot Bitcoin ETF outflows and broader crypto fund flows.

cointelegraph.com →

Blame bitcoin's tumble on rising inflation, not Strategy, 10xResearch argues

Coindesk coverage of 10x Research analysis linking Bitcoin weakness to inflation and ETF selling.

coindesk.com →

03 Deep Dive

Strategy adds $100M BTC while BitMine buys $214M ETH into the selloff

What Happened

Coindesk's live coverage said Bitcoin topped $63,000 as Strategy added $100 million of BTC in its latest purchase. Decrypt reported that Tom Lee's BitMine bought $214 million of Ethereum, its largest weekly ETH purchase this year, and also noted JPMorgan's view that Strategy's cash position matters for keeping investors calm.

Why It Matters

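Corporate crypto treasuries are trying to frame drawdowns as accumulation opportunities. The catch is that large purchases support sentiment only if investors believe the buyers have enough cash, risk controls, and patience to survive deeper volatility.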

Key Takeaways

01 Strategy's $100 million BTC purchase reinforces the corporate-treasury accumulation narrative during weakness.
02 BitMine's $214 million ETH buy shows dip-buying is extending beyond Bitcoin into Ethereum treasury strategies.
03 JPMorgan's focus on Strategy's cash position highlights that balance-sheet resilience matters as much as coin count.
04 The risk is concentration: treasury companies can amplify upside narratives but also become volatility transmission channels.

Practical Points

Equity investors: evaluate crypto-treasury stocks on liquidity, debt, and dilution risk, not only token holdings.

Crypto traders: treat treasury buys as sentiment inputs, but confirm with ETF flows and spot liquidity.

Corporate treasurers: avoid copying aggressive accumulation without matching cash runway and governance controls.

Next action: compare treasury purchase announcements with balance-sheet disclosures and market liquidity conditions.

Sources

Live updates: Bitcoin tops $63,000 as Strategy adds $100 million BTC in latest purchase

Coindesk live coverage of Bitcoin price action and Strategy's latest BTC purchase.

coindesk.com →

Tom Lee's BitMine Buys the Dip Amid 'Superficial' Crypto Selloff, Adding $214M in Ethereum

Decrypt report on BitMine's $214 million Ethereum purchase during the crypto selloff.

decrypt.co →

04.

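Aave chief defends protocol after $8.45B bank run

The episode keeps DeFi risk management in focus, especially around third-party dependencies, liquidity exits, and accountability after stress events.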

Aave chief defends protocol's 'resilience' after $8.45 billion bank run →

05.

Bernstein still sees $150K Bitcoin despite retail boredom

The contrast with a bullish target shows how institutional research can diverge from weak sentiment and short-term retail attention.

Bitcoin Is 'Boring' AI-Hungry Retail Investors, But Bernstein Still Sees $150K This Year →

06.

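Bitcoin accumulation thesis persists despite downside risk

The accumulation argument is still alive, but it depends on disciplined position sizing, because a macro-driven drawdown could last longer than holders expect.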

'Best thesis' for Bitcoin accumulation surfaces despite current downside risk: Analyst →

Keywords

#Yuga Labs #Flooring Protocol #Bitcoin ETFs #$60,000 support #Strategy #BitMine #Aave
