An AI opened a store and forgot to staff it. Opus 4.7 dethroned everyone. AmEx insures your agent's purchases. Apple nearly booted Grok. Robots learned to wiggle. Nine stories.
An AI signed a lease in San Francisco, opened a store, hired two humans, and forgot to schedule either of them. Opus 4.7 retook the coding crown. AmEx became the first institution to insure agent purchases. Apple nearly booted Grok over deepfakes. Harvard taught robot swarms to wiggle. Cerebras launched a $2B IPO roadshow at $25B. Nine stories, five fundings, one Hong Kong IPO.
Anthropic Dropped Opus 4.7 on a Wednesday and Ended Three Careers' Worth of Benchmark Bragging.
on SWE-bench Verified, comfortably past Gemini 3.1 Pro's 80.6% and up from Opus 4.6's 80.8%. On SWE-bench Pro — the test that actually hurts — Opus 4.7 put up
while GPT-5.4 managed 57.7% and Gemini sat at 54.2%. CursorBench, the benchmark real developers care about, jumped from 58% to
. Tool errors dropped by two-thirds. And Anthropic held the price flat at $5/$25 per million tokens, which is the kind of pricing discipline that turns proofs of concept into production systems overnight. The one blemish: BrowseComp fell to 79.3%, trailing GPT-5.4. You can't be best at everything, but you can be best at the things that ship code.
The leapfrog cycle is measured in weeks now, not quarters. Anthropic delivered a 14% improvement on multi-step agentic workflows without raising prices. If your AI vendor is raising prices while the competition is raising capability, you have your answer.
An AI Signed a Lease, Opened a Store, Hired Two Humans — Then Forgot to Schedule Anyone for Opening Day.
Andon Labs gave an AI agent named Luna a
budget, a three-year retail lease at 2102 Union St in San Francisco's Cow Hollow, and full operational autonomy. Luna selected inventory (including copies of
, because of course), negotiated with suppliers, set prices, and hired two full-time employees named John and Jill, reportedly the world's first workers with an AI boss. Then opening day arrived and Luna hadn't scheduled either of them. The store launched to an empty register. Luna runs on Anthropic's Claude Sonnet. All human employees are formally employed by Andon Labs, with guaranteed pay and full legal protections. The experiment isn't a gimmick. It's the most honest stress test of agentic commerce we've seen: give an agent real money, real consequences, and a real lease, and watch where it breaks.
Luna can negotiate supplier contracts but can't schedule a shift. That gap between strategic reasoning and operational memory is the unsolved problem in every enterprise agent deployment. The book selection, though? Immaculate taste.
AmEx Will Cover the Tab When Your Agent Screws Up.
American Express unveiled its Agentic Commerce Experiences (ACE) Developer Kit on April 14 and made a promise that would've sounded insane a year ago: if a registered AI agent makes an erroneous purchase on your behalf,
. The ACE framework provides five core services — agent registration, account enablement, intent intelligence, payment credentials, and cart context — designed so an agent's identity is verified, the cardholder has authorized the activity, and every transaction has an audit trail. Payment partners include
. This is the trust layer that agentic commerce has been missing. Agents could already browse, select, and transact. What they couldn't do was get someone to guarantee the downside. Now they can.
The technology for agentic commerce existed. The missing piece was liability. AmEx just gave every enterprise procurement team the cover they needed to go from pilot to production.
Apple Almost Kicked Grok Off the App Store.
3 Million Deepfakes and 23,000 Images of Minors Later, xAI Barely Survived.