Critics Zero In on the Math Where $1 of AI Revenue Costs Platforms $14

01Selling $1 of AI Revenue Can Cost the Platforms $14, and That Math Is Where Critics Now Aim

Cory Doctorow's new book, "The Reverse Centaur's Guide to Life After AI," argues that deflating the AI bubble means attacking its foundation, not its marketing. The science-fiction author and tech journalist inverts the usual image of AI as a tool that augments a worker. In his framing the worker instead paces a machine that sets the rhythm. His chosen pressure point is the economics underneath the hype.

That argument arrives next to an independent cost analysis that dates the skepticism earlier than most accounts do. A widely shared blog post at blog.dshr.org credits Sequoia Capital's David Cahn with the first warning its author could find: a September 2023 piece titled "AI's $200B Question." Cahn re-ran the same analysis nine months later as "AI's $600B Question." His estimate of the revenue gap had tripled in under a year.

The post argues the skeptics were not outliers. It notes that independent journalist Ed Zitron flagged the same gap before the mainstream business press did. The hard part now, the author writes, is keeping up with the volume of companies complaining about what their employees spend on tokens.

The numbers that connect these signals point at unit economics rather than sentiment. Estimates of the per-dollar subsidy vary widely, the post reports, typically landing between $8 and $14 of cost for every $1 of revenue generated. Two posts from Zitron sharpen the range. In "AI's Brokenomics," Zitron reported that SemiAnalysis, which he describes as an extremely pro-AI semiconductor analyst, ran random long-horizon coding tasks until they hit the limits on each subscription tier.

The implication, subject to the assumption that platforms are not subsidizing the token price itself, is severe. By Zitron's reading the heaviest enterprise users get subsidized up to 40 times at one platform and up to 70 times at another. A subsidy that scales with usage means more adoption widens the loss rather than closing it.

That is the through-line linking Doctorow's book to the spreadsheets. The dispute is not about whether AI is useful. It is about whether the cost of delivering it can fall faster than usage grows. The author signals the post is a first sample from a flood of cost data still arriving.

Heaviest AI users may cost platforms 40-70x their revenuesubsidy scales with adoption, so growth widens losseswatch whether per-token cost falls faster than enterprise usage climbs

Sources

How to burst the AI bubble: Strike at its rootsarstechnica.com AI's Affordability Crisisblog.dshr.org

02A 600-character signature where Claude's reasoning should be

Over a weekend, a developer went looking for the logic behind his agent's work. Claude Code records each session to disk, and those logs include "thinking blocks," billed as the model's reasoning as it runs. He opened them expecting text and found only a 600-character signature, no words.

So he read the documentation. Claude encrypts its reasoning into that signature, and Anthropic holds the key; the local machine never receives it. What the API returns is a summary of the reasoning, not the reasoning itself. The full thinking output requires an enterprise agreement, according to his reading of Anthropic's docs.

The gap matters to anyone who has promised a client an audit trail. Reasoning files on your disk are not readable by you. You can scrape a running session's inputs, outputs, and actions, but the logic driving them stays sealed. The "extended thinking" surfaced by ctrl+o is a summary of the model's thinking, he writes, not the actual reasoning that produced its actions. He likens the conversion to saving a BMP as a JPEG, editing the JPEG, then writing it back: data is lost each pass.

A separate writeup pushes the doubt one layer down, reframing prompt injection as "role confusion." An LLM receives system prompts, user messages, tool outputs, and its own prior responses as one continuous string, the authors argue. The chat interface shows tidy, separate turns; the model sees a single block of text. Edit the string and you edit the model's reality. A deleted turn never happened.

Both findings land on the same soft spot: a model's account of its own internals is not load-bearing. The reasoning log you save locally is an encrypted summary, not the logic itself. The boundary you assume separates trusted instructions from injected text is, by the paper's account, a line the model itself draws unreliably.

Local thinking logs are encrypted summaries, not an audit trailfull reasoning output gated behind enterprise agreementsrole boundaries between system prompts and injected text not reliably enforced

Sources

The text in Claude Code's "Extended Thinking" outputpatrickmccanna.net Prompt Injection as Role Confusionrole-confusion.github.io

03OpenAI showcased GPT-5 cracking a 3-year immunology case. A new benchmark wants AI to clear 90 Nature papers first

OpenAI published an account of GPT-5 Pro helping immunologist Derya Unutmaz resolve a question about T cell behavior that had stalled his lab for three years. The company presents it as a single case where the model produced an insight a working scientist could act on. OpenAI says the result could support cancer and autoimmune research, stopping short of any clinical claim. One named scientist, one problem, one curated win.

NatureBench arrives as the counterweight. It is a cross-discipline benchmark of 90 tasks distilled from peer-reviewed Nature-family publications, and its authors built it to test whether AI coding agents can move past reproduction toward discovery on real scientific problems. The framing is a direct challenge to the showcase format. A model that impresses one immunologist in conversation has not been measured against anything; a model that matches the published state of the art across 90 papers has.

The benchmark also targets a problem that has dogged earlier agent-on-research evaluations: the environments don't hold up. NatureBench runs on NatureGym, an automated pipeline that constructs a standardized, containerized environment for each task from its source paper. The authors say this addresses the environment-fragmentation issue that, in their account, limited the credibility of prior agent benchmarks. The pitch is reproducibility as a precondition for any claim about capability.

The two items measure different things. OpenAI's case is conversational scientific reasoning with a human in the loop. NatureBench scores autonomous coding agents against a fixed, published bar. They are not the same experiment, and neither settles what AI can do for science. What separates them is the standard of proof: a vendor's selected anecdote on one side, a systematic test someone else can rerun on the other.

NatureBench's headline question is whether agents can beat reproduction and match the SOTA. The paper introduces the test; it does not report a model clearing it. Until those scores land, the showcase and the benchmark are two arguments about evidence, pointed at the same claim.

Vendor science demos now face a rerunnable 90-task counter-benchmarkreproducibility, not anecdote, becomes the bar for "AI does science"watch whether any agent's NatureBench SOTA score ever publishes

Sources

How GPT-5 helped immunologist Derya Unutmaz solve a 3-year-old mysteryopenai.com NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?huggingface.co

Anthropic launched Claude Tag, an always-on assistant inside Slack Anthropic released Claude Tag, which reads Slack messages to build a model of a company's workflows and institutional knowledge. The feature embeds Claude into enterprise context and recurring tasks. techcrunch.com

Corporate AI super PACs spent $27 million on one New York Assembly race Industry-backed super PACs poured $27 million into the 12th District contest involving Alex Bores, who authored state AI safety legislation. The spending dwarfs typical budgets for a local seat. theverge.com

Nvidia said its Rubin data center design cuts nearly all water use Nvidia claimed its Rubin-generation reference design for fully liquid-cooled data centers eliminates most power overhead and almost all water consumption. The pitch answers public pushback over AI facilities' resource demands. theverge.com

OpenAI backed the Appia Foundation to set AI evaluation standards OpenAI said it is supporting the Appia Foundation to develop shared evaluation frameworks and safety practices for advanced AI. The effort targets cross-company cooperation on testing methods. openai.com

Figma added code layers, motion support, and AI-built plug-ins Figma shipped an update introducing a code layer, support for animation and shaders, and the ability to generate custom plug-ins with AI. The changes extend the design tool toward interactive and motion work. techcrunch.com

Midjourney announced a water-immersion ultrasound scanner with no published evidence Midjourney, known for image generation, unveiled an ultrasound body scanner that submerges users and claims MRI-level results. The Verge found the company has released no data supporting the medical claims. theverge.com

Researchers released EnterpriseClawBench from real workplace agent sessions The benchmark builds 852 reproducible tasks from proprietary enterprise agent sessions, each paired with recovered fixtures, role classes, hard rules, and semantic rubrics. It tests agents on reading mixed files, calling tools, and producing business artifacts. huggingface.co

Google released the Google Home Speaker with reliable wake-word detection Google's new smart speaker heard "Hey, Google" even at full volume during The Verge's testing. Reviewers found it sounds good but behaves inconsistently in daily use. theverge.com

Sony's Xperia 1 VIII AI Camera Assistant produced poor photos in testing Sony promoted the phone using images shot with its new AI Camera Assistant. After a week of use, The Verge reported the feature delivered some of the worst results seen from a Sony camera. theverge.com