Amazon Engineers Mock Their AI Coder as 'Sloppenheimer'; Judge Tosses Lawyers Over AI Filings

01 Amazon engineers named their internal AI coding tool 'Sloppenheimer.' OpenAI was busy publishing the opposite story.

OpenAI's pitch lands in the marketing register. In a customer write-up, the company says engineers at Nextdoor use Codex with GPT-5.5 to chase down hard-to-reproduce bugs, ship across platforms, and "build without limits." The framing is productivity unbound: the tool investigates, the human focuses on the product.

Inside Amazon, engineers tell a different version. According to 404 Media, employees created a Slack channel devoted to memes mocking the company's own AI coding product, trading complaints about its failures. One nickname stuck: "Sloppenheimer," a play on "slop," the term developers use for low-quality machine-generated output. The channel exists to commiserate, the report says, not to celebrate.

The gap between those two accounts is not about benchmarks. It is about who absorbs the cost when generated code lands in a repository. Vendor case studies measure the moment of creation: a fix written, a feature scaffolded, a ticket closed. The complaints from the floor measure what comes after, when someone else has to read, run, and maintain the result.

That after-cost has a longer pedigree than any model. A blog post titled "Cleaning up after AI rockstar developers," which drew 436 points on Hacker News, maps the AI workflow onto an older archetype. The author describes the classic rockstar engineer who rewrites core architecture, introduces new languages, rejects most pull requests, and writes code nobody else understands. The work always sounds impressive. Then the rockstar leaves.

What remains is the part the case study never shows. The author recounts inheriting such a project and finding the data flow so tangled it read like someone covering up a murder. Fixing one simple bug started with a week spent just getting the code to run locally. Half of it was written in a language he didn't know.

The parallel is the argument. AI tools generate at rockstar speed and, like the rockstar, leave no one who fully understands the output. Nextdoor's engineers report building faster. Amazon's report cleaning up. Both can be true at once, because they measure different ends of the same pipeline.

For a team deciding whether to adopt, the useful question is not how fast the tool writes. It is who on the roster will own the code six months later, and whether that person was counted in the productivity math.

Adoption math counts code written, not code maintained
Downstream maintainers and reviewers absorb the rework cost
Vendor case studies measure creation, internal channels measure cleanup

Sources

How engineers at Nextdoor use Codex to build without limits (openai.com)
'Sloppenheimer:' Amazon Employees Mock the Company's AI on Slack (404media.co)
Cleaning up after AI rockstar developers (codingwithjesse.com)

02 Free AI Becomes the Distribution Play: Apple Waives Cloud Bills, Google Bundles Live Translation

Two platform giants spent this week giving away artificial intelligence that competitors are still trying to sell. The product lines differ. The accounting is the same.

Apple will waive cloud API costs for developers with fewer than 2 million first-time App Store downloads, according to TechCrunch. The bet targets the builders who experiment most and pay least. As model inference grows more expensive to run at scale, Apple is absorbing the bill for the long tail of its developer base rather than metering it. The price of that subsidy buys something harder to acquire: developers who default to Apple's cloud stack before they grow large enough to shop around.

The threshold matters more than the discount. A developer under 2 million downloads is precisely the cohort deciding which platform to build on. Lock them in early, and the free tier becomes the on-ramp to paid usage once an app crosses the line. Apple keeps the gate; the developer keeps building.

Google is running the same logic at the other end of the pipe. Gemini 3.5 Live Translate brings near real-time speech translation into Google Translate, Google Meet and Google AI Studio, the company says. Two of those three are mass-market products already open on hundreds of millions of phones and work calls. The third is the developer console. Google did not launch a paid translation tier and a marketing push around it. It dropped the feature inside tools people already use.

That placement is the strategy. A real-time translator sold as a standalone product competes on price against every other translation API. The same translator embedded in Meet competes against nothing, because the user never chose it. They opened a video call and the captions were multilingual.

The two moves point at different customers. Apple courts the developer; Google courts the end user and the meeting. Both treat the AI feature as bait for the platform, not the revenue line. When direct sales of model access stop clearing margin, distribution becomes the asset worth subsidizing, and whoever controls the entry point collects later.

Sub-2M-download developers now build on Apple's cloud for free
Live translation ships free inside Translate, Meet, and AI Studio
Standalone AI-API vendors lose pricing power as giants bundle the same features at zero

Sources

Apple bets cheaper AI will woo small developers (techcrunch.com)
Fluid, natural voice translation with Gemini 3.5 Live Translate (deepmind.google)

03 Both Sides Filed AI Work. The Judge Canceled the Trial and Removed Every Lawyer.

The trial did not collapse because of one careless lawyer. It collapsed because of the lawyers on both sides.

According to 404 Media, a judge presiding over a trial discovered that the lawyers on each side had used AI to prepare their work, and that neither side had checked what the tools produced. The response was not a fine or a warning. The judge canceled the trial outright and removed every attorney from the case, leaving the litigants to start over with new counsel.

That sequence inverts the usual AI-in-court story. The familiar version has one careless lawyer submitting a brief with invented citations and getting caught by an alert opponent. Here the adversarial check failed on both ends. When each side outsources its filings to a model and trusts the output, there is no human left in the room to catch the other's errors. The opposing counsel who would normally flag a fabricated citation was running the same playbook.

The collapse points past courtroom etiquette to a measurement problem that researchers are now trying to name. A paper titled "Agents' Last Exam," posted this month with 54 upvotes on Hugging Face, argues that recent AI systems post strong scores across a wide range of benchmarks, yet those gains have not converted into economically meaningful deployment in professional fields. The authors frame the shortfall as an evaluation failure: standard benchmarks do not measure sustained performance on the long-horizon, real-world workflows that actually pay.

Legal filing is exactly that kind of workflow. A model can pass a bar-style question set and still produce a brief no court will accept, because the job is not answering a prompt once. It is assembling a record that holds up under an opponent's scrutiny and a judge's reading. The benchmark rewards a single correct answer; the courtroom punishes a single fabricated one.

The new benchmark proposes scoring agents on those sustained, economically valuable tasks instead. Whether that closes the gap is unsettled. What is settled is the cost in this case: a vacated trial, a fresh set of lawyers, and a docket that has to begin again.

Both-sides AI use breaks the adversarial check courts rely on
Benchmark scores don't predict reliable professional output
Lawyers face removal, not just fines, for unverified filings

Sources

Judge Learns Lawyers on Both Sides of Case Used AI, Cancels Trial, Kicks Everyone Off the Case (404media.co)
Agents' Last Exam (huggingface.co)

Anthropic releases Claude Fable 5, its most powerful widely available model
Anthropic launched Claude Fable 5, which it calls the strongest model it has put into broad release. The company said Fable 5 leads rivals on software engineering, knowledge work, and vision, with its margin widening on longer, more complex tasks. (theverge.com)

Lovable hits $500M annualized revenue with 1 million weekly projects
Lovable reached $500 million in annualized run-rate revenue, with users starting roughly 1 million new projects each week. The company said customers are building businesses and replacing internal software with its app-generation tools. (techcrunch.com)

Notion builds web Voice Input with OpenAI's Codex
Notion used OpenAI's Codex to turn specs into working code in single passes and shipped its web AI Voice Input feature with it. OpenAI's case study describes small engineering teams shipping more per person. (openai.com)

GM activates vehicle-to-grid to offset AI data center power demand
General Motors switched on vehicle-to-grid capabilities for current EV and home energy customers at a San Francisco event. GM pitched EV and sodium-ion batteries as grid storage against rising electricity demand from AI data centers. (theverge.com)

Apple adds generative AI photo editing at WWDC 2026
Apple announced AI photo editing tools in iOS 27, reversing its prior position against generative edits that distort images. The tools let users manipulate photos directly, drawing deepfake concerns. (theverge.com)

Google DeepMind opens a European robotics accelerator
Google DeepMind launched a three-month accelerator for early-stage European robotics startups. The cohort gets access to DeepMind's AI stack and Gemini robotics models, spanning logistics, manufacturing, healthcare, and navigation. (deepmind.google)

Sandstone raises $30M for in-house legal AI
Sandstone closed a $30 million Series A to build AI tools for corporate legal departments. The round arrived six months after a Sequoia-led seed. (techcrunch.com)

Ex-Spin founder raises $5M for space data centers
Euwyn Poon, who built 250,000 scooters at Spin, raised $5 million for Orbital to put data centers in space. Poon plans to launch 10,000 orbital data centers. (techcrunch.com)

DeepMind reports learning gains from Gemini tutoring in Sierra Leone
Google DeepMind published randomized controlled trial results for Gemini's Guided Learning feature. The trial found higher student engagement and faster learning in Sierra Leone. (deepmind.google)

Claude Fable 5 generates playable games from one prompt
Anthropic's Claude Fable 5 produces working video games from a single prompt, TechCrunch reported. The feature targets the web's vibe-coding users. (techcrunch.com)