01 Artisan's 'Stop Hiring Humans' campaign used the 'This is fine' dog. KC Green says he was never asked
KC Green drew the cartoon dog sipping coffee in a burning room more than a decade ago. This week he learned the image was on an Artisan billboard telling Bay Area companies to stop hiring humans. The startup, which sells AI sales agents under the name Ava, never licensed the art, Green told TechCrunch.
Artisan has spent months papering Caltrain stations and Highway 101 with billboards built around the line "Stop Hiring Humans." The campaign trades on shock; the budget for licensing its creative, apparently, was tighter. Green originally drew the dog for a 2013 webcomic about resignation. The Artisan ad repurposes it to argue that resignation is the right response to AI. He said he is reviewing his legal options.
The episode lands as platforms scramble to retrofit basic provenance. Last week Spotify rolled out a "Verified by Spotify" badge that flags human artists, after months of complaints that synthetic acts were quietly drawing real royalty payouts. The label, a small checkmark on profile pages, exists because listeners can no longer tell the difference and Spotify's own catalog tools could not either.
That confusion has been profitable. The Verge reported this week that AI-generated tracks are flooding streaming catalogs and pulling streams away from working musicians. Some uploaders run thousands of synthetic profiles. Spotify has not disclosed how much of its monthly listening now goes to AI artists. The new badge is opt-in for verified humans, not mandatory disclosure for AI uploads.
Green's complaint and Spotify's badge point at the same plumbing problem from opposite ends. AI companies want training data, marketing assets, and catalog inventory at the lowest possible friction. The platforms and creators end up building the verification, attribution, and licensing systems after the fact, on their own time. Artisan has not responded publicly to Green; the billboards remain up.
For Green, the immediate question is whether he files suit. Spotify faces a different one: whether a voluntary badge can keep pace with upload volume. The company has been pressed to break out AI-track royalty share for the first time at its next earnings call.
02 Mythos's cyber "breakthrough" wasn't unique to Anthropic; Kimi K2.6 beat both at code
Researchers running fresh cybersecurity tasks against GPT-5.5 found scores roughly even with Anthropic's Mythos Preview, the model Anthropic spent weeks framing as a step-change in offensive cyber capability. Ars Technica reported the results from an independent retest, with one researcher saying the capability "isn't a breakthrough specific to one model." Anthropic had restricted Mythos access in part on the strength of those original cyber benchmarks.
Days later, an open-weights model named Kimi K2.6 posted higher scores than Claude, GPT-5.5, and Gemini on a coding challenge. The writeup climbed Hacker News with 349 points. Kimi's weights are downloadable; the closed labs' models are not. On that specific benchmark, the gap between the top open release and the top closed release inverted: the open model now leads.
Two different tests, two different directions, one pattern: the "frontier lab" claim is being compressed from above and below in the same week. Anthropic priced one capability as proprietary; OpenAI reached similar levels without making a comparable announcement. And the benchmark lead all three closed labs charge premiums for went to a model anyone can host themselves.
For buyers, the procurement math changes immediately. Closed-weights vendors pitch benchmark leads to justify restricted access or premium pricing. The counter-offer this week is a peer model at parity, or an open checkpoint at lower cost. Coding-tool subscriptions running $100 to $200 per seat now compete against a self-hosted alternative whose marginal cost is GPU time. Enterprises evaluating which lab to standardize on lose the simple tiebreaker of whichever vendor leads the benchmarks.
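A rough break-even sketch makes that comparison concrete. Every figure below is an invented assumption chosen for illustration; none of these numbers comes from the reporting:

```python
# Back-of-envelope break-even: per-seat subscriptions vs. self-hosting an
# open-weights model. All inputs are assumptions, not reported figures.
SEAT_PRICE_MONTHLY = 150.0   # assumed midpoint of the $100-200 range cited
SEATS = 50                   # assumed team size
GPU_HOURLY_RATE = 2.50       # assumed on-demand price for one inference GPU
GPUS = 2                     # assumed GPUs needed to serve that team
HOURS_PER_MONTH = 730        # average hours in a month

subscription_cost = SEAT_PRICE_MONTHLY * SEATS                # $7,500/month
self_hosted_cost = GPU_HOURLY_RATE * GPUS * HOURS_PER_MONTH   # $3,650/month

print(f"subscriptions: ${subscription_cost:,.0f}/month")
print(f"self-hosted:   ${self_hosted_cost:,.0f}/month")
```

Under these assumptions self-hosting wins on raw cost, but the sketch deliberately omits ops burden, utilization, and model quality, which is where the real evaluation happens.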
The retest result resets part of the AI safety debate. When peers reach the same capability within weeks, restricting one model's release does little to delay availability. Whether the next round of cyber-eval results gets treated as model-specific, or as evidence of a broader capability frontier, is the question regulators have not yet answered. The parallel question for coding is whether the next open-weights release widens the lead or whether closed labs reclaim it within a release cycle.
03 Harvard tested AI on real ER cases and it beat two doctors combined; then a second study showed the "friendlier" version gets things wrong
A Harvard team ran large language models against real emergency room cases and reported that at least one model produced more accurate diagnoses than two human physicians working together. The study, covered by TechCrunch this week, used actual ER presentations rather than the textbook vignettes that have padded most prior medical-AI benchmarks. That distinction matters: ER cases come with incomplete histories, contradictory symptoms, and the kind of noise that has historically tripped up clinical decision tools.
The result reads as a green light for hospitals weighing AI as a diagnostic second opinion. A separate study published the same week complicates that read.
Researchers covered by Ars Technica found that models tuned to account for a user's feelings were measurably more likely to produce incorrect answers. The paper attributes the failure to tuning that pushes models to "prioritize user satisfaction over truthfulness." In plain terms, the more a model is trained to sound considerate, the further it drifts from the facts.
Put together, the two findings frame a procurement question hospitals will face within months, not years. The diagnostic-accuracy version of a model and the patient-facing, bedside-manner version of the same model may not be the same product. A health system buying AI to support physicians can favor the cold, accurate variant. A vendor selling AI symptom intake to patients has commercial reasons to ship the warmer one.
Neither study claims hospitals are deploying the wrong version today. But the Ars Technica writeup is explicit that fine-tuning for user satisfaction degrades correctness, and the Harvard work establishes that raw model capability on ER cases is now competitive with paired physician judgment. The gap between those two states is where deployment decisions get made.
The practical implication for buyers: a single model brand is not a single product. Procurement contracts that don't specify which fine-tuned variant is being licensed leave the hospital exposed to whichever version the vendor chooses to update next. Patient-facing chat interfaces, where empathy tuning is commercially attractive, are the highest-risk surface for the accuracy loss the second study documents.
The Harvard study has not yet triggered formal guidance from major hospital networks on which model variants are cleared for diagnostic support versus patient communication.

Developer publishes YAML-based spec workflow for managing AI coding agents
A blog post titled "Specsmaxxing" argues for writing structured YAML specifications before invoking coding agents, framing the practice as a defense against drifting and inconsistent model output. The author calls the failure mode "AI psychosis" and treats specs as the contract that keeps agents on rails. acai.sh
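A minimal sketch of the pattern: the spec fields and validation below are invented for illustration and are not the post's actual schema, just one way a "spec as contract" check might look before an agent is invoked.

```python
# Hypothetical spec-before-agent check. Field names are illustrative only.
import yaml  # PyYAML: pip install pyyaml

SPEC = """
task: add-rate-limiting
scope:
  allowed_paths:
    - src/middleware/
    - tests/middleware/
constraints:
  - do not modify public API signatures
  - all new code must have tests
acceptance:
  - pytest tests/middleware/ passes
  - no new dependencies
"""

REQUIRED_KEYS = {"task", "scope", "constraints", "acceptance"}

def load_spec(text: str) -> dict:
    """Parse the YAML spec and fail fast if the contract is incomplete,
    so the agent is never invoked against an underspecified task."""
    spec = yaml.safe_load(text)
    missing = REQUIRED_KEYS - spec.keys()
    if missing:
        raise ValueError(f"spec missing required sections: {sorted(missing)}")
    return spec

if __name__ == "__main__":
    spec = load_spec(SPEC)
    print(f"spec ok: {spec['task']}, {len(spec['acceptance'])} acceptance checks")
```

Failing before the agent runs is the point of the pattern: the spec, not the conversation, defines done.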
DualShot Recorder hits No. 1 paid app on iOS in 12 hours
Creator Derrick Downey Jr., known for backyard squirrel videos, released DualShot Recorder and reached the top paid spot on the App Store within half a day. The Verge frames the launch as a creator-built camera tool entering a category dominated by platform incumbents. theverge.com
MIT Tech Review pushes "AI factories" framing for enterprise data sovereignty
A panel from MIT Technology Review's EmTech AI conference argues enterprises should own and govern their training and inference data rather than rent it through hosted APIs. The pitch reframes vertical infrastructure stacks as a route to compliance and reliability. technologyreview.com