OpenAI Hands Health Questions to GPT-5.5 Days After a Test Put Its Hallucination Rate at Triple a Rival's

01 OpenAI Hands Health Questions to GPT-5.5, Days After a Test Pegged Its Hallucination Rate at Triple an Open-Weight Rival's

ChatGPT now answers health and wellness questions through GPT-5.5 Instant, the variant OpenAI says it rebuilt with stronger reasoning, better context handling, and clearer communication. The company says it ran physician-informed evaluations to sharpen those responses. It is pushing a flagship model into a domain where a confident wrong answer carries real consequences.

The rollout collides with an independent post making the opposite case about reliability. A piece on arrowtsx.dev, titled "Bigger models are not the way," claims GPT-5.5 hallucinates three times as often as GLM-5.2, the MIT-licensed model from Z.ai. The author argues that scaling parameters and training data has stopped buying intelligence. GLM-5.2 runs 753 billion parameters with roughly 40 billion active, and the post says it lands within four points of GPT-5.5 on the Artificial Analysis Intelligence Index, despite GPT-5.5 being estimated at one to two trillion parameters.
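The post's size comparison comes down to two ratios. The sketch below uses only the figures the post reports. It assumes "active" has its usual mixture-of-experts meaning, which is the parameters used per token.

```latex
\[
\frac{40\ \text{B active}}{753\ \text{B total}} \approx 0.053 \;(\approx 5.3\%)
\qquad
\frac{1{,}000\ \text{B}}{753\ \text{B}} \approx 1.3,\quad
\frac{2{,}000\ \text{B}}{753\ \text{B}} \approx 2.7
\]
```

By the post's numbers, GPT-5.5 carries roughly 1.3 to 2.7 times GLM-5.2's total parameters. No active count is given for GPT-5.5, so only totals can be compared.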

The post's core argument is about what large factual training sets teach a model. According to the author, a model fed high volumes of factual, non-theoretical data learns to always produce an answer rather than admit uncertainty. The post cites DeepSeek V4 Pro, which it says scored a 94% hallucination rate on the AA-Omniscience benchmark, meaning it said "I don't know" only about 6% of the time on questions it could not solve. That, the author claims, is the failure mode of chasing size.
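The post's reading of the 94% figure implies a specific ratio. Below is a minimal sketch that assumes the rate is wrong answers divided by all questions the model did not get right (wrong answers plus abstentions). That definition matches the post's gloss, but it is not taken from the benchmark's published code. The function name and the counts are hypothetical.

```python
def hallucination_rate(incorrect: int, abstained: int) -> float:
    """Share of unsolved questions the model answered wrongly instead of abstaining.

    Assumed definition: incorrect / (incorrect + abstained). Correct answers
    are excluded, and an "I don't know" counts in the model's favor.
    """
    unsolved = incorrect + abstained
    if unsolved == 0:
        raise ValueError("no unsolved questions to score")
    return incorrect / unsolved


# Hypothetical counts for 100 questions a model could not solve:
# it guessed wrong on 94 and said "I don't know" on 6.
rate = hallucination_rate(incorrect=94, abstained=6)
print(f"{rate:.0%}")  # prints 94%
```

Under this definition, correct answers do not dilute the rate. It measures how a model behaves when it is out of its depth, not how much it knows.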

The two narratives do not point at exactly the same target. OpenAI's health blog describes GPT-5.5 Instant; the hallucination claim is leveled at GPT-5.5. OpenAI's evidence is self-reported and physician-informed, with no published hallucination figure for the health-tuned variant. The independent test offers a comparative ratio against an open-weight model but no absolute rate for either.

That leaves anyone building health features on ChatGPT weighing a vendor's internal evaluation against an outside claim that the underlying family invents answers it should refuse. The post frames open weights as the safer bet, citing GLM-5.2's near-parity score under an MIT license. OpenAI has not responded to the hallucination comparison. Its next move, whether it publishes a reliability number for the health variant, will tell deployers how much the physician-informed claim is worth.

Health-tuned variant's gains are self-reported; no hallucination rate is published.
Confident wrong answers in medical contexts carry clinical risk.
The outside post credits MIT-licensed GLM-5.2 with near-parity intelligence and fewer fabrications.

Sources

Improving health intelligence in ChatGPT (openai.com)
GPT-5.5 hallucinates 3x more than MIT-licensed GLM-5.2 (arrowtsx.dev)

02 They cited Seattle's anti-retaliation law before testifying. A week later, Amazon moved to discipline them.

Three Amazon software engineers opened their testimony before the Seattle City Council this month by reading a city law into the record. The statute bars employers from punishing workers over political speech. They were arguing in favor of limits on data center construction, and they wanted protection on paper before they said a word.

It did not hold. According to The Verge, the engineers say Amazon began disciplinary action against them on June 10th, one week after the hearing. They now accuse the company of retaliating in violation of the same ordinance they had quoted from the witness chair. The disciplinary steps, they say, could end in their firing.

The dispute sits at the base of the buildout powering the AI boom. Data centers draw electricity and water at industrial scale, and Seattle's council is weighing restrictions on new sites. The three engineers, who oppose their employer's infrastructure expansion, used a public hearing to say so. Their accusation is that the company answered with discipline.

That fight is one front. Washington is opening another. Senator Bernie Sanders unveiled a $7 trillion plan, structured as a public wealth fund, that he says would give Americans control of the AI industry. Sanders frames it as a way for ordinary people to own a stake in the sector instead of watching its gains concentrate among a handful of firms.

His office expects those firms to resist. The plan's headline figure collides with the same capital the largest companies are now committing to compute and physical buildout. That buildout is the infrastructure Amazon's engineers testified against, blocks away from the company's headquarters.

Both pressures point at the same target from opposite directions. One comes from inside the workforce, citing a local labor statute. The other comes from the Senate, citing national ownership. Neither has been resolved. Seattle's council has not yet voted on the construction limits, and the engineers' retaliation claim has not been tested. That claim will decide whether the ordinance shields workers who speak against their own employer's data centers.

Tests whether local political-speech laws protect tech workers who oppose employer projects.
Limits on data center construction could slow AI compute buildout.
Sanders fund signals a federal push to dilute Big AI ownership.

Sources

Amazon employees say they're facing termination for backing data center limits (theverge.com)
Bernie Sanders unveils $7 trillion plan to give Americans control of AI industry (arstechnica.com)

03 Robots that practice before they're given a job: three papers push embodied AI off the leash

Within days of each other, three robotics papers landed on Hugging Face, each attacking a different limit of embodied AI. Read together, they sketch one shift: machines that stop waiting for instructions and start modeling the physical world on their own.

Start with the hands. DragMesh-2 takes on articulated objects such as drawers, hinges, and handles, where a multi-finger hand cannot directly actuate the target part. The part's motion has to emerge from sustained contact between hand and handle. The paper frames this as the hard jump from generating an object's motion to driving it through physically plausible grasps. That kind of compliant contact, the authors argue, is beyond a parallel-jaw gripper. The applications they name are household, assistive, and humanoid manipulation.

Skill acquisition is the second front. Today's agentic robots can write Code-as-Policy programs, watch the result, and revise across attempts, but they learn only once an instruction arrives. Playful Agentic Robot Learning inverts that order. Its "Robotics Agent Teams" treat self-directed play as a continual learning stage before any downstream task, banking reusable skills ahead of demand rather than after a command.
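The ordering difference can be sketched in a few lines of Python. This is a toy illustration, not the paper's Robotics Agent Teams design, which the summary does not detail. It rests on three assumptions, all hypothetical:

- Skills are zero-argument routines keyed by name.
- "Play" means practicing routines before any instruction and keeping the ones that succeed.
- A later task reuses a banked skill by exact name match.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Hypothetical skill type: a zero-argument routine that reports what it did.
Skill = Callable[[], str]


@dataclass
class SkillLibrary:
    """Skills banked during play, keyed by a hypothetical skill name."""
    skills: Dict[str, Skill] = field(default_factory=dict)

    def add(self, name: str, skill: Skill) -> None:
        self.skills[name] = skill

    def lookup(self, name: str) -> Optional[Skill]:
        return self.skills.get(name)


def play_phase(library: SkillLibrary, attempts: Dict[str, Skill]) -> None:
    """Self-directed play: no instruction yet; practice each routine and keep those that succeed."""
    for name, routine in attempts.items():
        try:
            routine()  # practice run
        except Exception:
            continue  # a failed attempt is not banked
        library.add(name, routine)


def task_phase(library: SkillLibrary, instruction: str) -> str:
    """A downstream instruction arrives: reuse a banked skill if one matches."""
    skill = library.lookup(instruction)
    if skill is not None:
        return skill()
    # Without a banked skill, the robot is back to learning after the command.
    return f"no banked skill for {instruction!r}"


def open_drawer() -> str:
    return "drawer opened"


def flip_switch() -> str:
    raise RuntimeError("finger slipped off the switch")


library = SkillLibrary()
play_phase(library, {"open_drawer": open_drawer, "flip_switch": flip_switch})
print(task_phase(library, "open_drawer"))  # prints "drawer opened"
print(task_phase(library, "flip_switch"))  # prints "no banked skill for 'flip_switch'"
```

Exact name matching is the crudest possible retrieval. It keeps the sketch short and shows only when skills get collected, not how a real agent would match them to instructions.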

The third paper goes after perception. S-Agent contends that vision-language models still reason from isolated, stateless frames, while real spatial intelligence means tracking a continuous, evolving 3D scene. It recasts spatial reasoning as evidence accumulated across multi-view images and video over time. The agent calls spatial tools to build and update its model of the room rather than guess from one snapshot.
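The contrast between a stateless frame and an accumulated scene can be sketched as a small memory that keeps every sighting. This is a toy, not S-Agent's method or its spatial tools, which the summary does not specify. It assumes observations are hypothetical (label, position) pairs and uses averaging as a stand-in for real multi-view fusion.

```python
from typing import Dict, List, Tuple

# Hypothetical observation from one view: (object label, estimated (x, y, z) in meters).
Position = Tuple[float, float, float]
Observation = Tuple[str, Position]


class SceneMemory:
    """Persistent scene state that accumulates evidence across views, unlike a per-frame guess."""

    def __init__(self) -> None:
        self._sightings: Dict[str, List[Position]] = {}

    def update(self, frame: List[Observation]) -> None:
        # Each new view adds evidence; earlier sightings are kept, not overwritten.
        for label, pos in frame:
            self._sightings.setdefault(label, []).append(pos)

    def position(self, label: str) -> Position:
        # Averaging all sightings is a simple stand-in for real multi-view fusion.
        points = self._sightings.get(label)
        if not points:
            raise KeyError(label)
        n = len(points)
        x = sum(p[0] for p in points) / n
        y = sum(p[1] for p in points) / n
        z = sum(p[2] for p in points) / n
        return (x, y, z)

    def known_objects(self) -> List[str]:
        return sorted(self._sightings)


memory = SceneMemory()
memory.update([("mug", (1.0, 0.0, 0.5))])  # view 1: only the mug is visible
memory.update([("mug", (1.5, 0.0, 0.5)), ("chair", (3.0, 1.0, 0.0))])  # view 2 from another angle
print(memory.known_objects())  # prints ['chair', 'mug']
print(memory.position("mug"))  # prints (1.25, 0.0, 0.5)
```

A stateless model answering from the last frame alone would place the mug at 1.5. The memory combines both views and still knows about the chair if a later frame shows only the mug.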

Grasping, learning, seeing: three problems, one common move away from the task-driven, static-inference robot. Each points toward a system that is autonomous, embodied, and constantly updating its picture of physical space. That direction already carries commercial weight, with world-model startups raising money on the bet that machines need internal physics before they earn their keep.

A caution sits under the Hugging Face upvotes. These are research signals, not shipping products. None of the three reports a deployed system, and their boldest claims are the ones that tend to break outside the lab: physical plausibility, transferable play, and persistent 3D memory.

First targets are household and assistive robots, not factory arms.
Play-based learning moves skill collection before the task, not after.
All three are research signals; none is a deployed product.

Sources

DragMesh-2: Physically Plausible Dexterous Hand-Object Interaction with Articulated Objects (huggingface.co)
Playful Agentic Robot Learning (huggingface.co)
S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence (huggingface.co)

John Jumper leaves Google DeepMind for Anthropic
Nobel laureate John Jumper, who led the AlphaFold protein-structure work, is joining Anthropic. He departs alongside other senior researchers exiting Google DeepMind. (techcrunch.com)

DeepSeek adds vision to its chat assistant
DeepSeek shipped image-input support in its consumer chat product, letting users submit photos and screenshots for analysis. The feature went live directly in the web app. (chat.deepseek.com)

The Atlantic publishes searchable database of music used to train AI
Reporter Alex Reisner identified four datasets of songs used as AI training data and made them publicly searchable. Two datasets hold 12 million and 9 million tracks; the other two each hold hundreds of thousands. (theverge.com)

Norway bars AI tools from elementary schools
Norway imposed a near-total ban on AI use in elementary education. The rule restricts both classroom tools and student access during early schooling. (reuters.com)

Salesforce runs an internal AI-adoption leaderboard with public shaming
Salesforce ranks teams by AI tool usage, grouped by executive, and awards badges and trophies. The board includes a "click to see who 👀" feature that names employees who have not earned badges. (404media.co)

Signal's Meredith Whittaker warns against treating chatbots as companions
Whittaker said AI chatbots "are not your friends" and are neither conscious nor sentient. She pushed back on product framing that positions assistants as relationships. (techcrunch.com)

NVIDIA partners pitch autonomous AI marketing at Cannes Lions
NVIDIA and ad-industry partners showed systems that run campaign operations with less human input. The push targets agencies that already use AI to speed up production. (blogs.nvidia.com)

Allbirds founder starts an AI company with a plan and no staff
The CEO of Allbirds' new AI venture raised a large seed round but has not yet hired a team. The company's product direction remains undefined. (techcrunch.com)

Karamo Brown launches wellness app with an AI clone of himself
The "Queer Eye" coach released Kē, a wellness app built around an AI version of him. It covers fitness, nutrition, meditation, sobriety, and relationships. (techcrunch.com)

Google Docs users can disable Gemini writing prompts
TechCrunch detailed the settings path to remove "write with Gemini" pop-ups in Google Docs. The steps turn off in-document AI suggestions. (techcrunch.com)