The Download: sycophantic LLMs, and the AI Hype Index

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology.

This benchmark used Reddit’s AITA to test how much AI models suck up to us

Back in April, OpenAI announced it was rolling back an update to its GPT-4o model that made ChatGPT’s responses to user queries too sycophantic.

An AI model that acts in an overly agreeable and flattering way is more than just annoying. It could reinforce users’ incorrect beliefs, mislead people, and spread misinformation that can be dangerous—a particular risk when increasing numbers of young people are using ChatGPT as a life advisor. And because sycophancy is difficult to detect, it can go unnoticed until a model or update has already been deployed.

A new benchmark called Elephant that measures the sycophantic tendencies of major AI models could help companies avoid these issues in the future. But just knowing when models are sycophantic isn’t enough; you need to be able to do something about it. And that’s trickier. Read the full story.

—Rhiannon Williams

The AI Hype Index

Separating AI reality from hyped-up fiction isn’t always easy. That’s why we’ve created the AI Hype Index—a simple, at-a-glance summary of everything you need to know about the state of the industry. Take a look at this month’s edition of the index here.

The must-reads

I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.

1 Anduril is partnering with Meta to build an advanced weapons system
EagleEye’s VR headsets will enhance soldiers’ hearing and vision. (WSJ $)
+ Palmer Luckey wants to turn “warfighters into technomancers.” (TechCrunch)
+ Luckey and Mark Zuckerberg have buried the hatchet, then. (Insider $)
+ Palmer Luckey on the Pentagon’s future of mixed reality. (MIT Technology Review)

2 A new Texas law requires app stores to verify users’ ages
It follows in the footsteps of Utah, which passed a similar bill in March. (NYT $)
+ Apple has pushed back on the law. (CNN)

3 What happens to DOGE now?
It has lost its leader and a top lieutenant within the space of a week. (WSJ $)
+ Musk’s departure raises questions over how much power it will wield without him. (The Guardian)
+ DOGE’s tech takeover threatens the safety and stability of our critical data. (MIT Technology Review)

4 NASA’s ambitions of a 2027 moon landing are looking less likely
It needs SpaceX’s Starship, which keeps blowing up. (WP $)
+ Is there a viable alternative? (New Scientist $)

5 Students are using AI to generate nude images of each other
It’s a grave and growing problem that no one has a solution for. (404 Media)

6 Google AI Overviews doesn’t know what year it is
A year after its introduction, the feature is still making obvious mistakes. (Wired $)
+ Google’s new AI-powered search isn’t fit to handle even basic queries. (NYT $)
+ The company is pushing AI into everything. Will it pay off? (Vox)
+ Why Google’s AI Overviews gets things wrong. (MIT Technology Review)

7 Hugging Face has created two humanoid robots
The machines are open source, meaning anyone can build software for them. (TechCrunch)

8 A popular vibe coding app has a major security flaw
The flaw persists despite the company being notified about it months ago. (Semafor)
+ Any AI coding program catering to amateurs faces the same issue. (The Information $)
+ What is vibe coding, exactly? (MIT Technology Review)

9 AI-generated videos are becoming way more realistic
But not when it comes to depicting gymnastics. (Ars Technica)

10 This electronic tattoo measures your stress levels
Consider it a mood ring for your face. (IEEE Spectrum)

Quote of the day

“I think finally we are seeing Apple being dragged into the child safety arena kicking and screaming.”

—Sarah Gardner, CEO of child safety collective Heat Initiative, tells the Washington Post why Texas’ new app store law could signal a turning point for Apple.

One more thing

House-flipping algorithms are coming to your neighborhood

When Michael Maxson found his dream home in Nevada, it was not owned by a person but by a tech company, Zillow. When he went to take a look at the property, however, he found it had been damaged by a huge water leak. Although he offered to handle the costly repairs himself, Maxson discovered that the house had already been sold to another family, at the same price he had offered.

During this time, Zillow lost more than $420 million in three months of erratic house buying and unprofitable sales, leading analysts to question whether the entire tech-driven model is really viable. For the rest of us, a bigger question remains: Does the arrival of Silicon Valley tech point to a better future for housing or an industry disruption to fear? Read the full story.

—Matthew Ponsford

We can still have nice things

A place for comfort, fun and distraction to brighten up your day. (Got any ideas? Drop me a line or skeet ’em at me.)

+ A 100-mile real-time ultramarathon video game that lasts anywhere up to 27 hours is about as fun as it sounds.
+ Here’s how edible glitter could help save the humble water vole from extinction.
+ Cleaning massive statues is not for the faint-hearted. ($)
+ When is a flute teacher not a flautist? When he’s a whistleblower.