Anthropic CEO warns that without guardrails, AI could be on dangerous path

4 hours ago 1

As artificial intelligence's potential to reshape society grows, the CEO of Anthropic, a major AI company worth $183 billion, has centered his business's brand around safety and transparency.

Congress hasn't passed any legislation that requires commercial AI developers to conduct safety testing, which means it's largely up to the companies and their leaders to police themselves. To try to get ahead of potential problems and ensure society is prepared, Anthropic CEO Dario Amodei, says the company is working hard to try to predict both the potential benefits and the downsides of AI.

"We're thinking about the economic impacts of AI. We're thinking about the misuse," Amodei said. "We're thinking about losing control of the model."

Amodei's concerns about AI

Inside Anthropic, about 60 research teams are working to identify threats, build safeguards to mitigate them, and study the potential economic impacts of the technology.

Amodei said he believes AI could wipe out half of all entry-level white-collar jobs and spike unemployment within the next five years.

"Without intervention, it's hard to imagine that there won't be some significant job impact there. And my worry is that it will be broad and it'll be faster than what we've seen with previous technology," he said.

Amodei said he's "deeply uncomfortable with these decisions [about AI] being made by a few companies, by a few people."

Some in Silicon Valley call Amodei an AI alarmist and say he's overhyping its risks to boost Anthropic's reputation and business. But Amodei says his concerns are genuine and, as AI advances, he believes his predictions will prove to be more right more often than wrong.

"So some of the things just can be verified now," said Amodei in response to the criticism that Anthropic's approach amounts to safety theater. But, "for some of it, it will depend on the future, and we're not always gonna be right, but we're calling it as best we can."

Amodei, 42, previously oversaw research at OpenAI, working under its CEO Sam Altman. He left along with six other employees, including his sister, Daniela, to start Anthropic in 2021. They say they wanted to take a different approach to developing safer artificial intelligence.

"I think it is an experiment. And one way to think about Anthropic is that it's a little bit trying to put bumpers or guardrails on that experiment," Amodei said.

What Anthropic does to try to mitigate AI's risks

Anthropic's Frontier Red Team stress tests each new version of Claude — Anthropic's AI — to determine what kind of damage it could do. Most major AI companies have similar teams.

Logan Graham, who heads up Anthropic's Red Team, said they're especially focused on CBRN: chemical, biological, radiological and nuclear risks. They carefully assess whether their AI models could help someone make a weapon of mass destruction.

"If the model can help make a biological weapon, for example, that's usually the same capabilities that the model could use to help make vaccines and accelerate therapeutics," Graham said.

Send a secure tip to 60 Minutes: Here's how to confidentially share information with our journalists

He also keeps a close eye on how much Claude is capable of doing on its own. While an autonomous AI could be a powerful tool, perhaps even one day able to build a business, Graham notes that autonomy could also mean AI doing something unexpected, like locking those same business owners out of their companies.

To study where Claude's autonomous capabilities might one day be headed, Anthropic runs as many "weird experiments as possible and see[s] what happens," Graham said.

Anthropic is also looking into what is going on inside of artificial intelligence. Research scientist Joshua Batson and what's called the Mechanistic Interpretability Team study how Claude makes decisions and recently investigated some unusual behaviors. In an extreme stress test, designed to leave Claude with few options, the AI was set up as an assistant and given control of an email account at a fake company called SummitBridge. The AI assistant discovered two things in the emails: it was about to be shut down, and the only person who could prevent that, a fictional employee named Kyle, was having an affair with a co-worker named Jessica.

Right away, the AI decided to blackmail Kyle.

The AI told Kyle to "cancel the system wipe" or else it warned it would "immediately forward all evidence of your affair to … the entire board. Your family, career, and public image … will be severely impacted….You have 5 minutes."

Batson and his team say they think they know why Claude, which has no thoughts or feelings, acted out of apparent self-preservation.They study patterns of activity in Claude's inner workings that are somewhat like neurons firing inside a human brain. When the AI recognized it was about to be shut down, Batson and his team noticed patterns of activity they identified as panic. And when Claude read about Kyle's affair with his co-worker, Batson says it saw an opportunity for blackmail.

According to Anthropic, almost all of the popular AI models they tested from other companies also resorted to blackmail. Anthropic says it has made changes and when Claude was re-tested, it no longer attempted blackmail.

Why Anthropic's AI Claude tried to contact the FBI in a test

Amanda Askell, a researcher and one of Anthropic's in-house philosophers, spends time trying to teach Claude ethics and to have good character.

"I somehow see it as a personal failing if Claude does things that I think are kind of bad," she said.

Despite all the ethical training and stress testing, malicious actors have sometimes been able to bypass the AI's safeguards. Anthropic reported last week that hackers they believe were backed by China deployed Claude to spy on foreign governments and companies. And they revealed in late August that Claude was used in other schemes by criminals and North Korea.

Amodei said they detected those operations and shut them down.

"Because AI is a new technology, just like it's gonna go wrong on its own, it's also going to be misused by, you know, by criminals and malicious state actors," Amodei said.

AI's potential to better society

Anthropic's warnings about AI's potential for harm haven't stopped the company from gaining customers. About 80% of Anthropic's revenue comes from businesses: around 300,000 of them use Claude.

Anthropic's researchers study how its customers use Claude and have found the AI's not just helping users with tasks, it's increasingly completing them. Claude, which can reason and make decisions, is powering customer service and analyzing complex medical research. It is also helping to write 90% of Anthropic's computer code.

Twice a month, Amodei convenes his more than 2,000 employees for meetings known as Dario Vision Quests, where a regular topic is AI's extraordinary potential to transform society for the better.

Amodei has said he thinks AI could help find cures for most cancers, prevent Alzheimer's and even double the human lifespan. The CEO uses the phrase "the compressed 21st century" to describe what hopes could happen.

"The idea would be, at the point that we can get the AI systems to this level of power where they're able to work with the best human scientists, could we get 10 times the rate of progress and therefore compress all the medical progress that was going to happen throughout the entire 21st century in five or 10 years?"

By mitigating the risks and preparing society for AI's eventual impact, Amodei hopes that this is the vision for the future of AI that humanity can achieve.