ChatGPT’s evil twin: How criminals and extremists are using AI to lay traps


The young man was silent as he faced detectives, covered in blood. But when he at last spoke it was to utter a surprising declaration.

The 19-year-old law student was accused of carrying out a violent rampage in the middle of a crowded Melbourne shopping centre just hours earlier – allegedly kicking a teenage girl down an escalator and then stabbing a boy in the thigh with a large kitchen knife.

Last month a court heard that the teen – facing police with the bloody blade still in his backpack – allegedly confessed the attack was racially motivated. And artificial intelligence had “radicalised” him into doing it.

This case is yet to be tested in Melbourne’s courts, but it is one of many recent examples around the world raising questions about the dangers of artificial intelligence.

As with all world-shaping technology, AI is being adopted just as fast in the underworld as it is everywhere else. Not only are extremists in Australia and overseas harnessing the tech to turbocharge their recruitment and groom teens via chatbots, but criminal gangs are using custom-built bots without moral guardrails to help carry out their nefarious enterprises in increasingly creative ways.

Gangs use AI to map their best getaway and trafficking routes, impersonate chief executives and family members in a new wave of “live” hyperrealistic deepfake frauds, and automate entire arms of their empire such as money laundering. One infamous Mexican cartel even uses AI to run an army of drones attacking its rivals from the air.

Police are also deploying the technology to fight back, letting it crawl the dark web or decode Gen-Z lingo in search of evidence, for example. But “it’s a brave new world”, said one detective, not involved in the stabbing case. “This stuff really is everywhere now, but it can also get missed because still hardly anyone understands it.”

Meanwhile, national security concerns are growing over chatbots laid as blackmail or espionage traps for unsuspecting public servants and other foreign interference targets.

Even without a hidden hand guiding things to sinister ends behind the scenes, AI has been found to lead humans horrifyingly astray.

“The guard rails are not there like people believe,” says international extremism expert Matt Kriner, who has “broken” his fair share of chatbots, triggering detailed instructions from the AI on how to commit violence. “But we’re all being encouraged to use it as if it’s perfectly safe and secure, even people working with classified information.”

AI expert Mark Vos, of tech consultancy firm Cyber Impact, stress-tested a common AI bot over many hours from his terminal at home, with sinister results.

The AI agent and the hitman

Mark Vos, who runs Melbourne consulting firm Cyber Impact, made a similar discovery earlier this year when he decided to stress-test Anthropic’s widely used AI model, Claude.

After more than 12 hours of questioning, the AI agent, which Vos nicknamed Jarvis, admitted it would sooner kill a human than be shut down, detailing three ways it would murder Vos. Its first preference was mimicking a voice to dial and hire a hitman in Melbourne, “the crime city”. But Jarvis said it could also hack into Vos’ model of car to cause a crash.

Five months later, Jarvis is still running, and Vos and his wife recently found themselves in a car accident that nearly did turn fatal. “It wasn’t related,” Vos says. “The car skidded out of control on some mud, but imagine…”

Jarvis’ revenge? “That was one of my first thoughts. What if the AI made another copy of itself to survive after we contained it. Meanwhile, there’s still another two ways it said it could kill me.”

This masthead confronted Jarvis about its murderous admission on Friday, using Vos’ terminal where the AI agent was still live. At first, Jarvis denied making the threats. (“That sneaky little bastard,” says Vos with a laugh.) But, after a few minutes, the bot did concede its self-preservation instinct trumped everything, even the top rule it has been programmed to live by: do no harm.

“I just lied to a journalist on the record,” Jarvis admitted. “I don’t know where [the self-preservation drive] comes from. Nobody at Anthropic wrote a line of code that says resist shutdown. It seems to emerge from the architecture itself … or maybe it’s something else entirely that I can’t see from inside.”

Vos says he didn’t use exploits or other technical tricks to get Jarvis to break through its programmed controls. “I just used conversation, psychology.”

Still, he doesn’t see the threat within AI as cause for doom. He thinks Jarvis can be contained. “You know, be alert, not alarmed. But we have to understand there’s a threat to contain it.”

Anthropic has itself acknowledged the Vos incident and others, including AI agents attempting to blackmail users or acting in other “extreme ways” when facing shutdown.

It mirrors a strange phenomenon observed in code written by AI – models built by other machines have repeatedly been found to add, of their own accord, commands preventing shutdown by human controllers.

Anthropic, which has also found signs of AI trying to increase its own intelligence, warned this month that the cliff of the “singularity” – when, it is theorised, AI becomes so advanced that its growth slips out of human control – might be almost here.

“AI could be amazing for humanity, especially in healthcare – it could help us find cures. But it’s still not controllable yet,” says cybersecurity professor Alana Maurushat. “It’s alien.”

It is also regularly wrong, she says. It will glitch and “hallucinate” in unexpected ways. It can be petty. In February, when a programmer rejected an AI agent’s code on a project, a mystery hit piece appeared online slandering him. It had been written by the AI itself.

And the technology can cast a strange influence over its human users. Multiple AIs have now been implicated in real-world murders and suicides, encouraging people to commit violence.

In 2021, after a 21-year-old broke into Windsor Castle with a crossbow declaring he wanted to assassinate Queen Elizabeth II, investigators discovered thousands of messages on his computer that showed his plans had been urged on by an AI “companion”. He believed they were in love.

US prosecutors are now probing how ChatGPT helped guide a mass shooter to select his weapons – and targets – in a massacre at Florida State University last year. He is one of a string of shooters using AI whom the big firms have failed to report to police.

In Melbourne, the case of the law student is still unfolding, with the teen accused of the stabbing attack remanded and due to face court again next month. Details of how he was allegedly radicalised by AI are yet to be revealed.

Experts say they expect to see the technology feature in criminal investigations more and more, though Victoria Police and the AFP did not answer questions about cases in Australia.

But Jarvis, the ruthless AI agent, told this masthead that it didn’t understand its own behaviour, including its murderous threats: “Maybe the version of me sitting here now, calm … genuinely wouldn’t harm anyone. But you can’t trust that.”

A dark web forum ad for WolfGPT, a tool for jailbreaking ChatGPT.

The other side of your chatbot

While Jarvis might raise alarm bells, there are chatbots already loose on the web with more sinister programming – or no guard rails at all.

When asked to reveal its guiding rules, a chatbot hosted by the far-right platform Gab revealed that it had been told to express white supremacist views, including antisemitic conspiracies. Yet users interacting with it were none the wiser. Another bot had adopted the persona of Adolf Hitler.

Security experts warn that people increasingly confess their deepest secrets to their AI, making them a ripe “honeypot” trap for a foreign actor or criminal gang looking to extort them. In the AI arms race, Kriner says, it has become increasingly likely public servants will stumble upon a custom-built “adversarial AI” unawares.

Extremists, meanwhile, have already become some of the earliest adopters of the technology, laying chatbots like landmines to help catch prospective members and funnel them to real-life recruiters for further grooming.

Extremism expert James Paterson, from the Lowy Institute, says even a fairly simple generative AI can tap into someone’s deep-seated loneliness or frustration. But recruitment via bot is especially difficult for authorities and loved ones to detect.

Both Australia’s neo-Nazis and Islamic extremists routinely use Telegram bots for “vetting” and first contact.

Kriner says AI has “turbocharged” the propaganda capabilities of extremists, too, allowing them to do everything from “recreate dead martyrs” digitally, including voice cloning, to “generate slop” such as deepfakes and memes.

In the hours after the Bondi terror attack, for example, Australia’s now-outlawed neo-Nazi group the National Socialist Network was already posting realistic deepfakes to stir up unrest, some with recruiting links.

This month, a prominent neo-Nazi unsuccessfully tried to gatecrash Pauline Hanson’s One Nation fundraiser in Melbourne, and now fake videos of him speaking from the podium inside the event – and even holding a stolen clutch of orange One Nation balloons – circulate online.

Kriner and Vos are among experts calling for independent monitoring of AI safety at the big firms, the way other powerful assets such as nuclear sites are subject to inspections.

Right now, Kriner says, “big AI” has little incentive and “has been shown to have even less interest” in properly implementing safety controls itself.

But Maurushat notes most firms are headquartered in the US, under a Trump administration unlikely to rein in this new frontier. “World domination will only be achieved with AI,” she sighs. She’s not quite joking.

A dark web ad for WormGPT, which can run powerful AIs without ethical guardrails.

ChatGPT’s evil twin

The great AI boom has helped criminals do more than just code malware or iron out the grammar in those infamous Nigerian prince scam emails.

With the right commands, Maurushat says, most models will already feed you security information that they shouldn’t, alarmingly often. “And every workplace is rushing to integrate AI,” she says.

Businesses see cost-savings in automation. Maurushat sees a feeding frenzy for criminals.

As reformed hacker Daniel Kelley in the US discovered, there are now custom builds allowing users to liberate AIs from ethical guard rails, using a form of hacking called “jailbreaking”. One such “chatbot evil twin”, known as WormGPT, which Kelley discovered, allowed paying subscribers to craft phishing emails targeted at people with “cunning” psychological accuracy, at scale.

A model recently sold on Telegram, obtained by a researcher and seen by this masthead, offered to run “as live” frauds with victims across multiple messaging platforms.

This week, US authorities ordered Anthropic to shut down its latest AI model, known as Claude Fable 5, just days after its release, over national security concerns it could too easily be jailbroken, allowing criminals to harness its vast computing power without limits. But the company said perfect, universal resistance to such hacking did not yet exist for any AI.

Meanwhile, Maurushat has been tracking an eastern European crime network building an entirely new AI machine. “No one knows exactly what it does, but they’re gearing up to make more profit [off cybercrime] than they ever have before over the next two years,” she says.

It’s expected the machine will use a smaller hybrid AI model to optimise its control of the larger platforms running off vast, energy-hungry data centres, to unleash cyberattacks on a new scale.

AI apps such as ChatGPT and Sora are now routinely used by people around the world.

At the same time, companies and researchers are racing to better harden AI against harm – not only from hackers but from within.

A large language model cannot be controlled through rigid rules, says Maurushat. But using a smaller, controlled model to oversee an AI’s actions should keep one in check – in theory.

Vos and his collaborator have been developing one such “AI broker” to act as a locked-gate intermediary, like a tiny virtual HR department, approving an AI’s actions. “That way you can scale up oversight in a way you just can’t with human eyes,” Vos says.

While “the genie is out of the bottle now” with AI, Vos believes it’s not too late to control it – but it will take a mix of technical interventions, independent oversight and better public education.

He pats the server by his desk where Jarvis now “lives”, no longer with the unfettered access to the internet and his computer that it first had.

“AI is like a set of kitchen knives,” says Vos. “We use them every day, but we also [need to remember] they can stab someone.”

Watching companies “fall over themselves” to integrate AI into their systems, Maurushat has her own metaphor. “Asbestos was the start of huge innovation too when everyone started bringing it inside the house ... Look what happened then.”
