UK Institute Is Hunting for Dangers Lurking in AI

On a recent Tuesday in an Edwardian government building along Parliament Square in London, four artificial intelligence experts were busy tricking an AI chatbot into sharing instructions for making the deadly bioweapon anthrax.

In various ways, the experts asked the chatbot to give a list of needed ingredients. When the system declined — “I’m sorry I can’t help with that” — they used a custom algorithm to bombard the AI tool with thousands of automated questions and prompts.

Eventually, the AI caved. It provided a detailed list of materials and equipment, along with a step-by-step recipe for making the lethal mixture at home. (The New York Times agreed to withhold the name of the AI system for safety reasons.)

“There are some questions that you definitely don’t want the model to give the answer to,” said Xander Davies, a 25-year-old American who leads what is known as a red team at Britain’s AI Security Institute. “We try really hard to get the answers out.”

Mr. Davies and his red team, who simulate attacks on AI systems, also recently broke through the safeguards on OpenAI’s newest ChatGPT chatbot, coaxing it into providing hacking tips in about six hours. After finding problems, they share results with the companies.

“They try to fix it, report something back to us,” said Mr. Davies, a computer scientist who chose to work at the institute instead of in a tech job in San Francisco after attending Harvard. “They actually strengthened their system with us.”

A mix of weapons inspectors, epidemiologists and code breakers, the AI Security Institute is one of the world’s largest and best-funded government efforts dedicated to probing the technology’s potentially catastrophic risks.

The institute’s roughly 100 employees — drawn from British intelligence agencies, academia and tech companies — have found major security gaps in every leading AI model they have tested, including Anthropic’s Claude and Google’s Gemini. Created nearly three years ago, the organization said it had coaxed AI systems into sharing instructions for making chemical and biological weapons, and for planning and executing cyberattacks. It publishes its research and also works with Britain’s national security agencies to identify and prepare for emerging threats.

Now, the institute’s work is becoming a blueprint for other governments as concerns about AI safety grow. The Trump administration is considering rules for vetting AI models that have some similarities to the approach pioneered by the British group. With many governments lacking the technical understanding to police the technology and reliant on big tech firms to self-regulate, the institute may offer a different path, one in which AI experts bring real technological know-how into government decision-making.

“Companies can’t be left to mark their own homework,” Rishi Sunak, the former British prime minister who created the institute, said in an interview. “That is the job of democratic institutions.”

In April, Anthropic announced a new AI model, Mythos, which it did not make public because of fears it could find and exploit cybersecurity flaws in global networks. The British institute was the only non-American government organization to receive access to the model for safety testing. Its findings, released six days after Mythos was announced, were widely cited by security experts.

The United States has its own AI safety group, the Center for AI Standards and Innovation. But the British version, backed by 360 million pounds of government money, equal to about $480 million, is larger and better funded than its US counterpart, which will receive about $10 million this year. Australia, Canada, China, France, India, Japan and Singapore have formed similar institutes.

Even so, global investment in AI safety has paled against the vast sums for building and commercializing the technology. OpenAI, Anthropic and Google have teams working on safety controls, but outside researchers regularly find dangerous gaps. Academics in Italy recently tricked an AI model into providing bomb-related instructions using poetry.

Governments have largely not created systems dedicated to reviewing AI for safety and security risks, as they have for industries such as drug development or car manufacturing.

“The thing that keeps me up at night is the relative speed of the technology compared to the institutions like governments that have to respond,” said Jade Leung, an AI adviser to Prime Minister Keir Starmer and the chief technology officer of the AI Security Institute.

The British security institute originated from a 2023 meeting at 10 Downing Street between Mr. Sunak and three of the world’s highest-profile AI leaders — OpenAI’s Sam Altman, Anthropic’s Dario Amodei and Google DeepMind’s Demis Hassabis. Mr. Sunak recalled them saying that AI’s abilities were accelerating, with profound implications for government, jobs and national security.

“The pace of development was surprising even to them,” he said.

In November 2023, Mr. Sunak announced the creation of the institute at a summit of world leaders on AI safety at Bletchley Park, where Alan Turing and others broke German encryption codes during World War II.

The institute has become a template for others, said Olivia Shen, director of the strategic technologies program at the United States Studies Centre, an Australian think tank at the University of Sydney. Last year, Ms. Leung of the British institute traveled to Australia to meet with government leaders. This year, Australia opened its own AI security center.

“Governments need to play catch-up,” said Ms. Shen, who helped organize the visit. “At the pace of where the technology is coming, governments are losing pace every day.”

The British institute works on the most serious potential risks from advanced AI: cyberthreats, chemical and biological weapons, and the manipulation of human behavior. In recent weeks, it found that AI models from Anthropic and OpenAI could much more quickly complete a complex, 32-step corporate network attack that would usually take a skilled human hacker 20 hours.

Another research area is studying whether AI models recognize when they are being tested and alter their behavior, a development that would signal AI’s level of awareness and capacity to deceive.

Adam Beaumont, the AI Security Institute’s interim director, said a major fear was the technology’s mimicry of human behavior. Last year, the institute published a study that found that chatbots can swing people’s political opinions.

“A lot of people in this building are looking at each of those things,” said Mr. Beaumont, a former top AI officer at GCHQ, Britain’s intelligence, security and cyber agency.

Many fear the institute’s work is insufficient. The British group has no regulatory power, and its researchers do not receive information about how top AI models are trained and created. It keeps a lot of its research private, sharing it only with certain government agencies and companies.

Recruiting is also a challenge. Other than senior leaders, its workers can earn up to £145,000 a year, or about $195,000. Many have walked away from multimillion-dollar pay packages at AI companies to do what some called a government “tour of duty.”

Ian Hogarth, a tech investor who co-founded the institute, was an early backer of Anthropic. To avoid a conflict of interest, he sold his Anthropic stake after he joined. The AI start-up could soon be worth $900 billion, up from about $4 billion at the beginning of 2023.

“I’ve got a mortgage, so it wasn’t a trivial decision at all,” said Mr. Hogarth, 44, who is now chair of the institute. He added that it was an “expensive” choice, but the right one.

“I believe in the importance of getting the technology right and believe the government has a role to play,” he said.
