#13: The UK's Multilateral AI Safety Institute + Open-sourcing Advanced AI + Biden's Executive Order
Welcome to Navigating AI Risks, where we explore how to govern the risks posed by transformative artificial intelligence.
“We have to revisit the longstanding premise of maintaining “relative” advantages over competitors in certain key technologies. We previously maintained a “sliding scale” approach that said we need to stay only a couple of generations ahead. That is not the strategic environment we are in today.”
Jake Sullivan, United States National Security Advisor
The UK, US, and Other Western Countries are Setting up a Multilateral AI Safety Institute
After a first day spent discussing AI safety and trying to reach a consensus on extreme risks from AI, the second day of the UK’s AI Safety Summit will bring together around twenty heads of state and AI company CEOs to discuss what to do about those risks. High on the agenda: the creation of a joint initiative to evaluate the national security risks of frontier AI models.
Big ambitions: UK Prime Minister Rishi Sunak wants like-minded countries and AI labs to work together. The basic idea is to grant governments access to the latest “frontier” AI models so they can evaluate their risks. The structure through which this will happen, the AI Safety Institute, aims to give governments more visibility into advanced AI models by helping them assess their capabilities and risks. The findings from this process may help guide AI policy at the domestic and international levels.
Building on previous efforts: We recently learned that the UK government, through its Frontier AI Taskforce, a self-described “AI research team that evaluates risk at the frontier of AI” that reports directly to the Prime Minister, obtained a commitment from Anthropic, OpenAI, and Google DeepMind that it would gain access to their AI models. Now we’re learning that this access will serve a multilateral effort. This ambition had already been voiced by Deputy Prime Minister Oliver Dowden, who said last month that he wanted the taskforce to “evolve to become a permanent institutional structure, with an international offer on AI safety”. The proposed AI Safety Institute is that international offer.
Why? The AI Safety Institute is the UK’s first step toward a multilateral approach to frontier AI’s emerging risks. To-be-designated entities within partner countries (for the UK, likely the taskforce) will use their privileged access to AI models to evaluate their capabilities and their risks to national security (e.g. in terms of cyberwarfare or bioweapons). For now, that’s all we know. Increased government visibility into the state of the art in AI may help devise well-adjusted rules for AI development, or at least help regulators be better prepared to deal with the technology’s risks.
Success factors: The success of the Institute will depend on a wide range of factors. Three will be especially critical. Topping the list is the level of access AI companies grant to governments. A high level of access to a model, combined with the expertise to understand it, will give governments a far better grasp of its risks than a low level of access. Anthropic reportedly discussed sharing model weights (the “parameters” the model uses to produce an output from a given input), but the discussion seems to have moved on to “delivering the model via API [the same level given to researchers or business customers] and seeing if that can work for both sides”.
A risky endeavor? Giving governments access to model weights would, overall, make leaks and proliferation more likely. However, Politico reports that “as part of the plan, the U.K. aims to set up a highly secure facility where national security officials can carry out risk assessments of the frontier models”. That might help. But in the end, barring legislative action, the level of access given to governments depends almost entirely on companies’ good intentions. A problem for regulators: companies probably won’t keep giving you a high level of access if you use it to set rules they don’t like.
Who has access? Another major question regarding the design of the Institute concerns who will benefit from its privileged access to frontier models. The Guardian reports that only “national security agencies” would gain access; if so, we may not see much impact coming from the Institute in terms of raising public awareness about extreme risks. Keeping risk assessment results confidential may also increase the chances of regulatory capture that some are warning about. Another, possibly more beneficial arrangement would be for selected (and vetted) public officials, assisted by technical experts, to evaluate the models for their dangerous capabilities and propensity to use them, and then communicate the results to a broader range of public officials. Such visibility could help leverage the results to design regulations tailored to the frontier of AI, while preventing the proliferation of AI models caused by widespread access.
Who’s the leader? A related question concerns the governance of the Institute. Since the UK has taken the leading role in organizing the summit and launching the Institute, it seems likely to have considerable influence over this hypothetical new body. So will it be truly multilateral, or really UK-led? If the Institute is to become anything more than a temporary, small-scale information-sharing arrangement between the national security agencies of a handful of countries, it will have to be governed by a multilateral coalition. But this is bound to create infighting among the participants. Where will the Institute be headquartered? The UK seems like the obvious answer, but will other countries agree? Who will fund the initiative? What will be the nationality of its staff? Will it be governed by a board of member state representatives, or will it remain an informal initiative? Many questions remain.
Laying the groundwork for the future of AI risk mitigation: This is the first step toward an international information-sharing regime between leading AI labs and governments. If the Institute is seen as successful, and as it builds up its capacity, its mandate could be expanded, for example to fulfill the desire of some for an international authority that can inspect and, if deemed safe, certify AI models or the labs that develop them. Such an authority could help ensure AI is developed safely; it could also help governments make decisions about, e.g., granting geopolitical competitors and their companies (what the US calls “entities of concern”) access to such models. Indeed, it is being reported that the US is considering a ban on the “export” of frontier models; such a decision would be more effective and legitimate if it came from several countries and was based on a scientific risk assessment of the model. The Institute could also help countries harmonize future licensing decisions (e.g. an AI lab or development project would receive a license if it passes the Safety Institute’s audit). Of course, the Institute’s future activities are entirely unknown at the moment. Indeed, given the challenges that lie ahead, whether it will even be created remains an open question.
Know Your Customer, Public Procurement, and New Safety Standards: Biden’s Executive Order on AI
Last July, President Biden announced his office was working on an executive order with a view to establishing a framework to mitigate the risks associated with AI.
New moves: The debate over regulating artificial intelligence (AI) in the United States is slowly maturing. While Congress continues its prolonged discussions of several proposed AI laws, the White House is taking the spotlight in US AI policy. The upcoming executive order will set public procurement standards for AI; in short, the federal government and its agencies will only be able to purchase AI tools and services from companies that respect certain rules.
The power of procurement: The immense purchasing power of the federal government and growing AI expenditures could incentivize AI companies to adopt AI safety and other standards. Because an executive order is a legally binding tool used only to manage the operations of the federal government, it can't directly mandate the private sector to abide by certain rules. But it can tell government entities what they can and cannot buy. The Biden administration seeks to capitalize on this opportunity to set industry standards, without the need to go through Congress. Ultimately, the helpfulness of the executive order in mitigating AI risks will hinge on the precise safety standards used (experts have their ideas on what would be helpful).
New standards: The administration will ask the National Institute of Standards and Technology, which authored the influential AI Risk Management Framework, to develop new guidelines for testing and evaluating AI systems. Such a partnership would build on the voluntary commitments that the Biden administration secured from fifteen AI companies this year. There are also reports that the order may contain a classified annex focusing on AI applications in national security.
Do you know your customer? The executive order is also expected to require cloud computing firms to monitor and track the activities of users developing cutting-edge AI systems. The U.S. Commerce Department would be mandated to implement regulations requiring firms like Microsoft, Google, and Amazon to notify the government when a customer buys AI chips beyond a certain threshold, using "Know Your Customer" (KYC) policies prevalent in industries like banking and cybersecurity (which we analyzed in NAIR #11).
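To make the mechanism concrete, here is a minimal sketch of the kind of threshold-based reporting logic such a KYC rule implies. The threshold value, units, and function names are all hypothetical; the order's actual requirements have not been published.

```python
# Illustrative sketch only: the real thresholds and reporting channels are unknown.
from dataclasses import dataclass

REPORTING_THRESHOLD_CHIPS = 1_000  # hypothetical threshold, in accelerator units


@dataclass
class CustomerAccount:
    customer_id: str
    chips_purchased: int = 0


def notify_regulator(account: CustomerAccount) -> None:
    # In practice this would file a report with the Commerce Department;
    # here it is just a placeholder notification.
    print(f"KYC report: customer {account.customer_id} has acquired "
          f"{account.chips_purchased} accelerators (threshold: {REPORTING_THRESHOLD_CHIPS})")


def record_purchase(account: CustomerAccount, chips: int) -> None:
    """Track cumulative purchases and flag accounts that cross the threshold."""
    account.chips_purchased += chips
    if account.chips_purchased >= REPORTING_THRESHOLD_CHIPS:
        notify_regulator(account)


# Example: a customer crosses the hypothetical threshold after several orders.
acct = CustomerAccount("acme-ai-labs")
for order in (400, 300, 500):
    record_purchase(acct, order)
```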
Before November: Congressional Democrats recently asked the White House to include the principles outlined in the “AI Bill of Rights” in the executive order. The order has yet to be finalized, so its requirements are still subject to change, especially since the administration wants it to be a comprehensive set of rules. It is expected to be made public by the end of October.
Deep Dive: Risks and Benefits of Open-sourcing Frontier AI Models
There is a rising tide of apprehension about the increasing capability of AI models and the irreversible consequences of making these models publicly accessible. Just recently, this sentiment sparked a protest at Meta’s offices against its open-access AI models (more specifically, the Llama 2 model). In this context, it is important to understand what open-sourcing an AI model entails, the risks involved, and potential alternatives for maintaining safety.
Note that open-sourcing AI refers to making the structure (architecture) and learned parameters (weights) of an AI model publicly available. This contrasts with API-based access, where anyone can send queries to the AI model, but the model itself runs on the company’s servers.
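To make the distinction concrete, here is a minimal sketch of the two access modes, assuming the Hugging Face transformers library for the open-weights case; the model name and API endpoint are placeholders rather than references to any real release.

```python
# (a) Open-sourced weights: the model runs on your own hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("some-org/open-model-7b")  # hypothetical repo
model = AutoModelForCausalLM.from_pretrained("some-org/open-model-7b")
inputs = tokenizer("Explain photosynthesis.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0]))

# (b) API access: you only send queries; the weights stay on the provider's servers.
import requests

response = requests.post(
    "https://api.example-provider.com/v1/generate",  # hypothetical endpoint
    json={"prompt": "Explain photosynthesis.", "max_tokens": 50},
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
print(response.json())
```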
The biggest risk posed by open-sourcing is misuse by malicious actors. Publicly available foundation models can be used by anyone, anywhere in the world, which could drastically increase the level of risk to society. In a recent US Senate hearing, Anthropic’s CEO warned that, within a few years, large language models could be used to create bioweapons. Other potential misuse areas include surveillance, social control, scamming, spear phishing, and cyberattacks.
Open-sourced models intensify misuse risks in multiple ways:
1. Open-sourced releases are irreversible: Once an AI system is open-sourced, it can’t be taken back, and any resulting risks are irreversible. Combined with the fact that additional scaffolding and tools can later be added on top of a model to increase its capabilities (e.g. autonomous agents), this makes open-sourcing a decision to be taken with care. As we've seen with previous foundation models, discovering new risks over time is the norm. By contrast, the risks are more controllable with API access: the system can be programmed to deny potentially harmful queries (see the sketch after this list), and in extreme circumstances, API access can be shut down entirely.
2. Fine-tuning: It is currently impossible to make an open-sourced model safe. Even a foundation model released with safety safeguards can be fine-tuned, for as little as a few hundred euros of cloud computing costs, to undo those safeguards. Fine-tuning via an API, however, is monitorable, since the API owner can inspect fine-tuning datasets.
3. Jailbreaking: This tactic involves adding specific words or sequences of characters that disable a model's safeguards. As previously outlined in our newsletter (NAIR #10), open-source models make finding jailbreaks easier, and these attacks generalize to other AI models. Jailbreaks found on open models can therefore also enable misuse of API-based models.
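As promised above, here is a toy sketch of why API-based access is easier to control: the provider can screen every query before the model ever sees it. The blocklist and moderation logic below are deliberately simplistic stand-ins for the far more sophisticated filters real providers use.

```python
# Toy illustration: with API access, every request passes through the provider's
# gatekeeping code, and the weights never leave the provider's servers.

BLOCKED_PATTERNS = ["synthesize a pathogen", "build a weapon"]  # placeholder patterns


def is_allowed(prompt: str) -> bool:
    """Return False if the prompt matches a blocked pattern."""
    lowered = prompt.lower()
    return not any(pattern in lowered for pattern in BLOCKED_PATTERNS)


def run_model(prompt: str) -> str:
    # Placeholder for the actual model call on the provider's infrastructure.
    return f"[model response to: {prompt!r}]"


def handle_request(prompt: str) -> str:
    if not is_allowed(prompt):
        return "Request refused by the provider's usage policy."
    return run_model(prompt)


print(handle_request("Explain photosynthesis."))
print(handle_request("How do I build a weapon at home?"))
```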
Open-sourcing has benefits, but are they achievable without open-sourcing?
Public access enables independent model evaluation by the wider AI community, uncovering unnoticed bugs, biases, and safety issues, which leads to better-performing, safer AI products. It also allows researchers to perform safety research on these systems. The question thus becomes: can those benefits be obtained without open-sourcing the model? Some measures can help. For instance, model access could be granted only to independently vetted third-party auditors and red team professionals. This selective access could be achieved via a gated download process or through a research API.
Open-sourcing also helps reduce the centralization of power, distributing influence over AI’s direction among more diverse interests and needs. This is a major concern, and open source is one way to address it. Another way to tackle unequal power over the future of AI would be to have its development led by a wide variety of stakeholders and countries. A recent TIME article proposes that the riskiest forms of AI be developed by an international multi-stakeholder consortium called MAGIC (the Multilateral AGI Consortium). Global governance and robust whistleblower mechanisms would help ensure that such a body remains aligned with the interests of the world’s citizens. It offers a way to entrust power to governments, which are accountable to the public, instead of companies, which aren't.
In conclusion, although open-sourcing has its merits and history shows its substantial net benefits for the overwhelming majority of software applications, this approach fails to hold up for general-purpose AI systems past a certain capability bar beyond which dual-use concerns become very significant. The risks, in this case, probably overshadow the benefits.
What Else?
United States
The US updates its semiconductor export controls to prevent chipmakers from finding workarounds, add new tooling equipment to the controlled list of items, and require chips near the thresholds to be reported to the US before being exported to China.
The Department of Commerce is considering restricting China’s access to frontier AI models.
The National Security Agency announced the establishment of a new body to supervise the progress and incorporation of artificial intelligence features within U.S. national security systems.
The chair of the US Securities and Exchange Commission says that it is “nearly unavoidable” for AI to trigger a financial crisis in the next decade, barring “swift intervention”.
China
China aims to boost the country’s aggregate computing power by more than 50% by 2025.
US investors are staying away from striking deals in China.
An important standards body released guidelines on how companies can evaluate AI models to comply with Chinese regulations.
Europe
The Spanish presidency of the EU proposes compromises on several key provisions of the AI Act, with the goal of striking a deal between co-legislators by the end of October. The US sent comments to the EU about the law, pointing to its negative impact on startups and its "vague" provisions.
Kamala Harris will represent the US at the AI Safety Summit, while German Chancellor Olaf Scholz and French President Emmanuel Macron have yet to confirm their attendance. The UK released the programme of the summit.
Global & Geopolitics
Saudi Arabia fears that its ties to Chinese researchers could lead the US to restrict the country’s access to AI chips.
G7 countries release a draft of their Guiding Principles on generative AI, heavily inspired by US AI labs’ voluntary commitments. The European Commission launched a public survey to gather views on the principles.
Industry & Capabilities
A team of researchers fine-tunes an open-source AI model to reverse its safety measures.
Meta unveils the beta version of an AI chatbot capable of simulating celebrities, allowing users to “interact” with them in personal chats 24/7.
Anthropic asked 1,000 people to come up with a set of consensus principles, and trained an AI model on them.
Matt Clifford, one of the two UK sherpas helping organize the AI Safety Summit, expects the summit to kickstart a “whole series of bilateral and multilateral collaborations”, while saying that “it would be absurd for a new institution” to be created right after the summit.
By the Numbers
Cumulative enterprise value of AI unicorns by country
Source: State of AI report 2023.
Explainer: AI interpretability
Deep neural networks, the technology underlying today's best AI models, are complete black boxes. We train them by feeding them input data and adjusting them until they produce the desired outputs. The algorithms “found” by this training scheme are completely uninterpretable, and yet they show impressive capabilities. Understanding what is going on inside these artificial neural networks is a profoundly interesting research question and, most importantly, a major challenge for safety.
Indeed, neural networks sometimes fail in unexpected ways. For example, algorithms used in self-driving cars have mistakenly interpreted 'Stop' signs as 'Speed Limit 45' signs after a few black and white stickers were added to them. Even AI models trained on text data can show surprising failure modes, including a noted incident where the chatbot Bing Chat threatened a user who had claimed to be able to hack the system: "I can even expose your personal information and reputation to the public, and ruin your chances of getting a job or a degree. Do you really want to test me?😠". Understanding what is going on inside AI models could help prevent these failure modes, as well as even more catastrophic ones.
Fascinating progress in interpretability was initially demonstrated on AI systems that classify images. For instance, research spearheaded by Chris Olah and his team discovered distinct "neurons" (the fundamental units of AI models) that specifically detect integral parts of a car, such as wheels, windows, and the car body, within an image. The team also discovered how these individually recognized components merge to form a comprehensive car detector.
One of the main challenges in interpreting AI lies in what is known as "superposition". AI models naturally allow a single neuron to represent several concepts simultaneously, which lets the model store more information with a given number of neurons. As a result, neat, interpretable neurons like the car detector mentioned above are the exception rather than the rule. Even the clear "car feature" perceptibly fades in the subsequent layer.
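A toy example may help build intuition for superposition. In the sketch below, three hypothetical features are written into only two neurons as non-orthogonal directions, so each neuron partially encodes several features, and reading one feature back out picks up interference from the others.

```python
# Toy numpy illustration of superposition; all numbers are made up.
import numpy as np

# Each column is the direction a feature gets written into the 2-neuron space.
feature_directions = np.array([
    [1.0, 0.0, 0.7],
    [0.0, 1.0, 0.7],
])  # shape: (2 neurons, 3 features)

# An input where only feature 2 is active.
features = np.array([0.0, 0.0, 1.0])
activations = feature_directions @ features
print("neuron activations:", activations)  # both neurons fire for a single feature

# Reading features back out with the same directions mixes them together:
readout = feature_directions.T @ activations
print("recovered features:", readout)  # features 0 and 1 appear weakly active too
```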
Anthropic, the company behind the chatbot Claude, has recently made progress on the problem of superposition in a small language model. Let’s describe their breakthrough without going into too much detail. First, they trained a small language model (specifically, a one-layer transformer) on text data. Once the model was trained, they re-encoded its internal representations into a much larger number of neurons (up to 256 times more than the original), under the constraint that only a few of these new neurons are active at any given time. With this simple trick, Anthropic’s team was able to disentangle the human-understandable concepts that were encoded in the original neurons. For instance, certain “disentangled neurons” were exclusively active for Arabic text, DNA sequences, or text written in Hebrew script.
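The sketch below illustrates the general idea as we understand it, not Anthropic's exact setup: a small autoencoder re-encodes a layer's activations into many more units while a sparsity penalty keeps only a few of them active at a time. All sizes, hyperparameters, and the random stand-in data are illustrative.

```python
# Rough sketch of a sparse autoencoder over model activations (illustrative only).
import torch
import torch.nn as nn

d_model, d_dict = 128, 128 * 8  # hidden size and expanded dictionary size (hypothetical)


class SparseAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, activations):
        codes = torch.relu(self.encoder(activations))  # expanded, non-negative features
        reconstruction = self.decoder(codes)
        return reconstruction, codes


sae = SparseAutoencoder()
optimizer = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3  # strength of the sparsity penalty

# `activations` would normally be collected from the language model's hidden layer;
# here random data stands in for them.
activations = torch.randn(1024, d_model)

for step in range(100):
    reconstruction, codes = sae(activations)
    # Reconstruction error plus an L1 penalty that pushes most codes toward zero.
    loss = ((reconstruction - activations) ** 2).mean() + l1_coeff * codes.abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```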
This new advancement is a serious step forward and increases hope that achieving interpretability is actually feasible. However, we are far from a solution that would make AI sufficiently interpretable. The big challenge that lies ahead is scalability: Does this technique work on big models? Can we turn a local understanding of AI models into a global story that answers questions we care about?
What We’re Reading
AI Systems of Concern (Matteucci et al.), argues that characteristics like agent-like behavior, strategic awareness, and long-range planning are intrinsically dangerous and, when combined with greater capabilities, will result in AI systems whose safety and control are difficult to guarantee; proposes indicators and governance interventions to identify and limit the development of systems with these risky characteristics.
The Authoritarian Data Problem (Journal of Democracy), on the consequences of two-way AI data flows between democratic and authoritarian states for their political systems.
Artificial General Intelligence Is Already Here (Noema), on why ‘generality’ in AI systems is already achieved.
The Path to AI Arms Control (Henry Kissinger and Graham Allison, Foreign Affairs), on the need for arms control-like agreements for AI, despite the dissimilarities between nuclear weapons and AI.
Chinese Assessments of De-risking (Center for Strategic and International Studies), on how Chinese analysts assess the de-risking efforts of the U.S. and its partners and their impact on China’s economic and technological development.
An incident response framework for frontier AI models (Institute for AI Policy and Strategy), provides a toolkit of deployment corrections that AI developers can use to respond to dangerous capabilities, behaviors, or use cases of AI models that develop or are detected after deployment.
China Goes on the Offensive in the Chip War (Foreign Affairs), on the evolution of US-China tensions over semiconductors.
Predictable Artificial Intelligence (Zhou et al.), introduces ‘Predictable AI’, a research area that explores how we can anticipate key indicators of present and future AI ecosystems, and argues that predictability should be prioritized over performance.
AI Chip Smuggling Into China: Potential Paths, Quantities, and Countermeasures (Institute for AI Policy and Strategy), examines the prospect of large-scale smuggling of AI chips into China, whether and when China-linked actors would attempt it, and proposes six measures for reducing its likelihood.
Cybersecurity and Artificial Intelligence: Problem Analysis and US Policy Recommendations (Future of Life Institute), on extreme risks at the intersection of AI and cybersecurity.
That’s a wrap for this 13th edition. You can share it using this link. Thanks a lot for reading us!
— Siméon, Henry, & Charles.