#14 - Day 1 of Global AI Safety Governance
Last week, 28 countries met in Bletchley Park for the world’s first AI Safety Summit.
“There is potential for serious, even catastrophic, harm, either deliberate or unintentional, stemming from the most significant capabilities of these AI models.” - Bletchley Declaration, 2023
Policymakers have come a long way since the release of ChatGPT less than a year ago. Concerns among the public related to the extreme safety and misuse risks of AI have materialized in this first international summit.
British Prime Minister Rishi Sunak succeeded in bringing about what many call a diplomatic coup by bringing together US and China representatives in a room together, at a time where strategic competition on AI and emerging technologies makes these interactions very scarce. One among several summit outcomes was a multi-continent declaration on AI risks.
U.S. Commerce Secretary Gina Raimondo listening to Wu Zhaohui, China's vice minister of science at the summit at Bletchley Park (source: Politico)
How was this emerging consensus built? Through private roundtables and demos of risky AI systems. With around 100 participants the first day and 20 the second day (see attendees here and the programme here), an important aspect of the summit was bringing political leaders, AI lab executives, safety and governance researchers, and (some) civil society around the same table. These stakeholders participated in small closed-door sessions where they discussed topics like “Risks from Loss of Control over Frontier AI” or “What should Frontier AI developers do to scale responsibly?”.
Insider look into the conversations: Consistently with the feedback about closed conversations we've heard, a twitter thread by Kanjun Qiu, CEO of Imbue AI and summit participant, says that "people agree way more than expected" and "views were very nuanced". According to her, there were disagreements on:
“the point at which a model shouldn't be open sourced
when to restrict model scaling
what kind of evaluation must be done before release
how much responsibility for misinformation falls on model developers vs (social) media platforms”
Some brought up the zero-sum view that coordination is not realistic, and that we must continue scaling models and capabilities "because opponents [read:China] won't stop". A dominant debate was between proponents and opponents of open-source (see NAIR #13), with those against arguing it could enable access to models by malicious actors who could misuse them, while others argue that open-source is inherently good for its democratization of access, thus addressing the concentration of power in AI companies, and that it aids research safety. Nonetheless, “both sides might agree to solutions that allow models to be freely studied and built upon without giving bad actors access".
Nonetheless, there was consensus on several items:
the importance of model evaluations, with the footnote that this approach can’t solve every safety problem.
“there was agreement around holding model developers liable for at least some outcomes of how the models are ultimately used.”
“general agreement that current models do not face risk of loss of control.”
“There was almost no discussion around agents—all gen AI & model scaling concerns.” For more, you can read the summary of the discussion here and here.
What to make of all this? The summit was not exempt from criticisms. In particular, many lamented the lack of geographical diversity, involvement of civil society, and discussion of issues other than frontier AI. The summit also exacerbated perceptions of industry capture, an issue bound to overshadow many regulatory efforts on frontier AI, since this focus conveniently excludes from the scope of potential regulations models that exist and generate profits for their developers today.
Furthermore, let’s get it out of the way: No legally binding agreement on AI safety guardrails came out of the summit. The signatories of an open letter calling for an international treaty for safe AI development, including luminaries Yoshua Bengio, Gary Marcus, or Yi Zeng, didn’t see their wish come true1.
Still, in many ways, this is a great start. As said by Connor Leahy, CEO of the AI safety company Conjecture, “this is not the place where policy gets made in practice, this is the kind of place where the groundwork gets laid”. The summit started a much-needed conversation.
But that’s not all.
The Bletchley Declaration: An emerging international consensus on AI risks
At the end of the summit's first day, 27 countries and the EU signed the "Bletchley Declaration", a statement that sees AI as posing a “potentially catastrophic risk to humanity” as well as short-term risks, emphasizes the duty of AI labs in developing AI safely, and commits countries to working together on AI risk identification and mitigation. China, the US, India, and others were among the signatories.
Here are the most significant (edited) extracts of the declaration:
Particular safety risks arise at the ‘frontier’ of AI. Substantial risks may arise from potential intentional misuse or unintended issues of control relating to alignment with human intent. There is potential for serious, even catastrophic, harm, either deliberate or unintentional, stemming from the most significant capabilities of these AI models. Deepening our understanding of these potential risks and of actions to address them is especially urgent.
Many risks arising from AI are inherently international in nature, and so are best addressed through international cooperation. We resolve to intensify and sustain our cooperation, and broaden it with further countries, to identify, understand and as appropriate act.
Actors developing frontier AI capabilities have a particularly strong responsibility for ensuring the safety of these AI systems, including through systems for safety testing, through evaluations, and by other appropriate measures. We encourage all relevant actors to provide context-appropriate transparency and accountability on their plans to measure, monitor and mitigate potentially harmful capabilities and the associated effects that may emerge, in particular to prevent misuse and issues of control, and the amplification of other risks.
Our agenda for addressing frontier AI risk will focus on:
identifying AI safety risks of shared concern, building a shared scientific and evidence-based understanding of these risks
building respective risk-based policies across our countries to ensure safety in light of such risks, collaborating as appropriate while recognising our approaches may differ
Of course, this can be criticized. The statement isn't very detailed; it acknowledges the potential of AI to pose risks more than it lays out specific risks. But it's a start, and a commitment between world powers to cooperate at the global level on an issue that was seen as fringe only a year ago can be welcomed.
“State of the Science”: New global panel on AI Safety
As part of the summit's emphasis on fostering an international consensus on the risks of AI, the UK launched an international panel of experts in AI safety, who will support the publication of a "State of AI Science" report.
Countries at the summit have pledged to appoint experts to a global panel that will issue a report on the “State of Science” in AI. In the Bletchley Declaration, they declared their "resolve to support an internationally inclusive network of scientific research on frontier AI safety”.
This network, the Expert Advisory Panel, will assist in drafting a report that reflects the scientific consensus on the dangers and potential of advanced AI technologies. This, in order to move toward an internationally shared consensus about frontier AI safety. It's set to be released before the next Summit in South Korea next year.
Yoshua Bengio, laureate of the Turing Award and member of the UN’s high-level AI advisory panel, will preside over this global report as chair. He also played a role in crafting one of the first government documents on AI safety and extreme risks, released by the UK just before the summit.
The drafting team of the State of Science report will consist of top AI scholars from various backgrounds, bolstered by the Expert Advisory Panel comprising delegates from the summit countries. The chair's secretariat will be hosted by the UK through its Safety Institute, which is also expected to contribute its research to the Report.
In line with the model for the expert panel, the International Panel on Climate Change (IPCC), the goal here is not to present new findings but to distill the best existing research. Mimicking the IPCC's functioning, the report will aggregate existing AI research in an authoritative manner.
Also like IPCC reports, the State of Science report won't make policy recommendations to governments. It will, however, strive to identify risks from AI, so as to be "policy-relevant", so that both international and national policy discussions rest on solid scientific ground. By refraining from conducting primary research or proposing policies, this institution would reduce the potential conflicts that might arise from a more hands-on role.
Why is this important? Scientific consensus contributes to political consensus, which enables the creation of international agreements. Currently, there is no scientific consensus on the risks posed by advanced AI models. But transformative AI development poses unique challenges that will require preventive international measures. The establishment of such measures would be deeply facilitated by a unified scientific understanding of the trajectories of AI development and the prediction of associated risks. Worldwide agreement on the risks of frontier AI could help foster consensus on how appropriate and effective are proposed policies to tackle those risks.
How? As details on the report are light, here are some ideas of the functions the new panel could take up, based on what the IPCC does:
summarizing the state of the art: what are AI systems capable of?
identifying threats: what are the risks posed by artificial intelligence?
predicting future evolutions: what will the next AI systems be capable of?
assessing solutions: how effective are public policies in tackling AI risks?
Can that work? Right now, there's not enough scientific study on the dangers of advanced AI. The proposed panel will help tackle this problem, but it won’t conduct primary research. To fill this gap, a group of researchers (Ho et. al 2023) have some great ideas:
A IPCC-like panel for AI could "undertake activities that draw and facilitate greater scientific attention, such as organizing conferences and workshops and publishing research agendas. It may be helpful to write a foundational “Conceptual Framework” to create a common language and framework that allows the integration of disparate strands of existing work and paves the way for future efforts"
Everything is politics: In the future, the panel will have to evolve into a more representative body. The announced involvement of unknown “partner countries” to the drafting of the report won’t suffice in making this panel truly legitimate. Representation is needed in terms of discipline, too. Researchers from the social sciences should be involved along with AI scientists.
It's also unclear who will make critical decisions on this and future editions of the State of Science report, such as choosing the scope of the report, the number and focus of its working groups, the level of political involvement in editing the report (significant but constrained in the case of the IPCC), or the regularity of publication. Due to its leadership position during the summit, right now, it seems like the UK is calling the shots. Will this rotate to South Korea when it hosts the next summit? Clearer decision-making procedures on these questions would help make the panel more legitimate.
Safety Institutes: Collaboration on Safety Testing
The world’s leading AI developers agreed to provide access to their frontier models before they are released so that a handful of governments can test them “against a range of critical national security, safety and societal risks”.
The specifics of the testing process are unclear and the government-lab agreements on model access are made on a voluntary basis. Still, this is a step towards more accountability and visibility into the oft-hidden nature of frontier AI development. Governments parties to this agreement, which doesn’t include China, will develop the technical capacity to evaluate models’ capabilities and risks, and assemble teams to advance AI safety research. How?
New type of government institution just dropped: The UK and US each announced the launch of an AI Safety Institute (AISI), through which they will test models before they’re deployed. In the UK, the Institute will take up the work of the Frontier AI Taskforce, which was launched a few months before the summit to contribute to its organization and build up government capacity in evaluating AI. These two institutes and other designated entities in each of the agreement’s signatories will partner up, notably to share the results of the model evaluations between each other. The goal: be able to carry out evaluations on the next generation frontier models before they are released next year.
What for? It’s unclear to what extent the two existing AISI will serve the same purpose, especially as there are less details about Washington’s Institute than on London’s. To further its mission of “minimizing surprise to the UK and humanity from rapid and unexpected advances in AI,” the British government lays out 3 main functions for its Institute:
Develop and conduct evaluation on frontier AI systems
Evaluators will seek to determine how safe and secure these systems are, as well as their potential societal consequences. To this end, the Institute will work both independently and with other entities, like existing AI evaluations organizations ARC Evals or Apollo Research (who were already partners of the Frontier AI Taskforce). Evaluations will concern misuse risks, societal harms, system security and safety, and loss of control risks.
As we highlighted in our last edition on the (then-rumored) UK Safety Institute, the usefulness of government-led evaluations rests on the level of access provided by companies. There is no real benefit to this agreement if they just get the same level of access as the business customers of AI labs (ie, being able to prompt the model through an API). The level of access could be made meaningful (and truly privileged compared with the rest of the world) if governments gain access to these models before technical safety guardrails are put in place (ie, after pre-training): they would then see the raw capabilities of these models, and thus learn what might happen if malicious actors are able to remove these guardrails on open source models. But that might be difficult to achieve: following the Summit, AI firms are reportedly already worried about the growing number of organizations worldwide that want to gain access to their technology.
Finally, the launch of the Safety Institutes also marks the start of a government-led standardization process for corporate safety policies. In the US, the National Institute of Standards and Technology is tasked with building upon its AI Risk Management Framework to create standards for frontier AI development and evaluations. As highlighted by the British government, “while developers of AI systems may undertake their own safety research, there is no common standard in quality or consistency,” a problem that may be remedied through concerted actions between Safety Institutes.
Drive foundational AI safety research
Institute employees will kick off various investigative projects and bring together experts from across the field to push boundaries in the “science of AI evaluations” and other areas of AI safety research. That’s important, because evaluations are crucial in enabling accountability for AI development, as highlighted in the Bletchley Declaration and during the summit’s roundtables. According to Seán Ó hÉigeartaigh, director of the programme ‘AI: Futures and Responsibility’ at the University of Cambridge, “there is much work to be done to develop this nascent area of work [...] It is exceptionally challenging to establish safety guarantees for AI systems capable of taking a broad range of actions in open-ended environments.”
The success of Safety Institutes in this regard will hinge on whether governments can attract and retain world-class talent, as rightfully pointed out by Tom Westgarth, senior analyst at the Tony Blair Institute for Global Change. Along with talent, compute capacity may be another bottleneck to the new body’s efforts; but the Institute has already secured priority access to the newly-launched UK AI Research Resource.
Facilitate information exchange
This is where things get interesting. According to the announcement:
“To ensure that relevant parties receive the information they need to effectively respond to rapid progress in AI, the Institute will appropriately share its findings with policymakers, regulators, private companies, international partners, and the public. This includes sharing the outcomes of the Institute’s evaluations and research with other countries where advanced AI models will be deployed, where sharing can be done safely, securely and appropriately - as agreed at the AI Safety Summit.”
It’s obvious the evaluations will help regulators develop technically-grounded policy. The UK AISI will “feed up to date information from the frontier of AI development and AI safety into government”. Indeed, bolstering public capacity for model evaluations addresses a concern often brought up in tech policy debates: that governments don't understand a technology enough to regulate it.
Even though the UK took care in noting that its Safety Institute is no regulator, such institutes, due to their expertise and access to models, would be best positioned by far to act as government model evaluators that could help determine future licensing decisions (on whether a model can be deployed, or even whether an AI lab can develop a given model according to a pre-training risk assessment).
The Institute will also act as a “trusted intermediary, enabling responsible dissemination of information as appropriate. It will support the establishment of a clear process for academia and the broader public to report harms and vulnerabilities of deployed AI systems.” We understand here that information about AI safety research, results of model evaluations, and AI accidents and harms will flow both from and to the Institute. The possibility for various stakeholders to report developments to the Institute might also enable rapid reaction to these evolutions, perhaps triggering government action in response.
But we now come to the central issue of this “information-sharing” function: The announcement says the UK Institute will “share the results [of safety evaluations] as appropriate”. What’s “appropriate”? The Institute will share these with countries “where the frontier AI model will be deployed”, where sharing can be done “safely and securely”. Does this mean that it will share these results with countries that aren’t party to the agreement? The decision to share the results of model evaluations will be a highly political one. It’s unlikely the UK or US will share such critical information as detailed model evaluation results to countries like China.
But this raises deeper questions about the legitimacy of keeping known risks private: don’t all countries deserve to know if, for example, after a model evaluation, the Institute learns that using a given architecture in a model can give rise to catastrophic risks? How would the information flow work then? A big problem with sharing evaluation results, as this example shows, is that they can easily lead to learning more about the capabilities of models, or the way in which they were developed.
Distinct but equally important considerations apply to information flows with other actors, including the wider public. If this push for more transparency from companies ends up increasing only public officials’ understanding, the mechanism may not reach its full potential in terms of fostering accountable AI development.
Finally, it will be interesting to see whether countries that haven’t yet established institutions akin to the Safety Institutes will rely on their own, or the US’ and the UK’s evaluations. There's already uncertainty about how even these two institutes will work together, especially since the U.S., which hosts nearly all top AI companies, is very protective of information with national security implications.
An AI Safety Summit every 6 months: an opportunity for global accountability?
The next AI Safety Summit will be held online, hosted by South Korea in the first half of 2024, and a third summit will be held in France in the latter half of the year.
What should we hope for at the next summits? Cambridge’s Seán Ó hÉigeartaigh wants “to see the next Summit in South Korea forge a consensus around international standards and monitoring, including establishing a shared understanding of which standards must be international, and which can be left to national discretion”.
Another lesson in incentivizing politicians and public servants to deliver continued progress on the goals set in the Bletchley Declaration lies in the functioning of the US-EU Trade and Technology Council (see NAIR#6). Through this body, EU and US political leaders meet every 6 months, while 10 working groups hammer out cooperative projects, align their practices, share information, and develop a united stance on strategic issues. This formula could be fruitfully used in the context of future Safety Summits.
Still, that states agree that international cooperation is needed doesn’t mean it will take place. Countries are very diverse, with both developed and (some) developing countries, autocracies and democracies, allies and arch-rivals. Cooperation between varied states with such stark differences in interests is not easy. Furthermore, the desire for cooperation won’t root out the imperative for state to compete for leadership in AI:
source: The Economist
A path to accountability? If we can’t expect any easy wins on new binding international agreements, what can we expect? Making Safety Summits a regular occasion points to a path for making AI labs more accountable. Every 6 months, companies could show how far they’ve progressed since the last meeting in implementing their voluntary commitments. The list of those commitments is growing: in addition to those released in July by the White House (see NAIR#10), the 7 companies invited to the summit committed to three additional safety policies, and many are expected to sign up to the code of conduct just released by G7 countries.
The UK requested the 7 companies invited to the Summit to publish their AI safety and risk management policies across 9 areas, and published an accompanying document describing emerging corporate AI safety policies. Leading AI labs already told the White House they would implement safety policies from #2 (model reporting) to #7 (ai safety research). The UK adds three efforts:
Responsible capability scaling, a policy developed by AI auditing organization ARC Evals. Anthropic is the only AI lab as of yet that has implemented such a policy (see our Governance Matters edition on responsible scaling).
Misuse prevention and monitoring: this area is of particular importance to governments, who want to prevent non-state actors such as terrorist groups or organized crime from using AI systems for malicious purposes (e.g. developing bioweapons)
Data input control and audits: To take the example of general-purpose AI systems used to develop bioweapons, if you are able to identify and remove data related to such weapons, you could prevent users from using it for this purpose.
A group of researchers compared the document released by the UK government on corporate safety policies with what these companies published. They found a lot of room for improvement:
Keep in mind that these are companies stating that they are looking to implement these policies. There is no reliable way to be assured that this is indeed the case. Similarly, the recently released G7 code of conduct for AI labs calls on these organizations to create "self-assessment mechanisms" to grasp how much of the code of conduct they indeed implemented. In other words, AI labs will grade their own homework. Instead, at every Safety Summit, companies could present their implementation of their voluntary commitments linked to the G7, the White House, and the Safety Summit.
The launch of regular Safety Summits is also the occasion to reduce fragmentation of governance initiatives around AI, by deepening the connections between initiatives like the G7's AI Hiroshima process, the GPAI, the UN. Making the Safety Summit a regular occasion of the diplomatic agenda could be a way to assess progress on these various efforts. Ensuring coherence between domestic regulations and international initiatives and statements will also be crucial, as rightfully pointed out by researcher Charlotte Stix, who wants to see alignment between the EU's AI Act and the G7 code of conduct for AI labs.
It is key for companies to be transparent about what they’re doing. It will allow NGOs, citizens, and governments to hold them accountable for it. Doing so at an event held every 6 months, with a lot of political and media attention, may incentivize these publicity-starved companies to show their goodwill. Awaiting national and international regulation, making companies stake their reputations on ‘compliance’ with these commitments is a good way to make them abide by them.
That’s a wrap for this 14th edition. You can share it using this link. Thanks a lot for reading us!
— Siméon, Henry, & Charles.
Disclaimer: Siméon, one of the writers of NAIR, was involved in setting up this open letter.