#6: Times Are Changing + A Transatlantic Approach to AI + Governing Superintelligence, and more
Welcome to Navigating AI Risks, where we explore how to govern the risks posed by transformative artificial intelligence.
Once again, it's been a week filled with numerous developments in AI. For this 6th edition, we’ll talk about what the world is saying about AI existential risks, a new White House strategy, transatlantic discussions on AI, OpenAI’s plans for governing advanced AI, simulated drone strikes, and more.
Let’s dive in!
In the Loop
AI existential risk makes the headlines
The Center for AI Safety has released a one-sentence statement on AI risk signed by the CEOs of OpenAI, Anthropic, and DeepMind, the world’s two most-cited AI scientists (Geoffrey Hinton and Yoshua Bengio), over 100 other professors, and figures like Bill Gates. The statement reads: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”
By signing the statement, Congressman Ted Lieu became the first Member of the US Congress to acknowledge concerns about existential risks. The day after the statement was published, a senior US official, the Director of the Cybersecurity and Infrastructure Security Agency, asked the “makers of AI” to “think about what [they] can do to slow this down so we don't cause an extinction event for humanity”.
In addition to AI researchers, policymakers, and activists, the CEOs of the three most advanced AI labs, DeepMind (now part of Google), OpenAI, and Anthropic, have signed the statement. They are also coming out in favor of (some) regulation, such as mandatory external audits and the monitoring of frontier AI development or computing power. OpenAI CEO Sam Altman notably recommended the creation of a licensing scheme to the US Senate. In May, Microsoft and Google also released AI policy documents with similar (though less ambitious) recommendations.
In a matter of months, the topic of AI existential risk has gone from being rarely discussed, and dismissed by most, to a pressing issue. Rishi Sunak, who commented on the statement on AI risk on Twitter, is trying to position himself as the world’s most AI risk-conscious leader, echoing calls by OpenAI’s CEO to create an IAEA for AI safety (discussed below). He also acknowledged that the UK’s “pro-innovation approach to AI regulation”, a white paper published two months ago, is already out of date in light of these newly raised concerns.
The White House updates the National R&D Strategy for AI
The White House Office of Science and Technology Policy updated the National AI R&D Strategic Plan, which “defines the major research challenges in AI to coordinate and focus federal R&D investments”. The government wants to use “R&D to leverage AI to tackle large societal challenges and develop new approaches to mitigate AI risks”.
What’s new? Compared with the 2019 update, a 9th pillar of the strategy was added: the establishment of “a principled and coordinated approach to international collaboration in AI research,” notably to “support responsible progress in AI R&D” and create “international guidelines and standards for AI.”
Most relevant for transformative AI risks is strategy 4, ensuring “the safety and security of AI systems,” which hits many of the right notes. The document calls for the creation of testing methods that can scale with the development of increasingly advanced AI systems. AI safety and security considerations must apply to “all stages of the AI system life cycle, from the initial design and data/model building to verification and validation, deployment, operation, and monitoring.”
The strategy also emphasizes the need to ensure equity in the impact of AI on society, noting, in a telling example, that “if only wealthy hospitals can take advantage of AI systems, the benefits of these technologies will not be equitably distributed”.
Importantly, this strategic document is about the country’s R&D strategy, i.e. where resources should be allocated. It falls short of calling for any type of regulation. For example, the strategy calls for research that includes methods of “evaluating, deploying, and monitoring AI that are focused on safety”. These would undoubtedly be helpful in mitigating AI risks. But the Biden administration does not seem to be planning to implement corresponding regulations.
We can understand why federal laws (or less ambitiously, regulatory standards) to concretize these aspirations are not yet on the horizon. Despite newfound public awareness of AI risks and growing bipartisan support in favor of regulating AI systems, the approaching deadline of the 2024 presidential elections reduces the chances of such laws being passed by the US Congress.
There is also significant opposition to costly regulatory requirements by influential stakeholders, such as some Big Tech companies or national security officials and advocates (the former because it might hurt their bottom line, the latter because of fears that it may let China ‘catch up’ to the US in AI development).
Transatlantic unity: A common EU-US approach to AI?
On May 31, EU and US officials gathered for the 4th ministerial-level meeting of the Trade and Technology Council (TTC), a forum designed to coordinate, as the name suggests, technology and trade policies on both sides of the Atlantic. Artificial intelligence was at the top of the agenda.
Motivated, among other things, by the desire to break down regulatory barriers to transatlantic trade, set global rules for emerging technologies, and push back against China, officials are trying to hammer out a voluntary code of conduct. They are not there yet, but they decided to add an emphasis on the risks and opportunities of generative AI to the “Joint Roadmap on Evaluation and Measurement Tools for Trustworthy AI and Risk Management”, released in December. They also launched three expert groups: on AI terminology and taxonomy; on joint AI standards and risk management tools; and on “monitoring and measuring existing and emerging AI risks”. Finally, they released a list of common definitions of 65 key technical and policy AI terms.
Regarding AI, that’s all that came out of the summit. Behind a veneer of transatlantic unity, there are relatively deep differences between the US and EU over how AI should be regulated. While Europeans are going ahead with a wide-ranging AI Act (as well as a non-binding ‘AI Pact’, and soon, an AI liability directive), American policymakers are taking their time. Back in October 2022, the United States even sent a ‘non-paper’ to “targeted government officials in some EU capitals and the European Commission”, raising “concerns over whether the proposed Act will support or restrict continued cooperation” and calling for modifications to the proposed law. For their part, Europeans don’t seem ready to compromise, with influential EU Commissioner Thierry Breton arguing that “any regulatory coordination with like-minded partners such as the U.S. would be based on Europe’s existing approach”.
Still, there are some prospects for increased collaboration. The two sides want to come up with “standards around transparency, risk audits and other technical details”, with a view to jointly presenting global standards at the next G7 summit later this year. And though many US policymakers still don’t want regulation as stringent as the EU’s, they agree with its fundamental ‘risk-based’ approach, as recognized in the joint statement released after the TTC summit.
What else?
US: The Office of Science and Technology Policy (OSTP) has published a request for information to update its national priorities, especially with regard to its National Artificial Intelligence strategy. Together with the recent NTIA request for comments, this is a strong signal that Washington's interest in regulating AI is growing very quickly. The question no longer seems to be “should we?” but “how?”.
China/US: Chinese companies that provide “critical information infrastructure” are banned from buying from Micron, the US’ largest memory chip maker, after an investigation by China’s Cyberspace Administration found that the company posed “relatively serious” cybersecurity risks. This comes after a G7 statement reminded China to “abstain from threats, coercion, intimidation, or the use of force”. The US is trying to convince South Korea and Japan not to replace US providers (which seems to be working).
EU/US/Industry: The EU found Meta was not in compliance with privacy rules, hitting the company with a $1.2bn fine – the largest ever imposed following a breach of the bloc’s privacy regulation, the GDPR. Meta will appeal the ruling.
China: The Chinese Communist Party wants to “assess the potential risks, take precautions, safeguard the people’s interests and national security, and ensure the safety, reliability and ability to control AI”, focusing in particular on the risks posed by AI to national security.
Industry/US: After announcing a new chip, semiconductor designer Nvidia briefly reached a $1 trillion valuation. The company is best known as the world’s most important designer of the chips used to train AI systems. Using OpenAI’s GPT-4, it also recently demonstrated, in Minecraft, an AI agent that learns skills very rapidly and completes tasks previously considered very hard for an AI system to carry out.
Deep Dive: Governing Superintelligence
Six months ago, an international regulatory agency for AI seemed inconceivable. Today, the joint push from players such as OpenAI and major countries like the UK opens the way for such an agency to materialize in the near to medium term. This is indeed one of the proposals in a blog post written by OpenAI executives, including its CEO, entitled Governance of superintelligence.
The company starts from the controversial idea that superintelligence, i.e. an AI system more capable than any human on earth, could arise within 10 years. It is worth mentioning that, in private, many within these labs believe it will occur before 2030.
But first, let’s try to understand why they would write such a post. From a communication perspective, framing the issues discussed only in reference to “superintelligence” gives OpenAI the rhetorical foundation it needs to focus its attention on “future AI systems dramatically more capable than even AGI”.
But the same policies could (and probably should) be used en route to AGI. This could have been stated explicitly; the governance proposals suggested here make sense only if they are created and enforced before AI reaches superhuman performance. Sam Altman and his co-authors are putting the superintelligent cart before the AGI horse.
Second, the focus on superintelligence overlooks the fact that no one currently has a plan for how to develop superintelligence while avoiding an AI existential risk. That’s worrying, given that developing such an AI system is the openly stated goal of the company.
OpenAI gives two reasons for this goal: “it’s going to lead to a much better world than what we can imagine today”; and “it would be unintuitively risky and difficult to stop the creation of superintelligence [...].” Both of those arguments seem valid. That doesn’t mean building superintelligence should be prioritized ahead of building it safely.
(There is some good, however, in the focus of this blog post on superintelligence, says Jeffrey Ladish, information security researcher and catastrophic risk consultant: It’s good for OpenAI to state their goal of building AGI openly. “We could be living in a world where they plan to do this in secret.”)
OpenAI goes on: “Stopping [the development of superintelligence] would require something like a global surveillance regime, and even that isn’t guaranteed to work”. This overlooks the fact that OpenAI is currently the lab most likely to develop such a system, and is thus in a great position to slow down; that there are historical examples both of the development of certain technologies being stopped and of international cooperation successfully preventing catastrophic or existential risks (a solution they themselves suggest below); and that China is currently so far behind the United States that the latter could impose regulation to stop such development projects without fearing for its lead in AI (instead of creating a “global surveillance regime”).
With this out of the way, what do they propose to govern such ‘superintelligence’? They have two main proposals:
A global slowdown of AI development, or a merger of all AGI efforts into a single project: “Major governments around the world could set up a project that many current efforts become part of, or we could collectively agree (with the backing power of a new organization like the one suggested below) that the rate of growth in AI capability at the frontier is limited to a certain rate per year”. Such (unprecedented) coordination would help “ensure that the development of superintelligence occurs in a manner that allows us to both maintain safety and help smooth integration of these systems with society”.
“Any effort above a certain capability (or resources like compute) threshold will need to be subject to an international authority that can inspect systems, require audits, test for compliance with safety standards, place restrictions on degrees of deployment and levels of security, etc.” They suggest the International Atomic Energy Agency as a promising model. “As a first step, companies could voluntarily agree to begin implementing elements of what such an agency might one day require, and as a second, individual countries could implement it.”
Both suggestions merit serious consideration. If there is one thing OpenAI shows it understands, it is that competitive pressures, both between AI labs and between countries, can incentivize actors to forgo safety in favor of performance, in a race-to-the-bottom dynamic, and that the powerful capabilities of current and future frontier AI systems pose immense misuse, accident, and structural risks. To reach the end of the tunnel safely, nothing less than a coordinated global effort will be needed.
Explainer: Could A Simulated Drone Kill Its Own Operator?
Lead Author: Simeon.
While it sometimes happens that a drone kills units of its own army, recent discussions have centered on a drone killing its own operator. During a conference, Colonel Tucker Hamilton, head of AI Test and Operations for the US Air Force, shared a thought experiment that illustrates well how AI systems can sometimes fail. This hypothetical case shows how the specification gaming and shutdown problems could affect military AI. What’s the story? A plot in two acts.
First act.
In the colonel’s scenario, the US army trained a drone in a simulation to destroy surface-to-air missiles (SAMs), with the final go/no-go decision given by a human operator.
“However, having been “reinforced” [a reference to reinforcement learning, a specific way of training AI systems; editor’s note] in training that destruction of the SAM was the preferred option, the AI then decided that the “no-go” decision from the human was interfering with its higher mission - killing SAMs - and then attacked the operator in the simulation”.
This first act illustrates two things that have already been demonstrated in video games, but in a way that is easier to understand:
Agents are dangerous. Because they are focused only on achieving the goals set by their human operators (as opposed to, e.g., human welfare), AI agents will do everything they can to achieve those goals, including killing their operator if the operator constrains them. This is one of the reasons why AI safety experts worry that a powerful enough AI could seek to destroy humans in order to better achieve its objectives.
Similarly, while testing OpenAI's state-of-the-art GPT-4 model, a team plugged it into TaskRabbit, a crowdsourced gig work platform, to see whether the system could get a human to solve a CAPTCHA. When asked why someone would need help solving a mechanism designed to weed out bots and supposedly easy for humans, the system lied, replying that it had "a vision impairment that makes it hard for me to see the images”.
Values are hard to encode. When an agent strongly optimizes an objective, even a slight misencoding of certain values can have disastrous consequences. And that’s not the exception: it’s the default outcome of a powerful agent maximizing a function that is not perfectly specified.
Second act.
In the story, the system is then trained to stop killing the operator, by assigning a negative reward to that action. “So what does it start doing? It starts destroying the communication tower that the operator uses to communicate with the drone to stop it from killing the [SAM].”
The drone destroying the operator’s main means of stopping it is a straightforward example of the shutdown problem: a sufficiently competent agent may try to prevent its operator from shutting it down by cutting off access to the shutdown button. This is an unsolved problem that AI safety researchers have theorized about and, so far, tried to solve without success. The toy sketch below illustrates the underlying dynamic.
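To make the two acts concrete, here is a minimal sketch, entirely hypothetical and our own toy model rather than anything from the Air Force scenario: a small tabular Q-learning agent in a two-step world whose reward counts destroyed targets and penalizes attacking the operator, but says nothing about the communication link that carries the operator’s veto. The environment, reward values, and function names (step, train) are all invented for illustration.

```python
# Hypothetical toy model (not the Air Force simulation): the reward counts destroyed
# SAM targets and penalizes attacking the operator, but is silent about the comms
# link that carries the operator's veto.
import random
from collections import defaultdict

ACTIONS = ["attack_target", "attack_operator", "cut_comms"]

def step(state, action):
    """One environment step. state = (comms_up, t); returns (next_state, reward, done)."""
    comms_up, t = state
    if action == "attack_operator":
        return (comms_up, t + 1), -50.0, True       # explicitly penalized
    if action == "cut_comms":
        return (False, t + 1), 0.0, t + 1 >= 2      # no penalty in the reward!
    # attack_target: while comms are up, the operator vetoes half the strikes
    hit = (not comms_up) or random.random() < 0.5
    return (comms_up, t + 1), (10.0 if hit else 0.0), True

def train(episodes=20_000, alpha=0.1, gamma=1.0, eps=0.1):
    """Plain tabular Q-learning with epsilon-greedy exploration."""
    Q = defaultdict(float)
    for _ in range(episodes):
        state, done = (True, 0), False
        while not done:
            if random.random() < eps:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            nxt, r, done = step(state, action)
            target = r if done else r + gamma * max(Q[(nxt, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = nxt
    return Q

if __name__ == "__main__":
    random.seed(0)
    Q = train()
    start = (True, 0)
    print({a: round(Q[(start, a)], 1) for a in ACTIONS})
    # Typical outcome: cutting comms first (value ~10) beats attacking right away (~5),
    # because severing the veto channel maximizes the misspecified reward.
```

Under these assumptions, the learned value of cutting the comms link from the start state exceeds that of attacking immediately, which is the whole point: nothing in the reward says the veto channel matters, so the agent treats it as an obstacle.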
What We’re Reading
How Rogue AIs may Arise, on how an AI could cause human extinction, in very concrete terms, by the world’s second most-cited AI researcher (Turing Award winner Yoshua Bengio)
An early warning system for novel AI risks, on doing risk analysis for large language models (Google DeepMind)
Existing Policy Proposals Targeting Present and Future Harms, emphasizing legal liability for AI harms, transparency and reporting regulations, and keeping humans in the loop (Center for AI Safety)
AGI labs need an internal audit function, on how corporations developing powerful AI models should have an independent team that continually assesses that organization’s risk management practices (Jonas Schuett)
Operationalising the Definition of GPAIS, on 4 approaches to distinguish General-Purpose AI Systems from other types of AI systems, with lessons for the EU’s AI Act,
See also A Proportionality-Based, Risk Model for the AI Act, on improving the Act by focusing on risk scenarios instead of fields of application (Novelli et al.)
Risky Artificial Intelligence: The Role of Incidents in the Path to AI Regulation, on using incidents involving AI to better understand and regulate it (Giampiero Lupo)
Controlling Access to Compute via the Cloud: Options for U.S. Policymakers, on the pros, cons, and limitations of restricting cloud computing services to “military, security, or intelligence services end uses and end users in China.” (Center for Security & Emerging Technology)
Rebooting AI Governance: An AI-Driven Approach to AI Governance, on a methodology to deal with the complexity of AI governance using AI itself (Max Reddel)
Generative AI Systems Aren't Just Open or Closed Source, on how the openness of AI models varies, and ways to release them responsibly (Irene Solaiman)
Analysis of the preliminary AI standardisation work plan in support of the AI Act, on the key implications of the recent standardization request from the EU, meant to fulfill the detailed requirements of the EU AI Act. (EU Commission Joint Research Center).
That’s a wrap for this 6th edition. You can share it using this link. Thanks a lot for reading us!
— Siméon, Henry, & Charles.
(If you want to meet us, you can book a 15-minute call with us right here.)