#4 - The World's First AI Law + A Manhattan Project for AI Safety?
Welcome to Navigating AI Risks, where we explore how to govern the risks posed by transformative artificial intelligence. This week, we’ll talk about European rules for AI, large-scale scientific projects, and the industry’s latest models.
Let’s dive in!
In the Loop
The EU's AI Act is coming
Since the European Commission released a white paper on AI in 2020, EU policymakers have been at work to decide which aspects of AI systems to regulate and how. This process is quickly approaching its end, with the European Parliament set to formally adopt its position in mid-June, though it also has to agree to a final text with the Council, the EU’s other legislative body.
The law has wide-ranging provisions, from banning social scoring and predictive policing to creating regulatory sandboxes in which AI companies can test both products and regulation, and requiring transparency about the datasets used for generative AI systems. Providers of high-risk systems will be required to put in place risk management and transparency measures. The text mandates the involvement of third-party experts, testing, and documentation to maintain “appropriate levels of performance, interpretability, corrigibility, safety and cybersecurity throughout their lifecycle”. The Act would also require EU member states to designate “supervisory agencies” to ensure compliance and hand out penalties in cases of misconduct.
The AI Act matters because it is the world’s first comprehensive attempt at regulating artificial intelligence. And the law won’t affect only European AI companies: any company that wants to sell its products or services in the EU will also have to comply with its requirements. As a result, it will be watched closely by other countries, especially the United States, with both worry and curiosity. After the AI Act enters into force, Europe will be able to see whether or not AI can be regulated for safety and ethical considerations without excessive negative impacts on innovation. US policymakers will be keen to draw lessons for their own regulatory schemes.
If it is successful, the AI Act will set ground rules that will help govern future, more advanced AI systems. The creation of the AI Office, an EU-wide body (and the resulting build-up of policy and technical know-how), continuous monitoring of general-purpose AI systems and foundation models, and technical standards, all part of the current version of the Act, will be helpful in dealing with such systems.
The current version of the law is not the final one. The Council of the EU, which represents member states, may have less stringent ambitions, and lobbyists are at work to tone down its requirements. The goal is to finalize the rules by December.
The Race to the Bottom Among Top AI Labs Continues
OpenAI and Anthropic have refrained from publishing the details of their state-of-the-art models, on the grounds that those details could be misused to enhance hacking capabilities or assist in building bioweapons, for instance by states such as China or North Korea, or by terrorist groups. Google, by contrast, has released PaLM 2 with a detailed technical report explaining new ways to optimally train state-of-the-art models.
Anthropic, which had until now refrained from releasing new state-of-the-art capabilities, has just released a model with a 100k-token context window, i.e. one capable of analyzing a 300-page book in a single pass. The safety implications of this are unclear for now, but some worry that models with such capabilities would be much more prone to causing large-scale accidents. One core concern is that short context windows make large language models less able to act coherently over long sequences of actions. AutoGPT, an autonomous agent based on GPT-4 designed to perform tasks independently, which we reviewed in a previous edition, is so far largely incapable of coherently pursuing the goals assigned to it. Larger context windows could change that.
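As a rough sanity check on the “300-page book” figure, here is a back-of-the-envelope calculation; the words-per-token and words-per-page ratios are our own assumptions, not Anthropic’s:

```python
# Rough check of the "100k tokens ~ 300-page book" claim.
# Assumptions (ours): ~0.75 English words per token, ~250 words per printed page.

context_tokens = 100_000
words_per_token = 0.75   # rough average for English text
words_per_page = 250     # typical printed page

words = context_tokens * words_per_token   # ~75,000 words
pages = words / words_per_page             # ~300 pages
print(f"{context_tokens:,} tokens ~ {words:,.0f} words ~ {pages:.0f} pages")
```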
According to the technical report for PaLM 2, the model’s risks were evaluated. While recognizing that “risks can arise from misuse, system failures, or when the proper use of a system results in harm or amplifies existing inequalities,” Google focused its evaluation on “representational harms like toxic language, social stereotypes, unfair discrimination and exclusionary norms.” This is necessary, but one may wonder why the evaluation covered only this area of potential harm. Furthermore, the model appears to have been evaluated by Google’s own teams rather than by external third-party auditors or dedicated red teams. Anthropic, for its part, did not publish any evaluation of its model’s new capability beyond a blog post.
What else?
Industry/US: Google is training Gemini, a new state-of-the-art multi-modal foundation model, built to include new capabilities such as “memory and planning”.
US/China: After the US imposed export controls on advanced semiconductors to China, Chinese companies are studying techniques that could allow them to achieve state-of-the-art AI performance with fewer or less powerful semiconductors.
China: The Communist Party’s Politburo says China should pay attention to the “development of general artificial intelligence, fostering an innovative ecosystem, and prioritizing risk prevention.”
Industry/US: Microsoft’s Chief Economist and Corporate Vice President says that while “AI will be used by bad actors, and [...] will cause real damage,” public authorities should wait until we “see real harm” and avoid regulating AI training runs.
US: The Department of Defense releases its 2023 National Defense Science & Technology Strategy, which includes around $630 million for “basic science and technology research funding” in AI. The Department says it will work to bolster the US’ “comparative advantages rather than engaging in wasteful technology races.”
Industry: Anthropic released a blog post further explaining its approach to training AI systems to follow human values, using a technique it calls “constitutional AI”.
US: A lawsuit was filed against OpenAI for perpetrating “a massive fraud on donors, beneficiaries, and the public at large” and exposing “‘all of humanity’ to massive unprecedented risks for personal gain,” going against the organization’s charter to ensure that AI “benefits all of humanity.”
Deep Dive: “We Need a Manhattan Project for AI Safety”
Samuel Hammond, a senior economist at the Foundation for American Innovation, a tech policy think-tank, wrote an op-ed in Politico calling for a “Manhattan Project for AI Safety.”
The rationale for this project is that the ‘alignment problem’ (making sure that AI systems do exactly what we want them to do) is not yet solved, so AI may eventually pose existential risks to humanity as systems get more powerful. Hammond thus wants the government “to fund a research project on the scale it deserves” for such a massive problem, comparing it to the Manhattan Project undertaken to develop nuclear weapons in the early 1940s.
He sees such a project as having a few core functions, including:
Pull “together the leadership of the top AI companies — OpenAI and its chief competitors, Anthropic and Google DeepMind — to disclose their plans in confidence, develop shared safety protocols and forestall the present arms-race dynamic”
Accelerate “the construction of government-owned data centers managed under the highest security,” including an “air gap” to ensure that “future, more powerful AIs are unable to escape onto the open internet.”
“Require models that pose safety risks to be trained and extensively tested in secure facilities.”
“Provide public testbeds for academic researchers and other external scientists to study the innards of large models like GPT-4”
There are obstacles to such a project. The article claims that “the research agenda is clear”. For current AI risks, there is indeed a somewhat visible path. But for AI accidents with potentially catastrophic consequences, we have no such clarity: we currently have no clear idea of what would need to be done to solve alignment. This is a key difference from the original Manhattan Project, where the fundamental scientific discoveries had already been made, the research paradigm was clear, and efforts could be usefully targeted in specific directions. All in all, however, increasing public funding for AI alignment research would be helpful, especially since private AI labs don’t necessarily have the incentives to focus on safety techniques.
Hammond recognizes that “the goal is exactly the opposite of the first Manhattan Project, which opened the door to previously unimaginable destruction. This time, the goal must be to prevent unimaginable destruction.” As others have argued, more productive inspirations could be CERN or the ISS, which, unlike the Manhattan Project and like Hammond’s idea, were public and open projects. Policy proposals along these lines have already been formulated, for example by Stanford University’s Institute for Human-Centered AI.
For corporations, safety research may eventually become too valuable to be widely shared: demonstrating that advanced systems are safe may become a necessary condition for releasing them, and thus a competitive and commercial advantage. A publicly funded international project would help keep such research open, at least to the project’s members.
What We’re Reading
How Do OpenAI’s Efforts To Make GPT-4 “Safer” Stack Up Against The NIST AI Risk Management Framework? (Federation of American Scientists)
AI Accidents: An Emerging Threat - What Could Happen and What to Do (Center for Security & Emerging Technology)
Reconciling the U.S. Approach to AI (Carnegie Endowment)
The Greatest AI Risk Could Be Opportunities Missed (Centre for International Governance Innovation)
Testimony of Jason Matheny, CEO of the RAND Corporation and Commissioner of the National Security Commission on Artificial Intelligence to the US Senate Subcommittee on Cybersecurity about the state of AI.
ChatGPT and China: How to think about Large Language Models and the generative AI race (The China Project)
Threats by artificial intelligence to human health and human existence (BMJ Global Health)
The Geopolitics of Technical Standardization: Comparing US and EU Approaches (German Council on Foreign Relations)
Harms from Increasingly Agentic Algorithmic Systems (FAccT 2023)
Reclaiming the Digital Commons: A Public Data Trust for Training Data (Chan et al.)
That’s a wrap for this edition. You can share it using this link. Thanks a lot for reading us!
— Siméon, Henry, & Charles.
(If you want to meet us, you can book a 15-minute call with us right here.)