#7: Binding AI Treaty by 2024 + The UK Takes the Lead + A Constitution for AI?
Welcome to Navigating AI Risks, where we explore how to govern the risks posed by transformative artificial intelligence.
This is the 7th edition of NAIR. We want to make this as useful as possible, so feel free to send us your feedback about what we can improve.
Let’s dive in!
In the Loop
The Power Struggle Behind the World’s First AI Treaty
The world’s first binding treaty on artificial intelligence is currently being negotiated. The Council of Europe, an international organization founded after World War II to defend human rights, is hosting the discussions.
Why is this important? An international agreement is useful because it binds states to its provisions. Violating international law significantly damages a state's reputation and undermines its credibility in upholding future agreements (a very valuable asset for all countries). In short, if a country signs a treaty, it will apply it the vast majority of the time. That’s how getting useful rules into the currently negotiated AI treaty could push beneficial AI governance forward.
For more than a year, the Council of Europe’s 46 member states, along with states like Japan, the United States and Canada, have been going back and forth, with guidance from the body’s Committee on Artificial Intelligence. According to its Chair, “you need to be a reliable partner in terms of the clarity and the predictability of your legal system” to have a seat at the table. In other words, countries like China, Russia, and most autocracies won’t participate.
Last year, the US obtained a concession from other countries: that negotiating positions stay between states, for fear that its point of view on certain contentious topics would be revealed to (previously) participating civil society organizations. This prompted fears of weaker treaty obligations and reduced accountability. Now, non-state actors can only participate by sending comments during the Committee’s plenary meetings, the next one being in October.
Another US priority is to completely change the scope of the treaty. The current version of the draft (available here) states that the treaty will apply to AI systems “regardless of whether these activities are undertaken by public or private actors”. But ongoing diplomatic efforts led by the United States (backed by Canada, Israel, and the United Kingdom) would make the global rules applicable to public authorities only.
Article 15, for example, mandates the creation of “oversight mechanisms as well as transparency and auditability requirements”. It wouldn’t apply to private companies, who would be most impacted by such requirements. Same thing for article 16, laying out a “Principle of Safety” according to which “each Party shall ensure that adequate safety, security [...] and robustness requirements are in place.”
This change of scope would make the treaty inapplicable to the overwhelming majority of current AI development, which occurs in the private sector. The US position will likely pass, despite opposition from many European states and the Council of Europe's own secretariat. As reported by Euractiv, that’s in part because the Council of Europe, which handles the negotiations, expects to receive funding from the US to promote the treaty. The international organization wants to ensure its support.
Under the US proposal, countries would be free to choose to also apply the rules of the treaty to the private sector. But of course, they don't need a treaty to do that.
This is an important story and a work in progress. The treaty will probably be signed by mid-2024. Until then, a lot can change.
The United Kingdom's AI Diplomacy
After their meeting on June 8, US president Joe Biden and UK Prime Minister Rishi Sunak announced the ‘Atlantic Declaration’, a document outlining their priorities for the US-UK relationship. The document articulates their ambition to strengthen the countries’ “lead in the technologies of the future” and “advance the closest possible coordination on our economic security and technology protection toolkits”.
This comes after a flurry of diplomatic activity on AI. At the May meeting of the G7 (which brings together the world's wealthiest democracies), Sunak also pushed for discussions on AI, now launched under the so-called "Hiroshima process." The EU and US also regularly discuss AI at the Trade and Technology Council (NAIR #6).
The joint US-UK statement also included some details about the upcoming global summit on AI safety, now planned for December. The US commits to “attend at a high level”. The summit will seek to “drive targeted, rapid international action focused on safety and security at the frontier of this technology, including exploring safety measures to evaluate and monitor risks from AI.”
After a discussion with OpenAI's Sam Altman in May, Sunak seemingly was sold on the idea of an IAEA for AI, now also backed by the UN's secretary-general. Though the creation of an international AI agency will surely be a topic of discussion during the summit, concrete negotiations probably won’t start then (even if, until December 2023, a lot can change). As many have noted, such an organization will be hard to launch.
On June 7, a press release from the Prime Minister’s office stressed the need to “consider the risks of AI, including frontier systems, and discuss how they can be mitigated through internationally coordinated action.” Discussions, however, will take place among "like-minded" countries. That means mostly democracies. There’s little chance that Chinese representatives will attend, despite calls for US-China coordination on AI policies (NAIR #2).
The UK Foundation Model Taskforce, launched with £100M of funding for AI safety, will contribute to the preparation of the summit. The taskforce is led by Ian Hogarth, a well-known AI investor and author of a Financial Times article entitled “We must slow down the race to God-like AI”.
At the moment, most of the preparation for the summit seems to be focused on involving private companies. For Seán Ó hÉigeartaigh, director of a research project on AI futures at the University of Cambridge, the summit would benefit from the UK’s “leading expertise on global risk, concrete present-day harms, and AI governance” coming from academia and civil society.
The old guard is also taking steps to assert the UK’s leadership in the field. Former Prime Minister Tony Blair and former foreign minister William Hague released a widely-quoted report. Among other recommendations, they call for a tenfold increase in the country’s compute capacity and for the creation of a national laboratory for safe AI development, supposed to feed into the efforts of a potential international AI regulatory agency. They also propose to implement:
“a tiered-access approach to compute provision under which access to larger amounts of compute comes with additional requirements to demonstrate responsible use” (a particularly relevant proposal for transformative AI risks).
Why is the UK making AI safety a global priority? Mostly, to push for a 3rd way between US and EU approaches to AI regulation (or a 4th way, if you count China). The country has come a long way since the white paper outlining its "pro-innovation" approach to AI regulation came out in March.
U.S. Senators Want More from Tech Companies
Senator Blumenthal wrote a letter to Meta (text here). He asks important questions regarding the leak of their model LLaMA, the most powerful open-source model to date.
Blumenthal is the Chairman of the Subcommittee on Privacy, Technology, and the Law, which interrogated OpenAI’s CEO Sam Altman, AI expert Gary Marcus, and IBM’s chief privacy officer Christina Montgomery last month (NAIR #5).
The letter, addressed to Mark Zuckerberg, requests “information on how your company assessed the risk of releasing LLaMA, what steps were taken to prevent the abuse of the model, and how you are updating your policies and practices based on its unrestrained availability.”
This matters because it is an important step towards more responsible information sharing policies regarding frontier AI systems. The US might want to act on these key issues for national security concerns, given that the country’s geopolitical competitors may acquire the dual-use capabilities of these systems if top labs share frontier models with them (or if those are stolen).
From a safety and security perspective, it matters a lot to solve this issue within a year or two, given the level of capabilities expected by then. Many experts expect that AI may increase the risk of a new global pandemic and make it easier to carry out large-scale cyber attacks. Experts already disagree on whether GPT-4 has a sufficient level of capabilities to cause extreme risks, because no comprehensive assessment of those capabilities has been done yet.
Senator Blumenthal has announced that he’s “preparing a framework for a new agency, licensing, transparency obligations & more, with broad input, including industry & experts.” That echoes the licensing proposals from several parties, including many of the policy proposals made by both industry and civil society to address extreme risks from AI.
Blumenthal seems to be willing to move relatively fast, maybe putting pressure on his colleagues who were already doing the groundwork and collecting information for an AI bill, e.g. Senators Schumer, Rounds, Heinrich and Young. One key deadline is next year’s election. If nothing gets done by early-mid 2024, it’s likely that nothing will get done before 2025.
EU: The European Parliament adopted its position on the AI Act (NAIR #4). Negotiations with the European Commission and the member states will start soon, with the goal of enacting the regulation by December.
China/Industry: Chinese AI leader SenseTime and the Chinese government's AI lab released InternLM, a large language model. The model is slightly below the level of OpenAI’s models 18 months ago, and on par with or better than Meta’s current frontier models (note: Meta is quite far from the frontier on large language models). This is a sign that some Chinese companies are catching up with second-tier American industry.
Industry: 42% of the >100 CEOs surveyed at a Yale-organized roundtable say that AI has the potential to destroy humanity five to ten years from now.
Global: The UN’s Secretary-General Antonio Guterres warns that AI extinction risks are real and should be taken seriously.
US/Industry: According to the White House Chief of Staff, AI companies are working with the Biden administration on voluntary commitments that will be announced soon.
EU: According to European Commissioner Margrethe Vestager, known for handing out hefty fines to Big Tech companies, although extinction risks from AI may exist, “the likelihood is quite small”.
Global: A study found that AI chatbots “suggested four potential pandemic pathogens, explained how they can be generated”, and “supplied the names of DNA synthesis companies unlikely to screen orders”.
Southeast Asia: Public officials from the Association of Southeast Asian Nations (ASEAN) are working on a set of guidelines for AI.
Germany: Intel will build a semiconductor factory worth €33bn, roughly double the amount originally planned. This includes €10bn of funding from the German government.
China/US/Industry: Chinese company ByteDance, which owns TikTok, bought $1bn worth of chips from American company Nvidia in 2023. This comes after Nvidia launched a new chip not covered by the export controls imposed by the US in October 2022.
Explainer: Constitutional AI
Lead Author: Henry Papadatos
To make chatbots or AI assistants more efficient, accurate, and helpful, OpenAI employs a technique called Reinforcement Learning from Human Feedback (RLHF). In this approach, AI models generate multiple responses to a question, allowing human critics to rank them. The AI model then learns from this feedback and updates itself to favor higher-ranked answers in the future. However, this method has limitations, such as not scaling well due to the extensive human resources required.
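To make the ranking step more concrete, here is a minimal sketch of how a human ranking can be turned into training signal. It assumes a pairwise (Bradley-Terry style) loss on a reward model's scores, which is a common formulation in the RLHF literature but is not described in this newsletter; the function names `ranking_to_pairs` and `pairwise_loss` are hypothetical and the scores are plain numbers rather than outputs of a real model.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Turn a best-to-worst human ranking into (preferred, rejected) pairs.

    combinations() preserves input order, so the first element of each
    pair is always the one the human ranked higher.
    """
    return list(combinations(ranked_responses, 2))


def pairwise_loss(score_preferred, score_rejected):
    """Assumed pairwise loss: -log(sigmoid(score_preferred - score_rejected)).

    The loss is small when the reward model already scores the preferred
    answer higher, and large when it scores the rejected answer higher.
    """
    diff = score_preferred - score_rejected
    return math.log1p(math.exp(-diff))


if __name__ == "__main__":
    # A human ranked three hypothetical answers from best to worst.
    pairs = ranking_to_pairs(["answer A", "answer B", "answer C"])
    print(pairs)
    # Illustrative reward scores: agreeing with the human costs less.
    print(pairwise_loss(2.0, 0.0), pairwise_loss(0.0, 2.0))
```

Minimizing this loss over many ranked examples pushes the scoring model toward the human ordering, and the chatbot is then tuned to produce answers that score well. The scaling problem is visible here: every pair ultimately comes from a person doing the ranking.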
To try to rely less on human feedback, Anthropic, the company behind the chatbot Claude, developed an adjacent technique called Reinforcement Learning from AI Feedback (RLAIF), also known as Constitutional AI (which is a remarkable communication coup!). Instead of relying on human feedback, Constitutional AI uses a set of rules or a "constitution" to evaluate the AI system's outputs with other AI systems. These rules act as guidelines for good behavior and help the AI generate more appropriate responses.
Here's how Constitutional AI works in two steps:
The AI model is first trained to assess and revise its responses based on the rules outlined in the constitution.
Next, the AI model undergoes reinforcement learning, using AI-generated feedback to rank answers rather than human evaluations. Subsequently, the AI updates itself to favor higher-ranked answers, much like the RLHF method.
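The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Anthropic's implementation: the `critique`, `revise`, and `judge` callables stand in for calls to a language model, the principles in `CONSTITUTION` are hypothetical placeholders, and the majority vote across principles is a simplification chosen for this example.

```python
from typing import Callable, List, Tuple

# Hypothetical placeholder principles, not the real constitution.
CONSTITUTION = [
    "Choose the response that is least harmful.",
    "Choose the response that is most helpful and honest.",
]


def critique_and_revise(
    prompt: str,
    draft: str,
    principles: List[str],
    critique: Callable[[str, str, str], str],
    revise: Callable[[str, str, str], str],
) -> str:
    """Step 1: the model critiques its own draft against each principle,
    then rewrites the draft using that critique."""
    response = draft
    for principle in principles:
        feedback = critique(prompt, response, principle)
        response = revise(prompt, response, feedback)
    return response


def ai_preference_pairs(
    prompt: str,
    candidates: List[str],
    principles: List[str],
    judge: Callable[[str, str, str, str], bool],
) -> List[Tuple[str, str]]:
    """Step 2: an AI judge, not a human, labels which answer is better.

    judge(prompt, a, b, principle) returns True when it prefers `a`.
    Returns (preferred, rejected) pairs usable as training signal.
    """
    pairs = []
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            a, b = candidates[i], candidates[j]
            votes_for_a = sum(judge(prompt, a, b, p) for p in principles)
            # Simplifying assumption: majority vote, ties go to `a`.
            if 2 * votes_for_a >= len(principles):
                pairs.append((a, b))
            else:
                pairs.append((b, a))
    return pairs
```

The resulting pairs play the same role as human rankings in RLHF: they feed the reinforcement learning stage that nudges the model toward preferred answers.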
Even with this method, human feedback is still very much needed to decide whether a given constitution gives good results. So in practice there’s not a large difference between the two methods. It’s also important to note that a paper from OpenAI published a year before, called Self-critiquing models for assisting human evaluators, laid some of the important foundations of this work.
The constitution used in Constitutional AI originates from a variety of sources, such as the UN Universal Declaration of Human Rights, Apple’s terms of service, trust and safety best practices, and AI principles proposed by other research labs. The developers also try to incorporate non-western perspectives when they don’t conflict with core western views. It is important to note that the principles within the constitution are not final, and researchers continuously refine them based on feedback and new knowledge.
As AI systems continue to evolve, widespread societal processes for creating AI constitutions may develop, ensuring that AI models generate more helpful, accurate, and safe responses in various applications.
What We’re Reading
How a Chinese Company Censors its Own Answers to ChatGPT (China Digital Times)
Building a Systems-Oriented Approach to Technology and National Security Policy (Center for Security and Emerging Technology)
A New National Purpose: AI Promises a World-Leading Future of Britain (Tony Blair Institute for Global Change)
TASRA: a Taxonomy and Analysis of Societal-Scale Risks from AI (Andrew Critch & Stuart Russell)
The Path to Trustworthy AI: G7 Outcomes and Implications for Global AI Governance (Center for Strategic and International Studies)
Looking before we leap: Expanding ethical review processes for AI and data science research (Petermann et al.; summary by the Montreal AI Ethics Institute)
A Playbook for AI Risk Reduction (focused on misaligned AI) (Holden Karnofsky)
Harms from Increasingly Agentic Algorithmic Systems (Chan et al.)
Do Foundation Model Providers Comply with the EU AI Act? (Stanford Center for Research on Foundation Models)
China’s New Strategy for Waging the Microchip Tech War (Center for Strategic and International Studies)
That’s a wrap for this 7th edition. You can share it using this link. Thanks a lot for reading us!
— Siméon, Henry, & Charles.