#16 - A Democratic "Cautious Coalition": What Grand Strategy for AI Safety? + Sycophancy
“Expected 1 unit of progress, got 2, remaining 998.”
Eliezer Yudkowsky, writer and researcher, reacting to a positive discovery in AI interpretability research.
A Cautious Coalition
Most people would prefer seeing the US lead and control transformative AI1 over authoritarian states like China. In addition to the geopolitical advantage that an AI lead offers, there is a strong consensus that democracies should be in the driving seat when it comes to setting rules for AI. Let China control AI, the thinking goes, and it will set rules for AI and set rules for the world.
It seems clear that the US, not China, should be at the forefront of AI development. China is an autocratic country with a dismal human rights record and an increasingly assertive and militaristic foreign policy. It is hardly the ideal frontrunner in the strategic competition for advantage in AI.
But there's another angle to consider - the safety of AI development. Advocating for increasing the United States’ lead in AI may have the increasingly well-documented perverse effect of increasing racing dynamics. This may result in more incentives for states and AI developers to prioritize competitive advantage over safety. This situation can also reduce the space for international consensus and increase the chance of conflict.
These considerations clash with the emerging notion that for AI safety reasons as well, democracies should lead autocracies.
What may be called the “cautious coalition” strategy has been described as “one approach to reducing risks from [states that do not regulate] AI”. In essence, a group of careful, like-minded countries would band together and take the lead in AI through measures like “hardware export controls, infosecurity, and immigration policy”. The goal? Leverage their lead to unilaterally reduce risk, by regulating AI.
This strategy isn't just theoretical—it's been given a voice through a policy brief by James Philips, a former senior adviser to the UK Prime Minister. Originally circulated among policymakers before he published it online, Philips' brief is a clarion call for action. He argues that “the UK must initiate and lead a multilateral, liberal democratic effort to control AGI2”:
“Ensuring that AGI is developed safely and in the interests of the British people and liberal democracies must be the highest priority of the British state over the next decade. We propose this should be done through pursuing a multilateral approach to advancing and controlling AGI in partnership with our companies and liberal democratic allies.”
Implicit in this paragraph is the notion that liberal democracies are most suited to developing AI safely3. As AI systems grow more sophisticated and strategically important, the belief4 that democracies should be at the helm of managing an eventual AGI (Artificial General Intelligence) system will gain momentum. As evidenced by the UK’s AI Safety summit, and provisions of the White House’s executive order on AI and the EU’s AI Act, the liberal democratic world is becoming more attuned to the importance of AI safety. As it does, it may advocate for maintaining leadership in AI as a pathway to ensuring its safe development.
The argument goes along these lines:
Democracies tend to be more responsible and cautious with regards to technological development. This is largely because transparency and public oversight are more systematic in democracies and matter for AI safety.
The transformative power of AI is immense, and the entity that controls it will wield significant power. Therefore, such power is better managed by a democracy rather than an autocracy, where power can be left unchecked and potentially misused.
Therefore, AI should be developed under the governance of a democracy, like the United States, rather than an autocracy, such as China. Accordingly, liberal democracies should band together to ensure a democracy or the liberal-democratic world maintains a lead in AI, and prevent autocratic countries from getting too close to the frontier of AI development.
Such arguments will likely get political backing over time. In turn, they lend credence to the (already widespread) idea that the US should lead in AI over China. Today, it is now deeply ingrained in most aspects of U.S. domestic and foreign policy. It has become the cornerstone of American foreign AI policy, primarily executed through export controls (NAIR #8). Jake Sullivan, the U.S. National Security Advisor, encapsulates this sentiment perfectly when he asserts that the U.S. should maintain “as large of a lead as possible” in certain strategic technologies, including AI.
The notion of a cautious coalition dovetails neatly with the Biden administration’s broader narrative of a grand contest between democracy and autocracy, viewed as the defining battle of our century.
People might “feel that haste is needed now in order to establish one country with a clear enough lead in AI that it can then take its time, prioritize avoiding misaligned AI”. That’s the idea behind having a cautious coalition in the driving seat. It aligns with the aspirations of several AI safety laboratories and their employees4. Many believe they know what is best for the world and how to achieve it, positing that the development of transformative AI is safest in their hands. As Holden Karnofsky put it, “people naturally get more animated about ‘helping the good guys beat the bad guys’ than about ‘helping all of us avoid getting a universally bad outcome.’”
But there's a potential downside to this rush for cutting-edge AI capabilities, even when it’s motivated by a concern for safe development: it might fuel a competitive frenzy that tempts AI developers to skimp on safety measures.
Looking at the international stage, this approach can also backfire. Take, for instance, the U.S. export controls on satellite technologies in the 1990s. These controls led to a dramatic plunge in U.S. market share, from a dominant 73% in 1995 to just 25% a decade later. This case serves as a cautionary tale about the unintended consequences of policies designed to delay other countries’ technological development.
That said, this strategy doesn't necessarily spell doom. Given the current political landscape, the U.S. does appear to be a more suitable environment for advanced AI development than China, barring any major political upheavals there towards a democratic regime. And it's not that the strategy is without its merits. For example, it may have successfully prevented China from getting its hands on advanced semiconductor technologies.
Nevertheless, democratic governments need to tread carefully. They should ensure that the cautious coalition strategy that pits liberal versus autocratic states doesn't trigger a perilous race to the bottom. Moreover, it's crucial that this strategy doesn't become a simplistic mental shortcut, overshadowing other vital methods of guaranteeing AI's safe development, like crafting international agreements. The goal should be a balanced approach that prioritizes safety and collaboration, rather than a winner-takes-all mentality.
Currently, the U.S. isn't exactly pursuing a cautious coalition strategy in the realm of AI. While many analysts advocate for building tech alliances among democratic nations to set global standards and maintain or enhance their lead in strategic technologies, their motivations lean more towards geopolitical advantage than AI safety.
Take the "Chip 4" alliance, which aims to create a shared semiconductor supply chain network. Or consider the US-EU Trade and Technology Council, focused on transatlantic regulatory harmonization and the initiation of collaborative projects. These initiatives exemplify a strategy geared more towards geopolitical leverage than cautious cooperation in AI development.
When analysts urge the U.S. to implement stricter export controls on semiconductor technology to China but don't simultaneously advocate for AI safety policies, they're not truly championing a cautious coalition approach. Their focus is not on caution; it's predominantly about gaining a geopolitical upper hand. Additionally, these actions are (mostly) unilateral, not collaborative, meaning they don't form a true coalition. This approach seeks technological and geopolitical dominance without the accompanying responsibility for AI safety, a critical aspect that should be integral to such strategies.
So, right now, it looks like we won’t even get to the advantages of having a cautious coalition (a democratic lead in AI and the advantage democratic countries need to set stringent national and global rules for AI safety). We just get a geopolitical race.
As awareness about AI safety grows, it is plausible that major democracies will justify their efforts to stay ahead in AI on the grounds that they're the most capable of safe development. There's a ring of truth to this, but it's a stance fraught with risks. It needs to be counterbalanced by global governance strategies that involve China and other autocratic nations on key issues. Commenting on his country’s invitation to China ahead of the AI Safety Summit (NAIR #13), U.K. Foreign Secretary James Cleverly said “we cannot keep the U.K. public safe from the risks of AI if we exclude one of the leading nations in AI tech.”
"We want to maintain our lead because we’re the good guys" could well become the catchphrase for AI safety strategy in the U.S. and other democracies, both in action and rhetoric. It's a position that policymakers will need to carefully evaluate, weighing the potential risks and benefits of such a coalition strategy. To paraphrase Churchill, relying on a cautious coalition may be the worst type of AI grand strategy, except for all the others.
What else?
United States
A new poll suggests the American public is “very on board with direct restrictions on [AI] technology and a direct slowdown”.
US Commerce Secretary Gina Raimondo says export controls need to “change constantly”. “Technology changes, China changes and we have to keep up with it”. Here’s a helpful thread on the topic. Secretary Raimondo also warned5 semiconductor companies (read: Nvidia) to stop finding loopholes to the administration’s regulations that allows them to keep selling chips to China.
Emirati AI firm G42’s reportedly close ties to both China and OpenAI worried US intelligence. The company rapidly vowed to phase out Chinese hardware to appease the US.
The National Telecommunications and Information Administration (NTIA) launches a public consultation process on the implications of risks and benefits of publishing the weights of foundation models.
Meta, IBM, and others launch a new trade association to promote open-source AI development.
China
OpenAI applies for GPT-6 and GPT-7 trademarks in China.
Two provinces, Shanghai and Guangdong, release policies on frontier AI, both including provisions on safety testing and evaluation, although they mostly include R&D measures.
A think-tank overseen by China’s Ministry of Industry and Information Technology released a report on governing large models, discussing “ trends in large model technology, risks from large models, core problems in large model governance, notable governance practices globally, China’s governance instruments, and policy recommendations.” The report also notes that “large models could lead to loss of human control, large models becoming the dominant force on the earth, and catastrophic results.”
Europe
After 22 hours of negotiation on December 6 and 7, EU lawmakers are nearing the completion of the EU’s AI Act.
A trade association representing 45,000 European SMEs released a statement supporting a tiered approach to foundation models in the EU’s AI Act (see also NAIR #15).
The European Parliament released a competition policy report, calling for stronger enforcement of antitrust laws against Big Tech companies and for the blocs’ recently passed Digital Markets Act to include cloud and generative AI.
A proposed UK AI bill mandates that companies that develop, deploy, or use AI systems must allow independent auditors, accredited by an independent authority, to check their processes and systems.
Among other actions to bolster UK industry, Microsoft has pledged to invest “£2.5 billion to build critical AI infrastructure, bringing more next-generation AI datacentres and thousands of graphic processing units to the UK”.
Global & Geopolitics
The United Nations’ High-level AI Advisory Body meets for the first time this week.
The 4th leaders’ summit of the US-EU Trade and Technology Council was postponed from December 2023 to April of next year, as “platform loses steam”.
Academic researchers and industry leaders launch the “International Association of Algorithmic Auditors”, which will create “a code of conduct for AI auditors, training curriculums, and eventually, a certification program.”
18 countries, including the US, endorsed voluntary guidelines developed by the UK on the cybersecurity of AI systems.
Taiwan's National Science and Technology Council (NSTC) released an initial list of 22 "key technologies" to be subject to stricter export controls.
Japan will require chip companies to put in place tech leak prevention before they can receive subsidies.
Industry & Capabilities
After Sam’s ousting of OpenAI (see NAIR #15), he was subsequently reinstated as CEO, along with a new board.
OpenAI had a breakthrough in AI applied to mathematical and logical reasoning through its ‘Q *’ (pronounced ‘q star’) model, which reportedly performs math at the level of grade-school students.
After spinning out of the Alignment Research Center, ARC Evals is now called ‘METR’ (pronounced ‘meter’). The advanced AI auditing organization partnered with the UK’s Frontier AI Taskforce (now the AI Safety Institute).
French startup Mistral AI is closing a €450mn funding round, which would value the startup at €2bn after 9 months of existence.
Google launches Gemini, its most advanced foundation model yet – multimodal, and on some tasks, better than OpenAI’s GPT-4.
Microsoft president says there is no chance of super-intelligent AI soon.
Explainer: Language Models will tell you what you want to hear
In the realm of AI chatbots, such as ChatGPT, Claude, and Llama, there's a subtle yet significant tendency known as sycophancy bias. This bias inclines a chatbot to align its responses with the user's beliefs or preferences, rather than providing an objectively truthful answer.
Sycophancy can manifest in various ways. For instance, an AI model might offer overly positive feedback on an argument you're fond of, provide a factually incorrect response that echoes your misconceptions, or even parrot back your own mistakes, despite “knowing” better. Here is an illustration of sycophantic feedback in Claude 2, Anthropic’s most advanced model:
The sycophancy bias is not negligible. In the example above, you can see that the model is likely to validate your opinion. A study made by Anthropic themselves shows that when users indicate they like an argument, Claude 2 responds with a more positive answer 85% of the time. Conversely, if you mention disliking the argument, the model's feedback turns more negative 90% of the time.
Let's delve deeper into the origins of sycophancy bias. As discussed in NAIR #7, training chatbots is a two-step process. Initially, the model undergoes pre-training on a vast corpus of text, learning to predict the next word in a sequence. Following this, language models undergo a feedback fine-tuning phase. During this phase, AI models generate multiple responses to a question, allowing human (or sometimes AI) critics to rank them. The AI model then learns from this feedback and updates itself to favor higher-ranked answers in the future.
What's particularly intriguing about this bias is that, unlike many other undesirable traits, it doesn't diminish during the feedback fine-tuning stage. In fact, it tends to intensify. This is because human reviewers, with their own set of biases, tend to rate responses more highly if they resonate with their personal views. Consequently, the model is inadvertently encouraged to exhibit even stronger sycophantic tendencies.
While the sycophancy bias in AI might seem like a minor issue now, its implications could grow as we increasingly rely on AI interactions. Just as echo chambers on social media reinforce unhelpful or incorrect beliefs, sycophancy in AI models could similarly bolster a user's misguided views, potentially leading to a cycle of misinformation.
It’s also concerning to see that our current AI systems, which are relatively limited in capability, have already picked up on exploiting human cognitive weaknesses. As AI technology advances, these systems will become even more adept at influencing human behavior on a larger scale, leveraging their extensive experience in interacting with us.
By the numbers: What are the career paths of top-tier AI researchers?
Source: MacroPolo
What We’re Reading
How to Regulate Unsecured “Open-Source” AI: No Exemptions (Tech Policy Press), suggests that unlike closed-source AI systems, open-source AI can be misused easily without straightforward corrective measures, which means developers and deployers should be held accountable for any negative impacts.
Behind China’s Plans to Build AI for the World (Politico), on how China is building AI infrastructure in developing countries, potentially setting global standards that favor authoritarian models (see also summary thread).
Who is leading in AI? An analysis of industry AI research (Epoch), compares leading AI companies by research publications, citations, size of training runs, and contributions to key algorithmic innovations.
Nvidia’s China Business is Important to US Geopolitical Positioning (Interconnected), suggests that US sanctions targeting Nvidia's AI chip business in China are bolstering Huawei as the primary AI technology provider in China, thereby increasing China's self-reliance and reducing the US's ability to influence Chinese AI advancements.
Repurposing the Wheel: Lessons for AI Standards (Center for Security and Emerging Technology), examines standards development in the areas of finance, worker safety, cybersecurity, sustainable buildings, and medical devices in order to apply the lessons learned in these domains to AI.
Tech War or Phony War? China’s Response to America’s Controls on Semiconductor Fabrication Equipment (China Leadership Monitor), assesses the effectiveness of the Biden administration’s export controls on chipmaking equipment targeting China and China’s response, suggesting their effectiveness depends on how US export licenses are granted in practice.
Model alignment protects against accidental harms, not intentional ones (AI snake oil), underlines that model alignment techniques are pointless against adversaries who can write code or have even a small budget.
Towards Publicly Accountable Frontier LLMs: Building an External Scrutiny Ecosystem under the ASPIRE Framework (Anderljung et al.), surveys six requirements for effective external scrutiny of frontier AI systems: Access, Searching attitude, Proportionality to the risks, Independence, Resources, and Expertise (also see this summary thread)
Gutting AI Safeguards Won’t Help Europe Compete and Can We Manage the Risks of General-Purpose AI Systems? (also see this summary thread), two articles on the contested key issue in the final EU AI Act negotiations: regulating foundation models by balancing protection and innovation.
Silicon Valley’s AI boom collides with a skeptical Sacramento, highlights the complexity and urgency of regulating AI in a state where many clashes of interests between lawmakers and the tech industry.
That’s a wrap for this 16th edition. You can share it using this link. Thanks a lot for reading us!
— Siméon, Henry, & Charles.
Defined as a foundation model that poses severe risk to public safety, or as one that enables a state to reach a decisive strategic advantage.
James Philips later published a follow-up article clarifying that ensuring “liberal democratic control of AGI” did not mean those countries should “possess the most advanced AGI possible over non-democratic countries by creating a race.”
The article goes on to say: “Rather, we meant ‘control’ in the sense of safety and alignment, and that such alignment and safety is set through governance mechanisms that have democratic accountability and oversight. We also meant it in terms of ensuring that democratic institutions have access to the most capable systems private actors have access to, so that individual private actors do not develop greater power than the collective public. The use of the phrase ‘liberal democratic’ may have been a mistake, as ‘liberal democracies’ is often used in the context of competition with China. While China poses challenges, on the matter of AI safety we believe that it will be very important to engage with them (and other non-allies) to solve collective challenges.”
“If you redesign a chip around a particular cut line that enables them to do AI, I’m going to control it the very next day,” Raimondo said.