Navigating AI Risks
Discontinuing the SaferAI Roundup
Dear reader,
Dec 18, 2024
November 2024
The SaferAI Roundup #8: Unlocking AI Reasoning Through Test-Time Compute
The Surprising Effectiveness of Test-Time Training for Abstract Reasoning & OpenAI o1 System Card
Nov 20, 2024
October 2024
The SaferAI Roundup #7: Towards AI Automated Science
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery & Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100…
Oct 8, 2024
September 2024
The SaferAI Roundup #6: The Implications of AI for Cybersecurity
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities & CYBERSECEVAL 3
Sep 24, 2024
The SaferAI Roundup #5: Attempts to Benchmark and Solve Jailbreaks.
HarmBench & Improving Alignment and Robustness with Circuit Breakers
Sep 10, 2024
August 2024
The SaferAI Roundup #4: Capabilities Improvement and Safety Testing of GPT-4o and Claude 3.5
GPT-4o System Card & Claude 3.5 Sonnet Model Card Addendum
Aug 28, 2024
The SaferAI Roundup #3: Technical Efforts to Make Safe Open Model Weights Possible
Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models & Tamper-Resistant Safeguards for Open-Weight LLMs
Aug 13, 2024
July 2024
The SaferAI Roundup #2
Observational Scaling Laws and the Predictability of Language Model Performance & Lessons from the Trenches on Reproducible Evaluation of Language…
Jul 30, 2024
The SaferAI Roundup #1
Welcome to "The SaferAI Roundup", our new format.
Jul 16, 2024
December 2023
#16 - A Democratic "Cautious Coalition": What Grand Strategy for AI Safety? + Sycophancy
“Expected 1 unit of progress, got 2, remaining 998.” Eliezer Yudkowsky, writer and researcher, reacting to a positive discovery in AI interpretability…
Dec 7, 2023
November 2023
#15 - Altman, Jinping, Biden, and O
In this week’s newsletter: The OpenAI Debacle, Governance of AI with Chinese Characteristics, The White House Tightens AI Oversight, and the EU's AI Act
Nov 22, 2023
#14 - Day 1 of Global AI Safety Governance
Last week, 28 countries met in Bletchley Park for the world’s first AI Safety Summit.
Nov 8, 2023