We aim to stay on top of cutting-edge research in AI Safety Evaluations and to develop a thoughtful community of critical thinkers eager to apply their skills to AI Safety. Sign up to attend using the links in the Event Link column of the Schedule table below. In addition to participating as an attendee, you can suggest a paper for us to cover here. If you’d like to take even more initiative, you can volunteer to present a paper using this form, and we will get back to you.

<aside>

If you’re new here and wondering what this is all about, check out our guide “How to Eval”, where we explain what an eval is, how to get the most from the reading group, and more!

</aside>

Schedule

| Date | Paper | Event Link |
| --- | --- | --- |
| 7 October | https://arxiv.org/abs/2507.20526 | |
| 14 October | https://alignment.anthropic.com/2025/automated-auditing/ | |
| 21 October | Evidence for Limited Metacognition in LLMs | |
| 28 October | https://arxiv.org/abs/2507.23701 | |
| 4 November | AgentMisalignment: Measuring the Propensity for Misaligned Behaviour in LLM-Based Agents | |
| 11 November | Agentic Reinforcement Learning for Search is Unsafe | |
| 18 November | TBC | |
| 25 November | TBC | |
| 2 December | TBC | |

FAQ

Attendance and Comms

Discussion Norms

Papers and Presenting

Presentation Archive

Here are some of the papers we’ve read in the past:

| Date | Paper | Slides/Recording |
| --- | --- | --- |
| 2 September 2025 | https://arxiv.org/abs/2406.07358 | |
| 9 September 2025 | https://arxiv.org/abs/2503.08679 | |
| 16 September 2025 | https://metr.org/blog/2025-08-08-cot-may-be-highly-informative-despite-unfaithfulness/ | |
| 23 September 2025 | https://arxiv.org/abs/2507.05246 | |
| 30 September 2025 | https://arxiv.org/abs/2402.07510 | |