We aim to stay on top of cutting-edge research in AI Safety Evaluations and develop a thoughtful community of critical thinkers eager to apply their skills to AI Safety. Sign up to attend using the links in the Event Link column in the Schedule table below. In addition to participating as an attendee, you can suggest a paper for us to cover here. If you’d like to take even more initiative, you can volunteer to present a paper using this form and we will get back to you.

<aside>

If you’re new here and wondering what this is all about, check out our guide “How to Eval” where we explain what an eval is, how to get the most from the reading group, and more!

</aside>

Schedule

| Date | Paper |
| --- | --- |
| 4 November | AgentMisalignment: Measuring the Propensity for Misaligned Behaviour in LLM-Based Agents |
| 11 November | Agentic Reinforcement Learning for Search is Unsafe |
| 18 November | Lorenzo Pacchiardi presents his PredictaBoard: Benchmarking LLM Score Predictability |
| 25 November | No meeting 🦃 |
| 2 December | Hyunwoo Kim presents his Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models |
| 9 December | Kanishk Gandhi presents his Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs |
| 16 December | TBC |
| 23 December | No meeting 🎅 |
| 30 December | No meeting 🎊 |

FAQ

Attendance and Comms

Discussion Norms

Papers and Presenting

Presentation Archive

Here are some of the papers we have covered in the past:

| Date | Paper | Slides/Recording |
| --- | --- | --- |
| 2 September | https://arxiv.org/abs/2406.07358 | |
| 9 September | https://arxiv.org/abs/2503.08679 | |
| 16 September | https://metr.org/blog/2025-08-08-cot-may-be-highly-informative-despite-unfaithfulness/ | |
| 23 September | https://arxiv.org/abs/2507.05246 | |
| 30 September | https://arxiv.org/abs/2402.07510 | |
| 7 October | https://arxiv.org/abs/2507.20526 | |
| 14 October | https://alignment.anthropic.com/2025/automated-auditing/ | |
| 21 October | Evidence for Limited Metacognition in LLMs | |
| 28 October | https://arxiv.org/abs/2507.23701 | |