We aim to stay on top of cutting-edge research in AI Safety Evaluations and to develop a thoughtful community of critical thinkers eager to apply their skills to AI Safety. Sign up to attend using the links in the schedule table below. In addition to attending, you can suggest a paper for us to cover here. If you’d like to take even more initiative, you can volunteer to present a paper using this form, and we will get back to you.
<aside>
If you’re new here and wondering what this is all about, check out our guide “How to Eval” where we explain what an eval is, how to get the most from the reading group, and more!
</aside>
| Date | Paper |
|---|---|
| 4 November | AgentMisalignment: Measuring the Propensity for Misaligned Behaviour in LLM-Based Agents |
| 11 November | Agentic Reinforcement Learning for Search is Unsafe |
| 18 November | Lorenzo Pacchiardi presents PredictaBoard: Benchmarking LLM Score Predictability |
| 25 November | No meeting 🦃 |
| 2 December | Hyunwoo Kim presents Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models |
| 9 December | Kanishk Gandhi presents Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs |
| 16 December | TBC |
| 23 December | No meeting 🎅 |
| 30 December | No meeting 🎊 |
- Attendance and Comms
- Discussion Norms
- Papers and Presenting
Here are some of the papers we’ve read in the past: