Hey, I'm Anna. I'm currently an Anthropic safety fellow and a PhD student at Imperial College London. Previously (Summer '25), I was a MATS scholar with Neel Nanda. I'm interested in interpretability and what it can teach us about making language models safer.
Before I started working in AI safety, I studied Design Engineering, had a job designing and building experimental tilt-wing aircraft, and started a clean-tech company aimed at reducing household energy consumption. Outside research, I climb rocks and mountains.
News
- Sep. 2025 We had three papers accepted to NeurIPS workshops. I'll be in San Diego for the conference and the FAR Alignment Workshop.
- Sep. 2025 I was interviewed for a Financial Times piece discussing Emergent Misalignment and broader misalignment risks.
- July 2025 I'm helping to organise the NeurIPS 2025 Workshop on Mechanistic Interpretability. We're looking for reviewers: please volunteer here!
- June 2025 We had two papers accepted to ICML workshops. More info in the research section below!
- June 2025 Our work on emergent misalignment was featured in MIT Tech Review, alongside OpenAI's recent paper.
- May 2025 Our paper on "Inducing, Detecting and Characterising Neural Modules" was accepted to ICML.
Research
- [arXiv, blog-post] June 2025 Convergent Linear Representations of Emergent Misalignment. ICML 2025 Workshop on Actionable Interpretability
- [arXiv, blog-post] June 2025 Model Organisms for Emergent Misalignment. ICML 2025 Workshop on Reliable and Responsible Foundation Models
- [arXiv] May 2025 Inducing, Detecting and Characterising Neural Modules: A Pipeline for Functional Interpretability in Reinforcement Learning. Proceedings of the 42nd International Conference on Machine Learning (ICML 2025)