My work focuses on AI safety and the mechanistic interpretability of large language models. I'm interested in understanding the internal representations and algorithms that emerge in transformer-based models.