Bilal Chughtai

I am currently a visiting researcher at Northeastern University, working with Professor David Bau. I do ML and AI safety research. My research interests lie in better understanding how powerful general AI systems function, and in leveraging that understanding to create safer systems. I am interested in (mechanistic) interpretability, capability evaluations, AI alignment and AI safety.

Previously, I was a scholar in the MATS program, working on mechanistic interpretability with Neel Nanda. Before that, I was an MLSS scholar. I have an MMath in mathematics from the University of Cambridge, where I was primarily interested in theoretical physics.

Publications

Summing Up the Facts: Additive Mechanisms Behind Factual Recall in LLMs; Bilal Chughtai, Alan Cooney, Neel Nanda. NeurIPS 2023 Attributing Model Behaviour at Scale workshop. arXiv

Language Models Struggle To Explain Themselves; Dane Sherburn, Bilal Chughtai, Owain Evans. Under Review.

A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations; Bilal Chughtai, Lawrence Chan, Neel Nanda. ICML 2023. ICLR 2023 Physics4ML workshop (spotlight). arXiv. Poster. Slides. Demo. Code.

A more up-to-date list of publications can be found on my Google Scholar.

CV

You can find my CV here.

Other Projects

The Search for CMB B-mode Polarisation from Inflationary Gravitational Waves; Bilal Chughtai. Part III Essay, Mathematical Tripos, 2021. Essay.

CUPLC Website; Bilal Chughtai, 2021. Website.

CUPLC Records System; Bilal Chughtai, 2021. Website. Code.

Other Interests

You can see what I’m listening to on Last.fm. I am a competitive powerlifter, and was previously webmaster of the Cambridge University Powerlifting Club.