I am a Philosophy PhD and Computer Science (AI/ML) ScM student at Brown. I work with Adam Pautz, Ellie Pavlick, Chris Hill, and David Chalmers.
I am also an AI researcher at MIT FutureTech. In the Fall, I will be an AI Postdoctoral Research Fellow at Princeton.
My research focuses on human-centered AI. The goal of my work is to use philosophy to guide experiments on AI that deepen our understanding of the mind and, in turn, to use insights about the mind to build explainable and ethical AI.
You can email me at anna_tsvetkov [at] brown.edu.
I am pleased to share that I will be speaking at the Bilkent-UNAM Philosophy of Mind Conference.
Can We Interpret Artificial Neural Networks As Having Beliefs and Desires?
Can we interpret the internal workings of artificial neural networks in terms of beliefs and desires? A central aim in mechanistic interpretability is to explain the inner workings of artificial neural networks in terms that we can understand. Since we explain human beings in terms of beliefs and desires, it is natural to ask whether we can explain artificial neural networks in these terms too. In recent work, David Chalmers (2025) proposes propositional interpretability as an important approach within mechanistic interpretability, arguing that the computational states of artificial neural networks can be explained in terms of propositional attitudes—states such as believing, desiring, or assigning subjective probabilities to propositions. This paper examines the prospects for propositional interpretability as a framework for explaining the internal workings of artificial neural networks. I draw attention to a number of empirical and methodological problems with propositional interpretability that are based on a philosophical gloss of recent findings in the mechanistic interpretability literature. Some of these challenges, such as determining whether artificial neural networks encode propositions rather than unbound concepts, might be mitigated through new interpretability techniques and open up exciting avenues for future research. Other problems, including reliably mapping computational states to propositional attitudes and managing an explosion of potential propositional interpretations, present serious difficulties for the approach. These challenges should be of interest both to philosophers seeking to engage with empirical work in mechanistic interpretability and to neural network researchers aiming to develop novel interpretability methods informed by philosophical insights.
Can Large Language Models Represent Perceptual Reality?
Molyneux’s Question and Multimodal Models
Scientific Use of Foundation Models and Democratization of AI
(In collaboration with MIT FutureTech)