“Membership Inference Attacks and Contextual Integrity for Language Models”

Location: 503 Conference room, 177 Huntington Ave.

Abstract: When discussing the privacy implications of chatbots and large language models, I’m often asked, “Do people really share that type of information with models?!” In this talk, I will first demonstrate that users do indeed share abundant personal information with ChatGPT [COLM24a]. Motivated by this, I will highlight the spectrum of potential privacy violations in the context of LLMs, beginning with classical membership inference attacks [EMNLP22, COLM24b], which focus on information leakage from the training data. I will then explore how new LLM use cases introduce risks beyond training-data memorization, including inference-time vulnerabilities. I will explain how information can flow from the model’s input context to its output, and how we adapt the theory of contextual integrity to evaluate this issue [ICLR24]. Finally, I will examine the potential information leakage introduced by synthetic data [Preprint]. To conclude, I will discuss future research directions for dynamically evaluating models for privacy risks and for developing mitigations that go beyond surface-level manipulations.

Bio: Niloofar Mireshghallah is a post-doctoral scholar at the Paul G. Allen Center for Computer Science & Engineering at the University of Washington. She received her Ph.D. from the CSE department at UC San Diego in 2023. Her research interests are Trustworthy Machine Learning and Natural Language Processing. She received the National Center for Women & IT (NCWIT) Collegiate Award in 2020 for her work on privacy-preserving inference, was a finalist for the Qualcomm Innovation Fellowship in 2021, and received the 2022 Rising Star in Adversarial ML Award.