Maarten Sap

I am an assistant professor at CMU's LTI department with a courtesy appointment in HCII, and a part-time research scientist and AI safety lead at the Allen Institute for AI (AI2). My research focuses on (1) measuring and improving AI systems' social and interactional intelligence, (2) assessing and combatting social inequality, safety risks, and socio-cultural biases in human- or AI-generated language, and (3) building narrative language technologies for prosocial outcomes.

I received my PhD from the University of Washington where I was advised by Noah Smith and Yejin Choi.
[bio for talks]

Recent updates:

August 2025 🌟: Incredibly honored to be one of 7 US recipients of the 2025 Okawa Research Grant from the Okawa Foundation!

August 2025 πŸ§‘β€πŸŽ“: Welcoming my first postdoc, Vasudha Varadarajan, to the lab!

August 2025 πŸ‘¨πŸΌβ€πŸ«: Excited to give a (virtual) talk about Responsible AI for Diverse Users and Cultures at the Gender Bias in NLP workshop at ACL 2025!

July 2025 πŸ§ πŸ›‘οΈ: Five papers were accepted to COLM 2025! Highlights include HAICOSYSTEM, a framework for sandboxing safety risks in human-AI interaction; ALFA, which aligns LLMs to ask better clinical questions; and PolyGuard, a multilingual moderation tool for unsafe content. Two other papers to be released soon :)

May 2025 πŸ§‘β€πŸ’»πŸ†: Super super excited to announce that our paper Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance received the Best Paper Runner-Up award at NAACL 2025. Huge congratulations to Kaitlyn!

April 2025 πŸœοΈπŸš‚: Though I will not be attending NAACL 2025, my students and collaborators will be presenting some exciting papers: Joel Mire on Rejected Dialects: Biases Against African American Language in Reward Models, Akhila Yerukola on NormAd: A Framework for Measuring the Cultural Adaptability of Large Language Models; Kaitlyn Zhou on Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance; Xuhui Zhou on AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents.

April 2025 πŸ¦žπŸ‘¨πŸΌβ€πŸ«: Excited to give a talk at the MIT CSAIL NLP seminar on the challenges of socially aware and culturally adaptable LLMs.

[older news]


My research group:

Dan Chechelnitsky

LTI PhD student
co-advised with Chrysoula Zerva

Joel Mire

LTI MLT student

Karina Halevy

LTI PhD student
co-advised with Mona Diab

Jimin Mun

LTI PhD student

Jocelyn Shen

MIT PhD student
co-advised with Cynthia Breazeal

Vasudha Varadarajan

LTI Postdoc

Akhila Yerukola

LTI PhD student

Mingqian Zheng

LTI PhD student
co-advised with Carolyn RosΓ©

Xuhui Zhou

LTI PhD student


Overarching Research Themes

Themes extracted and images generated with the OpenAI API; there may be inconsistencies.

Ethical AI and Human-Centered Design

My research group examines ethical considerations in AI systems, focusing on how users perceive and interact with these technologies. A key paper, ["Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences"](https://arxiv.org/abs/2506.00195), investigates how guardrail behaviors shape user trust and decision-making. We also study cultural responsiveness in AI through ["Mind the Gesture: Evaluating AI Sensitivity to Culturally Offensive Non-Verbal Gestures"](https://arxiv.org/abs/2502.17710), highlighting the need for models to recognize diverse social norms. Finally, we build systems for safer human-AI interaction, such as ["HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions"](http://arxiv.org/abs/2409.16427).

Social Intelligence in AI Agents

My research group develops and evaluates social intelligence in AI agents, specifically how they can interact effectively in complex social environments. The paper ["SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents"](https://arxiv.org/abs/2310.11667) presents a framework for interactively evaluating the social reasoning capabilities of language agents. We also investigate how LLMs can simulate diverse personalities in ["BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data"](http://arxiv.org/abs/2410.16491), which grounds model personality traits in human data. Furthermore, we probe the limits of social reasoning in LLMs in ["Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models"](https://arxiv.org/abs/2305.14763).

Exploring Narrative Dynamics

My research group explores the relationships between narratives and human experiences as mediated by AI. Our work includes ["HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs"](https://arxiv.org/abs/2405.17633), which studies how narrative style evokes emotional responses and empathy. We also examine variation in narrative interpretation through ["The Empirical Variability of Narrative Perceptions of Social Media Texts"](https://aclanthology.org/2024.emnlp-main.1113/), which analyzes how readers' perceptions of the same story differ. Lastly, ["Modeling Empathic Similarity in Personal Narratives"](https://arxiv.org/abs/2305.14246) investigates how AI can capture the emotional resonance shared between personal stories.

Addressing Toxicity and Bias in Language Models

My research group develops methods to measure and mitigate bias and toxicity in AI-generated language, attending to both ethical implications and practical applications. A central paper, ["PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models"](https://arxiv.org/abs/2405.09373), proposes a framework for assessing toxic language generation across many languages. We also examine strategies for countering online hate through ["Counterspeakers’ Perspectives: Unveiling Barriers and AI Needs in the Fight against Online Hate"](https://arxiv.org/abs/2403.00179), shedding light on counterspeakers' experiences, barriers, and needs. Finally, ["Rejected Dialects: Biases Against African American Language in Reward Models"](https://arxiv.org/abs/2502.12858) documents biases against African American Language in the reward models used to align LLMs.