The first LLMs for voice and emotional intelligence
Toward AI that understands and optimizes for human expression
40+ publications · 3,000+ citations · 1 million+ participants
Speech recognition, understanding, and generation with the same core intelligence
With the first speech-language models, EVI 1 and 2, we pioneered voice AI that understands what it’s saying. Our latest model, Octave, models the full multiplicity of human personas.
Fine-tuned with scientifically controlled data
Traditional theories posited six discrete emotions, but we’ve discovered that emotional behavior is better explained by a high-dimensional, continuous space.
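To make the contrast concrete, here is a minimal sketch of the two views; the dimension names and scores are illustrative, not our production schema:

```python
from dataclasses import dataclass

# Discrete view: one label drawn from a fixed set of six.
BASIC_SIX = {"anger", "disgust", "fear", "happiness", "sadness", "surprise"}

# Continuous view: a point in a high-dimensional space, where an
# expression blends many emotions with graded intensities.
@dataclass
class EmotionEmbedding:
    scores: dict[str, float]  # dimension name -> intensity in [0, 1]

    def top(self, k: int = 3) -> list[tuple[str, float]]:
        """Return the k most intense dimensions of this expression."""
        return sorted(self.scores.items(), key=lambda kv: -kv[1])[:k]

# A blended expression: mostly awe, with traces of fear and surprise.
# No single "basic six" label captures it.
awe_struck = EmotionEmbedding(
    scores={"awe": 0.71, "fear": 0.24, "surprise": 0.18, "calmness": 0.05}
)
print(awe_struck.top())  # [('awe', 0.71), ('fear', 0.24), ('surprise', 0.18)]
```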

Speech Prosody
Discover over 25 patterns of tune, rhythm, and timbre. Use this map to explore the outputs of our speech prosody model.
Modality: Speech
Sample emotions: Amusement, Anger, Awkwardness, Boredom, Calmness, Confusion, Contempt, Desire, Determination, Distress, Fear, Guilt, Horror, Pain, Pride, Sadness, Surprise, and Tiredness
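As a hedged sketch of how such per-emotion outputs might be consumed (the `summarize` helper and the clip scores below are hypothetical, not our actual API):

```python
# The 18 emotions listed above, as output dimensions of a prosody model.
PROSODY_EMOTIONS = [
    "Amusement", "Anger", "Awkwardness", "Boredom", "Calmness", "Confusion",
    "Contempt", "Desire", "Determination", "Distress", "Fear", "Guilt",
    "Horror", "Pain", "Pride", "Sadness", "Surprise", "Tiredness",
]

def summarize(scores: dict[str, float], threshold: float = 0.3) -> list[str]:
    """List the emotions whose prosody score clears the threshold,
    strongest first, since one clip can express several at once."""
    present = [(name, s) for name, s in scores.items() if s >= threshold]
    return [name for name, _ in sorted(present, key=lambda kv: -kv[1])]

# Example with made-up scores for a single speech clip:
clip_scores = {name: 0.0 for name in PROSODY_EMOTIONS}
clip_scores.update({"Amusement": 0.62, "Surprise": 0.41, "Calmness": 0.12})
print(summarize(clip_scores))  # ['Amusement', 'Surprise']
```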
Voice AI optimized for human preferences
Led by researchers at the intersection of psychology and AI, we run large-scale controlled studies to optimize our models for human preferences. In our most recent evaluation, raters strongly preferred speech generated by Octave to the previous state of the art.
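As one illustration of how a pairwise preference study can be scored (the counts below are invented, not our results): each rater hears the same script rendered by two systems and picks the one they prefer, and the preference rate is tested against a 50/50 "no preference" null.

```python
from scipy.stats import binomtest

preferred_octave = 812   # hypothetical: judgments favoring Octave
total_ratings = 1000     # hypothetical: total pairwise judgments

win_rate = preferred_octave / total_ratings
# Two-sided exact binomial test against the 50/50 null.
result = binomtest(preferred_octave, total_ratings, p=0.5)

print(f"win rate: {win_rate:.1%}, p = {result.pvalue:.2g}")
```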

Maintaining frontier language capability
Despite its diverse speech processing and generation capabilities, Octave performs comparably to a similarly sized frontier LLM on language understanding tasks. This makes it well suited to power AI systems that follow detailed instructions, use tools, or control an interface.

Publications
Discover the research foundational to our products
Sixteen facial expressions occur in similar contexts worldwide
What music makes us feel: At least 13 dimensions organize subjective experiences associated with music across different cultures
Self-report captures 27 distinct categories of emotion bridged by continuous gradients
Claims about how reported emotional experiences are geometrically organized within a semantic space have shaped the study of emotion. Using statistical methods to analyze reports of emotional states elicited by 2,185 emotionally evocative short videos with richly varying situational content, we uncovered 27 varieties of reported emotional experience.
Universal facial expressions uncovered in art of the ancient Americas: A computational approach
Central to the study of emotion is evidence concerning its universality, particularly the degree to which emotional expressions are similar across cultures. Here, we present an approach to studying the universality of emotional expression that rules out cultural contact and circumvents potential biases in survey-based methods: A computational analysis of apparent facial expressions portrayed in artwork created by members of cultures isolated from Western civilization.
Mapping the passions: Toward a high-dimensional taxonomy of emotional experience and expression
What would a comprehensive atlas of human emotions include? For 50 years, scientists have sought to map emotion-related experience, expression, physiology, and recognition in terms of the “basic six”—anger, disgust, fear, happiness, sadness, and surprise.
The neural representation of visually evoked emotion is high-dimensional, categorical, and distributed across transmodal brain regions
GoEmotions: A dataset of fine-grained emotions
Understanding emotion expressed in language has a wide range of applications, from building empathetic chatbots to detecting harmful online behavior. Advancement in this area can be improved using large-scale datasets with a fine-grained typology, adaptable to multiple downstream tasks.
Facial movements have over twenty dimensions of perceived meaning that are only partially captured with traditional methods
Central to science and technology are questions about how to measure facial expression. The current gold standard is the facial action coding system (FACS), which is often assumed to account for all facial muscle movements relevant to perceived emotion. However, the mapping from FACS codes to perceived emotion is not well understood.
The primacy of categories in the recognition of 12 emotions in speech prosody across two cultures
How emotion is experienced and expressed in multiple cultures: a large-scale experiment across North America, Europe, and Japan
Core to understanding emotion are subjective experiences and their expression in facial behavior. Past studies have largely focused on six emotions and prototypical facial poses, reflecting limitations in scale and narrow assumptions about the variety of emotions and their patterns of expression.
Semantic Space Theory: Data-driven insights into basic emotions
Here we present semantic space theory and the data-driven methods it entails. Across the largest studies to date of emotion-related experience, expression, and physiology, we find that emotion is high dimensional, defined by blends of upward of 20 distinct kinds of emotions, and not reducible to low-dimensional structures and conceptual processes as assumed by constructivist accounts.
Deep learning reveals what vocal bursts express in different cultures
Human social life is rich with sighs, chuckles, shrieks and other emotional vocalizations, called ‘vocal bursts’. Nevertheless, the meaning of vocal bursts across cultures is only beginning to be understood. Here, we combined large-scale experimental data collection with deep learning to reveal the shared and culture-specific meanings of vocal bursts.
The MuSe 2022 Multimodal Sentiment Analysis Challenge: Humor, Emotional Reactions, and Stress
The 3rd Multimodal Sentiment Analysis Challenge (MuSe) focuses on multimodal affective computing.
How emotions, relationships, and culture constitute each other: Advances in social functionalist theory
Social Functionalist Theory (SFT) emerged 20 years ago to orient emotion science to the social nature of emotion. Here we expand upon SFT and make the case for how emotions, relationships, and culture constitute one another.
Intersectionality in emotion signaling and recognition: The influence of gender, ethnicity, and social class
Emotional expressions are a language of social interaction. Guided by recent advances in the study of expression and intersectionality, the present investigation examined how gender, ethnicity, and social class influence the signaling and recognition of 34 states in dynamic full-body expressive behavior.
Mapping 24 emotions conveyed by brief human vocalization
Emotional vocalizations are central to human social life. Recent studies have documented that people recognize at least 13 emotions in brief vocalizations. This capacity emerges early in development, is preserved in some form across cultures, and informs how people respond emotionally to music.
What the face displays: Mapping 28 emotions conveyed by naturalistic expression
What emotions do the face and body express? Guided by new conceptual and quantitative approaches, we explore the taxonomy of emotion recognized in facial-bodily expression. Participants judged the emotions captured in 1,500 photographs of facial-bodily expression in terms of emotion categories, appraisals, free response, and ecological validity.
The ICML 2022 Expressive Vocalizations Workshop and Competition: Recognizing, generating, and personalizing vocal bursts
The ICML Expressive Vocalizations (ExVo) Competition is focused on understanding and generating vocal bursts: laughs, gasps, cries, and other non-verbal vocalizations that are central to emotional expression and communication.
The ACII 2022 Affective Vocal Bursts Workshop & Competition: Understanding a critically understudied modality of emotional expression
The ACII Affective Vocal Bursts Workshop & Competition is focused on understanding multiple affective dimensions of vocal bursts: laughs, gasps, cries, screams, and many other non-linguistic vocalizations central to the expression of emotion and to human communication more generally.
Emotional expression: Advances in basic emotion theory
In this article, we review recent developments in the study of emotional expression within a basic emotion framework. Dozens of new studies find that upwards of 20 emotions are signaled in multimodal and dynamic patterns of expressive behavior.
Deep learning reveals what facial expressions mean to people in different cultures
Cross-cultural studies of the meaning of facial expressions have largely focused on judgments of small sets of stereotypical images by small numbers of people. Here, we used large-scale data collection and machine learning to map what facial expressions convey in six countries.