
Can AI “detect” emotions?

By Jeffrey Brooks, PhD on Sep 23, 2024


  • AI cannot "detect" emotions or read minds; it identifies emotion-related behaviors through patterns in observable expressions, such as facial cues or vocal bursts.
  • Unlike "emotion AI," empathic AI aims to produce better responses based on how we express ourselves, rather than claiming to understand internal emotional states.
  • Expressions, including facial movements and vocal bursts, are not straightforward signals of emotions. They are part of a shared language with cultural and contextual variations, which AI can interpret probabilistically.
  • Empathic AI expands on basic sentiment analysis by measuring 48+ emotional dimensions, capturing a richer and more nuanced understanding of human emotions, which is crucial for improving human-AI interactions.

Emotion AI, Affective Computing, or Empathic AI?

For AI to enhance our emotional well-being and engage with us meaningfully, it needs to understand the way we express ourselves and respond appropriately. This capability lies at the heart of a field of AI research that focuses on machine learning models capable of identifying and categorizing emotion-related behaviors. However, this area of research is frequently misunderstood, often sensationalized under the umbrella term “emotion AI”: AI that can “detect” emotions, an impossible form of mind-reading.

While AI can capture patterns in human expressions, it’s crucial to recognize that these technologies interpret behaviors in probabilistic terms, not certainties. AI cannot access private emotional experiences. It cannot ‘read minds’ or truly ‘detect’ what we are feeling. Instead, real applications of these technologies focus on observable data and shared human interpretations of visible emotional expressions. 

For these reasons, we believe terms like “emotion AI” and its academic equivalent “affective computing” are misleading. We’ve coined a term we believe is more accurate: “empathic AI,” or AI that uses an understanding of how we express ourselves to produce better responses.

To demystify empathic AI, it's important to understand what AI is actually measuring and the science behind human expressions.

The Science of Expression

Human life is rich with emotional expression—we smile, frown, laugh, sigh, raise our eyebrows, and use countless other subtle cues to communicate how we feel. These behaviors provide a constant undercurrent to our social lives, helping us navigate relationships, collaborate, and build communities. Expressions are (obviously) not a direct window into our minds—for instance, people also smile while angry and laugh while sad. Still, these behaviors are a shared language with generally agreed-upon meanings that provide the building blocks for empathy. Emotion scientists have spent the last 50+ years studying these expressions, trying to decode what they mean and how they influence human interaction. 

Underpinning these studies is the notion that emotions have both universal and culturally specific components. While certain expressions may be recognized across the globe, the way people display and interpret these emotions can vary based on social norms and individual context.

  • Facial expressions: Facial expressions are perhaps the most well-known indicators of emotion. A smile might signal happiness, a frown displeasure, and a raised eyebrow curiosity. But facial expressions are rarely straightforward; they can blend multiple meanings that unfold over time, making them challenging to interpret. The researchers at Hume consistently find that 20+ different facial expressions have shared meanings worldwide.

  • Vocal expressions and speech prosody: Beyond facial expressions, vocal cues also play a crucial role in how we communicate our feelings. The tone, pitch, and rhythm of our speech—collectively known as speech prosody—are core to human expression. Additionally, nonlinguistic “vocal bursts” like laughter, sighs, and gasps provide context for how we feel, even without words. Our research shows that vocal bursts are a strikingly rich source of information about emotion – which isn’t that surprising, when you think about it. After all, when a film elicits laughs, cries, sighs, oohs and ahhs from the audience, that gives us information about how the people in the audience are feeling and what might be happening in the film. Together with speech prosody, our research also shows that 20+ vocal bursts have shared meanings across cultures.

The Challenge of Defining Expressions

But how do you define an expression? Who gets to say that a given sound is “happy” or “sad,” or that a facial expression is “concerned”? The reality is that expressions are complex and often contain blends of various emotions. This complexity is why Hume’s expression measurement models characterize the feelings people readily associate with each of these expressive behaviors when they see them in isolation, without attempting to predict what a person forming them is actually experiencing at a given time. This is the core difference between “emotion detection” and expression measurement.

How AI Models Interpret Expressive Behaviors

Our expression measurement models offer objective measurements of expressive behaviors that people readily associate with various feelings when they see or hear them in isolation. The scores, or predictions, reflect the confidence and intensity with which someone might associate the detected behaviors with a given expression. These are not randomly assigned but come from extensive research which informs how the models are trained. Again, these models are designed to predict how others might interpret an expression, not to make definitive judgments about internal emotional states.
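
To make this concrete, here is a minimal sketch in Python of how such output might be read. The `scores` dictionary, its labels, and its values are hypothetical stand-ins for illustration, not real model output:

```python
# Hypothetical output from an expression measurement model for one clip.
# The labels and values below are illustrative, not real model output.
scores = {
    "Joy": 0.82,
    "Amusement": 0.64,
    "Interest": 0.41,
    "Sadness": 0.03,
}

# Each score reflects how strongly observers would associate the observed
# behavior with that expression label, not what the person is actually feeling.
for label, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"Behavior associated with '{label}' (association strength: {score:.2f})")
```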

When subjects in our studies are asked to label expressions, they are not given predefined definitions. This design choice ensures that the AI's predictions align with how people commonly interpret emotional expressions in real-world contexts. The result is a model that reflects a shared, aggregate understanding of what different expressions look and sound like, rather than rigid, predefined categories.

In other words, these labels are defined operationally—based on how most people would describe what they see and hear.

For example, instead of predicting that someone is "joyful," what the models really predict is that an expression has a high likelihood of being interpreted as joyful by a plurality of people if presented out of context. This approach ensures that AI systems remain grounded in observable data, avoiding the impression that AI can “detect” emotions—a concept that greatly exaggerates AI capabilities and sparks inflated privacy concerns, as if mind-reading technology were a reality rather than science fiction.
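
As a rough illustration of this operational, plurality-based framing, the sketch below tallies hypothetical free-response labels from raters who viewed one expression in isolation and reports the label a plurality would use. All data here are invented for illustration:

```python
from collections import Counter

# Hypothetical free-response labels from raters shown one expression out of
# context; no predefined definitions were provided (invented data).
rater_labels = [
    "joyful", "happy", "joyful", "amused", "joyful",
    "content", "joyful", "amused", "happy", "joyful",
]

counts = Counter(rater_labels)
total = sum(counts.values())

# Operational target: the share of raters who would describe the clip with
# each label, not a verdict about the person's internal state.
top_label, top_count = counts.most_common(1)[0]
is_plurality = all(n < top_count for label, n in counts.items() if label != top_label)

if is_plurality:
    print(f"Out of context, this expression would most likely be described as "
          f"'{top_label}' by a plurality of people ({top_count / total:.0%} of raters).")
```

Framing predictions this way keeps the system’s claims tied to observable agreement among people, rather than to anyone’s private experience.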

Beyond Sentiment Analysis: What Does Expression Measurement Add? 

Traditional sentiment analysis typically focuses on just four categories: positive, negative, neutral, and mixed. While this provides a basic sense of emotional tone, it falls short in capturing the complexity of human emotions. Our expression measurement models add a richer, more nuanced understanding by considering 48+ additional dimensions of emotion corresponding to the categories we typically use to describe our own expressions: emotions like anger, disgust, love, and pride. This approach yields a more realistic and comprehensive representation of what humans actually express, taking into account the broader range of emotion categories we use to understand one another. Our research finds that specific emotion categories, rather than broad valence and arousal dimensions, shape the way we interpret our own emotions and others’ expressions.
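
The difference is easiest to see side by side. In the sketch below, the labels and scores are invented for illustration, and only a handful of dimensions are shown:

```python
# Coarse sentiment analysis output for a hypothetical utterance (invented values).
sentiment = {"positive": 0.05, "negative": 0.80, "neutral": 0.10, "mixed": 0.05}

# A small slice of a richer expression measurement for the same utterance.
# A real model would score 48+ dimensions; these labels and values are examples.
expression_scores = {
    "Anger": 0.12,
    "Disappointment": 0.71,
    "Sadness": 0.55,
    "Contempt": 0.08,
    "Tiredness": 0.33,
}

# Sentiment analysis collapses this utterance into a single "negative" bucket,
# while the dimensional scores distinguish disappointment and sadness from
# anger or contempt.
print("Sentiment:", max(sentiment, key=sentiment.get))
print("Strongest expression dimensions:",
      sorted(expression_scores, key=expression_scores.get, reverse=True)[:2])
```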

Conclusion: Building Blocks of Emotional Understanding

While human expressions aren’t a direct window into our minds, they are crucial building blocks of empathy. By measuring and interpreting these behaviors, we can design empathic AI systems to recognize and respond to human expressions, fostering more natural and emotionally aware interactions. This is the goal of empathic AI—to create systems that understand not only the words we speak but also how we say them and what that indicates about our preferences.

Empathic AI has the potential to enhance how we communicate with technology, improve how technology affects our well-being, and enable more meaningful connections among humans. By grounding these technologies in the science of human expression and maintaining a focus on interpretability, we can ensure that AI respects our privacy while supporting our personal preferences and emotional needs.

Resources

  • Developer Platform
  • Our Research
  • API Documentation
