Publication in iScience: Understanding what facial expressions mean in different cultures
By Jeffrey Brooks, PhD on Feb 20, 2024
How many different facial expressions do people form? How do they differ in meaning across cultures? Can AI capture these nuances? Our new paper provides in-depth answers to these questions with the help of machine learning.
Average facial expressions created from hundreds of thousands of facial reactions collected in six countries. From left to right: India, South Africa, Venezuela, United States, Ethiopia, and China. Curious about the sounds they’re making? See this earlier blog post on vocal bursts in different countries.
Interpreting the meanings of facial expressions is one of the most important social skills we rely on in daily life. We’re able to pick up on facial expressions effortlessly and seamlessly, even when they are subtle and fleeting. But how does this actually happen, and what are the underlying meanings conveyed by facial movements? These questions comprise one of the most fundamental and debated topics in psychology, dating back to Charles Darwin.
What can we say about the meanings of facial expressions? Some of the best-known studies in 20th-century psychology were undertaken on this topic. Paul Ekman classically posited six universal facial expressions - anger, disgust, fear, happiness, sadness, and surprise. Others have proposed that facial expressions convey basic states such as pleasantness or unpleasantness, arousal or calmness. Still others claim that facial expressions are fundamentally ambiguous and derive all of their meaning from the surrounding situation.
Regardless of their underlying theories and assumptions, studies on facial expressions and what they mean in different cultures have largely been limited by small sample sizes, small and posed image sets, and perceptual and linguistic biases.
Recently, we investigated the emotional meaning of facial expressions with data collected from around the world, using approaches meant to account for all of these limitations at once.
Our findings have recently been accepted for publication in the journal iScience, in a paper titled “Deep learning reveals what facial expressions mean to people in different cultures”.
Our study
Over 13,000 people from six countries - the United States, China, Ethiopia, India, South Africa, and Venezuela - participated in our study to help determine the emotional meanings of facial expressions.
We started with 4,659 images of facial expressions taken from pre-existing databases, our prior work, and an extensive web search. We asked a first group of 5,833 participants to view the facial expressions and tell us what emotions they thought they conveyed (choosing from up to 48 different options – including nuanced states like Empathic Pain, Contemplation, and Aesthetic Appreciation).
On their own, these emotion ratings can tell us a lot about what people around the world see in facial expressions. But, in theory, these ratings could also just be influenced by quirks of the original images, like the visual context, gender, race, or age of the people in the photos.
So, for each facial expression they rated, we asked participants to use their webcam to photograph themselves mimicking it. This led to a uniquely large and diverse dataset: 423,193 facial expressions formed by our participants, each tagged with the emotions the participant felt the expression conveyed. This allowed us to understand more about the underlying facial movements that convey emotion. The mimicry procedure also allowed us, for the first time, to decouple ratings of the expressions from the confounding influences described above, since the identity and demographics of the individuals in the mimicked images were experimentally randomized relative to the underlying expressions they were imitating.
Did the mimicked facial expressions reliably convey the same emotions as the original “seed” facial expressions? This would indicate some underlying meaning expressed by these facial actions that isn’t shaped by the identity of the person making them, like their gender and age. It would also confirm that the translations of emotion terms in different languages carry similar meanings if they are used to describe the same facial expressions.
In order to address this systematically, we asked an additional group of 7,957 participants from the same countries to view the mimicked facial expressions and decide what emotions they conveyed.
We wanted to determine whether the same underlying facial expressions were present in the images from around the world, and whether these expressions had the same meaning to people in different cultures. How many distinct expressions were there, even though they were made by people from a variety of backgrounds, cultures, and contexts? Did people from around the world use the same emotion concepts to describe the same facial expressions, even though they responded in their own languages?
We were able to approach these questions with new, cutting-edge computational methods because we had a very large dataset to work with. Specifically, we trained an AI model to understand facial expressions, and used this model to address our questions in a data-driven way.
Training AI to understand facial expressions
We trained a deep neural network (DNN) to find facial expressions that had distinct meanings within or across cultures. Using a machine learning model instead of just human ratings allowed us to precisely control several important aspects of our analysis that are otherwise problematic for studies on facial expression.
We trained the DNN on the set of mimicked facial expressions, and tested the DNN by having it predict the emotions in the original seed facial expressions (which it had no exposure to during training). This meant that the model was forced to ignore factors like the particular characteristics of the participants in our study, as these were randomized in the mimicry portion of our experiment. Instead, the model focused on isolating the consistencies in visual input that give rise to human judgments of particular emotions.
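The train-on-mimics, test-on-seeds design can be illustrated with a small toy simulation. This is only a sketch under stated assumptions, not the model from the paper: the data are synthetic, ordinary least squares stands in for the DNN, each mimic is assumed to inherit the average judgments of the seed it imitates, and all names (`seed_expr`, `identity_noise`, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_seeds, n_mimics_per_seed, n_features, n_emotions = 40, 25, 8, 3

# Hypothetical "expression" features of each seed image, and the average
# emotion judgments they elicit (a made-up linear relationship).
seed_expr = rng.normal(size=(n_seeds, n_features))
true_map = rng.normal(size=(n_features, n_emotions))
seed_judgments = seed_expr @ true_map

# Each mimic copies its seed's expression, plus identity-related variation
# that is random with respect to the expression being imitated.
seed_index = np.repeat(np.arange(n_seeds), n_mimics_per_seed)
identity_noise = rng.normal(scale=0.5, size=(seed_index.size, n_features))
mimic_features = seed_expr[seed_index] + identity_noise
mimic_targets = seed_judgments[seed_index]  # assumption: targets come from the seed

# Fit on mimics only (least squares is a stand-in for the DNN).
weights, *_ = np.linalg.lstsq(mimic_features, mimic_targets, rcond=None)

# Evaluate on the seed images, which were never used for fitting.
seed_predictions = seed_expr @ weights
r = np.corrcoef(seed_predictions.ravel(), seed_judgments.ravel())[0, 1]
print(f"correlation on held-out seeds: {r:.2f}")
```

Because the identity-related variation is unrelated to the expression, the fitted model can only do well on the seeds by relying on the expression signal itself, which is the intuition behind the design.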
We then compared the DNN predictions to the average judgments that the human participants in our study made about the seed facial expressions. These comparisons allowed us to uncover how many distinct facial expressions were in the data, and precisely quantify how many of these expressions had shared meanings across cultures.
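One simple way to compare model predictions with average human judgments is a per-emotion Pearson correlation across images. The sketch below is illustrative only: the function name `per_emotion_correlation` and the toy numbers are made up, and the analysis in the paper is more involved than this.

```python
import numpy as np

def per_emotion_correlation(model_pred, human_mean):
    """Pearson correlation between model and human scores, one value per emotion.

    Both arrays have shape (n_images, n_emotions).
    """
    pred = model_pred - model_pred.mean(axis=0)
    human = human_mean - human_mean.mean(axis=0)
    numerator = (pred * human).sum(axis=0)
    denominator = np.sqrt((pred ** 2).sum(axis=0) * (human ** 2).sum(axis=0))
    return numerator / denominator

# Toy example: 5 images, 2 emotions (all values are invented for illustration).
human_mean = np.array([[0.9, 0.1], [0.7, 0.2], [0.2, 0.8], [0.1, 0.9], [0.5, 0.5]])
model_pred = np.array([[0.8, 0.2], [0.6, 0.1], [0.3, 0.7], [0.2, 0.8], [0.5, 0.6]])

corr = per_emotion_correlation(model_pred, human_mean)
print(corr.round(2))
```

A high correlation for an emotion means the model ranks the images on that emotion in much the same way the human raters did on average.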
We found that our model was able to differentiate 28 different kinds of facial expressions shared across cultures. Twenty-one kinds of facial expression had the same primary meaning across all six cultures, and the remaining seven had the same primary meaning in two or more cultures. The emotions and mental states people associated with the different facial expressions were 63% preserved across cultures (which is impressively high given that emotion concepts themselves can differ across cultures and languages).
The 21 distinct kinds of facial expressions that had the same primary meaning, expressed using the same 21 emotion concepts (or combinations of concepts) or their most direct translations across all six countries, were “anger,” “boredom,” “calmness,” “concentration,” “contemplation/doubt,” “confusion,” “disappointment,” “disgust,” “distress,” “fear,” “interest,” “joy,” “love/romance,” “pain,” “sadness,” “satisfaction/contentment,” “sexual desire,” “surprise (negative),” “surprise (positive),” “tiredness,” and “triumph.”
We visualized the average facial movements associated with each of the 28 dimensions we discovered by morphing together representative mimics in each culture. The emotion concept that loaded most heavily on each dimension in each country is overlaid on the corresponding image.
Importantly, the model architecture allowed us to avoid linguistic bias, which is a tricky problem when conducting experiments and training machine learning models with data from multiple cultures. We structured the model so that the average emotion judgments within each culture (evaluated in three separate languages: English, Chinese, and Spanish) were output separately. This means that the DNN was not instructed to assume any relationship between emotion concepts and how they are used in different countries or languages.
Since the model was not given any translation of the words from different languages to one another, the relationships we uncovered between the words used in different languages show that the concepts were used similarly to describe the same facial expressions. The dimension corresponding to “happiness” in one country could just as easily have been found to correspond to “sadness” in another, if the same facial modulations in fact had opposite meanings across cultures.
We also found some cases where the same emotion is inferred from multiple facial expressions. Our approach demonstrates that there is not a one-to-one mapping between facial actions and specific emotions: in some cases the same facial expression could convey different emotions to different individuals or cultures, and in others multiple facial expressions could be interpreted as conveying the same emotion within a culture.
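The idea that correspondences between emotion words can emerge without any translation being supplied can be sketched with synthetic data. In this toy example (not the paper's analysis), two hypothetical cultures have separate output columns, culture B's concepts are stored in an arbitrary order, and the matching is recovered purely from how the scores co-vary across images. The names `out_a`, `out_b`, and `order_b` are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_images, n_concepts = 200, 4

# Hypothetical latent expression dimensions shared by both cultures.
latent = rng.normal(size=(n_images, n_concepts))

# Culture A's outputs follow the latent dimensions in order. Culture B's
# concept list is stored in a different order, and no translation is given.
order_b = np.array([2, 0, 3, 1])
out_a = latent + rng.normal(scale=0.3, size=latent.shape)
out_b = latent[:, order_b] + rng.normal(scale=0.3, size=latent.shape)

# Correlate every culture-A output with every culture-B output across images.
full = np.corrcoef(out_a.T, out_b.T)
cross = full[:n_concepts, n_concepts:]  # rows: A concepts, columns: B concepts

# For each A concept, pick the B concept whose scores track it most closely.
best_match = cross.argmax(axis=1)
print(best_match)
```

If the same facial movements had opposite meanings in the two cultures, the strongest correlations would simply land on different pairs of concepts; nothing in the procedure forces a word to match its dictionary translation.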
Our findings add to a growing body of work showing that large numbers of emotions have associated expressions with shared meanings across cultures. They also provide an example of how DNNs can be used to investigate psychological processes while controlling for human biases.
The results speak to the universality of facial expression at an unprecedented level of detail, but they are by no means exhaustive. Going forward, we would like to expand this approach to more languages and cultures than we were able to study here.
To learn more about the background for this project and stay up to date on our latest developments, you can visit https://hume.ai/science. To check out more of our published work, you can visit https://dev.hume.ai/docs/resources/science#published-research.