Research

The science of emotion

Explore our publications, models, and datasets pushing the boundaries of empathic AI.

#1 in naturalness and expressivity

600+ tags of emotions and voice characteristics detected

250ms speech LLM latency

Performance

State-of-the-art results

Our models consistently achieve top performance across industry benchmarks.

Naturalness

Most natural voice conversations

In blind comparisons, users consistently rate Hume voices as more natural and human-like than alternatives.

  • Authentic speech rhythms and pauses
  • Natural intonation patterns
  • Human-like breathing and cadence
Chart: Naturalness Score (higher is better), 0–10 scale, Hume vs. OpenAI.
Empathy

Superior emotional understanding

Hume's empathic AI demonstrates significantly higher emotional awareness and appropriate responses in conversations.

  • Recognizes frustration and responds with patience
  • Detects excitement and matches energy
  • Senses uncertainty and offers reassurance
Chart: Empathy Score (higher is better), 0–10 scale, Hume vs. OpenAI.
Expressiveness

Most expressive voice AI

Hume voices convey a wider range of emotions and nuanced expressions compared to other voice AI providers.

  • Warm enthusiasm for good news
  • Gentle concern when discussing problems
  • Playful humor in casual moments
Chart: Expressiveness Score (higher is better), 0–10 scale, Hume vs. OpenAI.
Hard Inputs

Best pronunciation of challenging content

Our TTS excels at pronouncing difficult content like phone numbers and mathematical expressions that trip up other systems.

  • The local mycologist explained that consuming just one fourth plus one fourth equals one half ounce of the misidentified death caps could prove fatal within forty eight hours.
  • Most businesses close in the late afternoon from between two until four thirty or five o'clock when it can get hot.
  • On december fifteenth two thousand seven, Dennis Kucinich raised one hundred thirty one thousand four hundred dollars from approximately one thousand six hundred donors.
Chart: Pass Rate by Input Type (higher is better), 0–100%, across Math, Emails, Dates, Time, Measurements, and Currency.

Tested on 2,167 samples
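The bullets above are target spoken renderings. To see why such inputs are hard, note that the surface form alone does not determine the reading: digits separated by "/" must be read as a fraction in one context and a date in another. The sketch below is purely illustrative; the tiny lookup tables are our own, not Hume's pipeline.

```python
# Minimal sketch of the "hard inputs" ambiguity problem for TTS
# front ends. Both functions receive the same token shape, but the
# correct spoken rendering depends on context the token alone lacks.
ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]

def read_as_fraction(tok: str) -> str:
    # "1/4" in a recipe or dosage is a fraction.
    num, den = tok.split("/")
    names = {"2": "half", "4": "fourth"}  # demo-only denominator table
    return f"{ONES[int(num)]} {names.get(den, den + 'th')}"

def read_as_date(tok: str) -> str:
    # "12/15" in a sentence about events is a date.
    months = {"12": "december"}  # demo-only month table
    m, d = tok.split("/")
    return f"{months[m]} {d}"

print(read_as_fraction("1/4"))  # -> one fourth
print(read_as_date("12/15"))    # -> december 15
```

A production system has to resolve this ambiguity from sentence-level context, which is what the pass-rate benchmark above measures.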

Expression Analysis

Most accurate emotion identification

When listeners rate how well they can identify the intended emotion, Hume voices consistently outperform competitors.

  • Joy, sadness, anger, fear, surprise
  • Subtle cues like hesitation or relief
  • Complex emotions like bittersweet nostalgia
Chart: Identification score (higher is better), 0–5 scale, Hume vs. OpenAI, shown per emotion (panel 1 of 8: "Distressed").

Instruction Following

Precisely follows your vocal directions

When you ask for a specific vocal style, emotion, or character, Hume delivers exactly what you requested.

  • "Speak with a whisper, like sharing a secret"
  • "Sound excited and out of breath"
  • "Use a sarcastic, know-it-all tone"
Chart: Instruction Following (higher is better), 0–5 scale, comparing Hume, OpenAI, Sesame, and Gemini.

Tested across 32 vocal instructions
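Instruction following means pairing the words to speak with a natural-language description of how to speak them. A minimal sketch of such a request payload is below; the endpoint path and field names (`utterances`, `text`, `description`) are assumptions based on Hume's public docs at the time of writing, so check the current API reference before relying on them.

```python
import json

# Hypothetical endpoint; verify against Hume's current API reference.
API_URL = "https://api.hume.ai/v0/tts"

def build_tts_request(text: str, style_instruction: str) -> dict:
    # Pair the utterance text with a natural-language vocal direction,
    # the pattern the "Instruction Following" benchmark exercises.
    return {
        "utterances": [
            {
                "text": text,
                "description": style_instruction,
            }
        ]
    }

payload = build_tts_request(
    "I can't believe we actually won.",
    "Speak with a whisper, like sharing a secret",
)
print(json.dumps(payload, indent=2))
```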

Recent Publications

Peer-reviewed insights

arXiv·Feb 2026

TADA: A Generative Framework for Speech Modeling via Text-Acoustic Dual Alignment (Under Review)

Trung Dang, Sharath Rao, Ananya Gupta and 6 more

Modern Text-to-Speech (TTS) systems increasingly leverage Large Language Model (LLM) architectures to achieve scalable, high-fidelity, zero-shot generation. However, these systems typically rely on fixed-frame-rate acoustic tokenization, resulting in speech sequences that are significantly longer than, and asynchronous with, their corresponding text. Beyond computational inefficiency, this sequence length disparity often triggers hallucinations in TTS and amplifies the modality gap in spoken language modeling (SLM). In this paper, we propose a novel tokenization scheme that establishes one-to-one synchronization between continuous acoustic features and text tokens, enabling unified, single-stream modeling within an LLM. We demonstrate that these synchronous tokens maintain high-fidelity audio reconstruction and can be effectively modeled in a latent space by a large language model with a flow matching head. Moreover, the ability to seamlessly toggle speech modality within the context enables text-only guidance: a technique that blends logits from text-only and text-speech modes to flexibly bridge the gap toward text-only LLM intelligence. Experimental results indicate that our approach achieves performance competitive with state-of-the-art TTS and SLM systems while virtually eliminating content hallucinations and preserving linguistic integrity, all at a significantly reduced inference cost.
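The "text-only guidance" in this abstract blends logits from the text-only and text-speech modes. The abstract does not give the exact formula; a linear, guidance-style blend like the one below is one plausible reading, with the weight `w` and the mixing form being our assumptions rather than the paper's method.

```python
import numpy as np

def blend_logits(logits_speech: np.ndarray,
                 logits_text: np.ndarray,
                 w: float) -> np.ndarray:
    """Guidance-style linear mix of two logit vectors.

    At w=0 this is pure text-speech mode; at w=1 it is pure
    text-only mode; intermediate w interpolates between them.
    """
    return logits_speech + w * (logits_text - logits_speech)

speech = np.array([2.0, 0.5, -1.0])  # toy logits, text-speech mode
text = np.array([1.0, 1.5, 0.0])     # toy logits, text-only mode
print(blend_logits(speech, text, 0.5))  # -> [ 1.5  1.  -0.5]
```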

Frontiers in Psychology·May 2024

How emotion is experienced and expressed in multiple cultures: a large-scale experiment across North America, Europe, and Japan

Alan Cowen, Jeffrey Brooks, Gautam Prasad and 13 more

Core to understanding emotion are subjective experiences and their expression in facial behavior. Past studies have largely focused on six emotions and prototypical facial poses, reflecting limitations in scale and narrow assumptions about the variety of emotions and their patterns of expression.

iScience·Feb 2024

Deep learning reveals what facial expressions mean to people in different cultures

Jeffrey Brooks, Lauren Kim, Michael Opara and 10 more

Cross-cultural studies of the meaning of facial expressions have largely focused on judgments of small sets of stereotypical images by small numbers of people. Here, we used large-scale data collection and machine learning to map what facial expressions convey in six countries.

Everything your model needs

Why Our Datasets

World-class data for pre-training and fine-tuning your emotion AI models, backed by years of scientific research.

Contact us

Ethically Sourced

All data collected with informed consent and rigorous privacy protections.

Globally Diverse

Representative samples across cultures, ages, genders, and demographics.

Expert Annotated

Labeled by trained researchers using validated scientific frameworks.

Research Ready

Clean, structured formats optimized for modern ML pipelines.

Explore Our Data

See the structure behind the emotion science

From voice AI training data to multimodal expression datasets, explore our full data catalog.

Browse all datasets

Scientific Foundation

Built on decades of research

Our datasets are grounded in peer-reviewed emotion science, developed in collaboration with leading researchers in psychology, affective computing, and machine learning.

View publications

Peer-reviewed methods

Built on 53+ publications in affective science and validated by independent researchers.

Neuroscience-informed

Datasets designed around how the brain actually processes and expresses emotion.

Validated accuracy

Benchmarked against gold-standard datasets with documented performance metrics.

Continuous updates

Regularly refined with new research findings and expanded training data.

Dataset Validation

Proven in production


Niantic Spatial × Hume AI: Creating Interactive & Spatially Aware AI Companions

In partnership with Snap Inc. (hardware) and Hume AI (voice), Niantic Spatial has developed location-aware companions for Spectacles, blending Snap Inc.’s AR glasses, Niantic Spatial’s Large Geospatial Model, and Hume’s Empathic Voice Interface (EVI) for natural, emotionally intelligent conversation. Niantic Spatial, the team pioneering AI that understands the physical world, is showcasing a compelling glimpse of what can happen when spatial intelligence and augmented reality meet.

Read case study

GAF Powers Professional Training with Hume’s Text-to-Speech

To support their extensive training programs and marketing initiatives, GAF leverages Hume's text-to-speech technology to make internal training videos and marketing voiceovers. Our partnership addresses several key needs:

  • Professional training content: Delivering consistent, high-quality audio for thousands of contractors and employees.
  • Marketing collateral: Producing engaging voiceovers for promotional content and product demonstrations.
  • Scalable production: Generating content without the logistics and cost of traditional voice recording.

Hume's voice design also proved ideal for GAF. The platform's natural, expressive voices maintain the authoritative yet approachable tone that GAF needs to communicate with contractors, retailers, and customers. Unlike synthetic voices that can sound robotic or overly casual, Hume's TTS technology delivers the polished, trustworthy quality expected from an industry leader.

Read case study

Hume AI powers conversational learning with Coconote

While traditional note-taking apps require students to manually scroll and search through content, Coconote is creating interactive study experiences through conversational AI. Coconote's voice chat feature, powered by Hume's EVI, helps users transform static notes into dynamic conversations. Students can:

  • Ask natural questions about their lecture content
  • Receive contextual explanations referencing specific notes
  • Engage in quiz-style conversations for active learning

All of this happens through natural voice interaction.

Read case study

Research Areas

Where Hume enables research

From fundamental affective computing to applied behavioral research, our datasets power studies across the full spectrum of emotion science.

Affective Computing

Study how AI systems can recognize, interpret, and respond to human emotions across modalities.

Human-AI Interaction

Research the dynamics of emotional exchange between humans and AI systems.

Psychology & Behavior

Use expression analysis to study human behavior, mental health, and psychological phenomena.

Speech & Language

Analyze prosodic features, sentiment, and emotional expression in human communication.

Multimodal Learning

Explore how emotion manifests simultaneously across face, voice, and language.

Ethics & AI Safety

Study the ethical implications of emotionally aware AI systems and develop guidelines.

From the Blog

Latest research updates


License our datasets

Access world-class expression datasets and collaborate with our team on advancing emotion AI.

Stay in the loop

Get the latest on empathic AI research, product updates, and company news.

Join the community

Connect with other developers, share projects, and get help from the team.

Join our Discord