Voice Training Data Built by Researchers, for Researchers
Datasets for creating realistic voices across the world's languages. They power our own state-of-the-art models and are now available to power yours.
Datasets
Covering the Full Spectrum of Voice
How It Works
From Research Question to Production-Ready Data
Hume operates a research-grade data pipeline purpose-built for voice.
Request Samples
Start with curated speech datasets from our library.
Create Your Own
Launch custom collections with defined speakers and recording conditions.
License Access
Datasets include rich metadata: speaker demographics, acoustic properties, and labels.
API Access
Programmatically refresh or generate new training data.
Ready to explore our training data?
Talk to our research team about how Hume's datasets can accelerate your voice AI development.