Octave (Omni-capable text and voice engine) isn't a traditional TTS model. It’s a voice-based LLM. That means it understands what words mean in context, so it can predict emotions, cadence, and more.
Full prompt: The speaker is a medieval peasant with a cockney accent, raspy voice, dripping with sarcasm.
Full prompt: A retired Black female literature professor who analyzes poetry with precise academic language and references to her own published criticism.
Full prompt: The speaker is a grizzled old cowboy with a folksy Texan drawl Southern accent, speaking in a charismatic tone with a deep but relaxed vibe.
Full prompt: The star of a popular sitcom, with frequent inner monologues about her life.
Full prompt: A know-it-all dungeons and dragons dungeon master speaking excitedly with a lisp.
Full prompt: The speaker is a sophisticated British female narrator with a gentle, warm voice, recounting the ending of a classic romance novel.
Full prompt: The speaker is an American, deep middle-aged male film trailer narrator for a film about chickens.
Full prompt: A villainous undead vampire, with a horrifying raspy voice, and a slight Transylvanian accent.
Full prompt: A middle-aged African American man, reminiscing with a slightly gravelly voice and a tone of hard-earned wisdom.
Full prompt: The speaker is a distinguished British narrator, whose voice carries a deep sense of wisdom and curiosity.
Full prompt: The speaker has a booming, charismatic radio voice, like a Texan fishing guru with a hint of gravel and an infectious laugh, perfect for reeling in listeners to 'Big Dicky's live fishing frenzy.'
Octave is the first TTS system that can take natural language instructions to change emotional delivery and speaking style. Give directions like "sound sarcastic" or "whisper fearfully." For the first time, creators have total control.
Octave was built to generate the most expressive AI voices for any content: podcasts, voiceovers, audiobooks, and more. With our streaming API, you can bring it to any application.
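As a rough illustration of how a voice prompt or a direction such as "sound sarcastic" might travel alongside the text to be spoken, here is a minimal Python sketch that only builds a request body and sends nothing over the network. The helper name `build_tts_request` and the field names `utterances`, `text`, and `description` are assumptions made for this example, not details confirmed by this page; check the API reference for the actual request shape.

```python
import json


def build_tts_request(text, description=None):
    """Build a JSON-serializable request body for a text-to-speech call.

    Hypothetical helper: the field names below are assumptions for
    illustration, not a confirmed API schema.
    """
    if not text or not text.strip():
        raise ValueError("text must be a non-empty string")

    utterance = {"text": text.strip()}

    # The natural language direction (a voice prompt or an instruction
    # such as "whisper fearfully") rides alongside the text.
    if description and description.strip():
        utterance["description"] = description.strip()

    return {"utterances": [utterance]}


payload = build_tts_request(
    "Oh, lovely. Another turnip for supper.",
    description=(
        "The speaker is a medieval peasant with a cockney accent, "
        "raspy voice, dripping with sarcasm."
    ),
)
print(json.dumps(payload, indent=2))
```

Keeping the direction in its own field, separate from the text, means the same line can be re-rendered with a different delivery by changing only the description.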
As a speech-language model, where the same intelligence handles transcription, language, and speech, EVI 3 brings more expressiveness, realism, and emotional understanding to voice AI.
Hume's Text-to-Speech model, Octave, is available today for content creators and developers.
Measure emotional expression with unmatched precision. One API, four modalities, hundreds of dimensions of emotional expression.
Prompt the first LLM for text-to-speech to create new voices, instruct emotions, and more