Introducing Hume’s Empathic Voice Interface (EVI) API
Published on Apr 18, 2024
Integrate emotionally intelligent voice experiences into any application with our EVI API.
Introducing our voice AI – the Empathic Voice Interface
Last month, we released a demo of our Empathic Voice Interface (EVI). Today, the first emotionally intelligent voice AI API is finally here!
EVI does a lot more than stitch together transcription, LLMs, and text-to-speech. With a new empathic LLM (eLLM) that processes your tone of voice, EVI unlocks new capabilities like knowing when to speak, generating more empathic language, and intelligently modulating its own tune, rhythm, and timbre.
EVI is the first voice AI that really sounds like it understands you. By adapting its tone of voice, it emulates the way humans convey meaning beyond words, unlocking more efficient, smooth, and satisfying AI interactions.
Accessing EVI: integrating emotionally intelligent voice AI into your applications
The main way to work with EVI is through a WebSocket connection that sends audio and receives responses in real time. This enables fluid, bidirectional dialogue where users speak, EVI listens and analyzes their voice, and EVI generates emotionally intelligent responses. You start a conversation by connecting to the WebSocket and streaming the user’s voice input to EVI.
As the user speaks with EVI, the client can also send text for EVI to speak aloud, which is intelligently integrated into the conversation.
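As a sketch of what this exchange might look like on the wire, the snippet below frames a raw audio chunk and a text-to-speak message as JSON for a WebSocket connection. The message schema here (`audio_input`, `assistant_input`, and the field names) is an illustrative assumption, not Hume's documented format; see the EVI documentation for the real one.

```python
import base64
import json

def audio_input_message(chunk: bytes) -> str:
    """Frame a raw audio chunk as a JSON WebSocket message.

    The "audio_input" type and base64 "data" field are hypothetical,
    chosen only to illustrate the shape of a streaming audio message.
    """
    return json.dumps({
        "type": "audio_input",
        "data": base64.b64encode(chunk).decode("ascii"),
    })

def assistant_text_message(text: str) -> str:
    """Ask the voice interface to speak the given text aloud (hypothetical schema)."""
    return json.dumps({"type": "assistant_input", "text": text})

# Example: frame a small (fake) 16-bit PCM chunk for sending over the socket.
msg = audio_input_message(b"\x00\x01" * 160)
```

In a real client, each message would be sent over the open WebSocket as the microphone produces audio, while responses stream back on the same connection.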
See our documentation for more information on how to integrate EVI into your application. A great way to get started is on our platform, which allows developers to interactively configure custom system prompts and voices.
Empathic AI (eLLM) features
- Responds at the right time: Uses your tone of voice for state-of-the-art end-of-turn detection — the true bottleneck to responding rapidly without interrupting you.
- Understands users’ prosody: Provides streaming measurements of the tune, rhythm, and timbre of the user’s speech using Hume’s prosody model, integrated with our eLLM.
- Forms its own natural tone of voice: Guided by the user’s prosody and language, our model responds with an empathic, naturalistic tone of voice, matching the user’s nuanced “vibe” (calmness, interest, excitement, etc.). It responds to frustration with an apologetic tone, to sadness with sympathy, and more.
- Responds to expression: Powered by our empathic large language model (eLLM), EVI crafts responses that are not just intelligent but attuned to what the user is expressing with their voice.
- Always interruptible: Stops rapidly whenever users interject, listens, and responds with the right context based on where it left off.
- Aligned with well-being: Trained on human reactions to optimize for positive expressions like happiness and satisfaction. EVI continuously learns from users’ reactions.
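For contrast with the prosody-based end-of-turn detection described above, conventional voice pipelines typically fall back on a silence timeout: the turn ends after enough consecutive quiet audio. The sketch below implements that naive baseline (this is not Hume's approach, and the thresholds are arbitrary illustrative values):

```python
import struct

def rms(chunk: bytes) -> float:
    """Root-mean-square energy of a 16-bit little-endian PCM chunk."""
    samples = struct.unpack("<%dh" % (len(chunk) // 2), chunk)
    return (sum(s * s for s in samples) / max(len(samples), 1)) ** 0.5

def end_of_turn(chunks, silence_rms=500.0, silence_chunks=5):
    """Naive silence-timeout heuristic: declare end of turn once
    `silence_chunks` consecutive chunks fall below the energy threshold."""
    quiet = 0
    for chunk in chunks:
        quiet = quiet + 1 if rms(chunk) < silence_rms else 0
        if quiet >= silence_chunks:
            return True
    return False
```

The weakness of this baseline is exactly the bottleneck noted above: a thoughtful pause looks identical to a finished turn, so the timeout must be long enough to avoid interrupting, which makes responses slow. Using tone of voice as an additional signal is how EVI avoids that trade-off.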
Configurability: customizing your voice AI API
With the general release of EVI, we’re also releasing our Configuration API, which enables developers to customize their EVI: the system prompt, the LLM, the tools EVI can use, the context available during the conversation, and more. You can configure EVI through either the API or the UI. The configurable elements are listed below.
- System prompt: Customize EVI’s personality, response style, and the content of its speech through prompt engineering. Use our guidelines for prompting EVI to improve performance, or try out our sample prompts in the voice playground.
- Inject other LLM responses into our model: Hume’s empathic large language model (eLLM) always generates the first response to a query, but you can configure other LLMs to formulate longer responses.
- Integrate another LLM API: We currently support Fireworks Mixtral 8x7b, all OpenAI models, and all Anthropic models.
- Bring your own LLM or generate text another way: Connect our WebSocket to your own server running your own model or text-generation tool, allowing you to determine exactly how EVI responds in the conversation.
- TTS: Use just EVI’s expressive voice by sending our API text to be spoken aloud.
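To make the options above concrete, a configuration request might carry a JSON payload like the one below. Every field name, value, and the overall structure here are illustrative assumptions for the sketch, not the documented Configuration API schema:

```python
import json

# Hypothetical EVI configuration payload; all field names are illustrative only.
evi_config = {
    "name": "support-agent",
    "prompt": {
        # System prompt: shapes personality and response style.
        "text": "You are a patient, empathic customer-support voice agent.",
    },
    "language_model": {
        # Supplemental LLM for longer responses; the eLLM still replies first.
        "provider": "anthropic",    # or "openai", "fireworks" (per the list above)
        "model": "claude-3-haiku",  # illustrative model name
    },
}

# Serialized body that a client might POST to the Configuration API.
payload = json.dumps(evi_config)
```

Under this sketch, swapping `language_model` for a pointer to your own server would correspond to the "bring your own LLM" option, where your endpoint generates the text EVI speaks.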
We plan to add more configuration options soon, allowing EVI to use tools, change its speaking style, and more. Join our Discord for product updates and technical support.
Learn more about emotionally intelligent voice AI
- Build with our conversational AI voice
- Speak with our Empathic Voice Interface