Hume Raises $50M Series B and Releases New Empathic Voice Interface
Published on Mar 25, 2024
To build emotionally intelligent voice AI, Hume AI raised a $50M Series B round led by EQT Ventures and joined by Union Square Ventures, Nat Friedman & Daniel Gross, Metaplanet, Northwell Holdings, Comcast Ventures, and LG Technology Ventures.
The company is unveiling its new flagship product, an Empathic Voice Interface (EVI). The conversational voice-to-voice AI knows when users are finished speaking and learns to generate vocal responses optimized for user satisfaction. EVI is powered by a new form of multimodal generative AI called an empathic large language model (eLLM) developed by Hume.
Building the first empathic AI
Hume AI is delighted to announce its $50M Series B! The round was led by EQT Ventures, with participation from Union Square Ventures, Nat Friedman & Daniel Gross, Metaplanet, Northwell Holdings, Comcast Ventures, and LG Technology Ventures.
In conjunction with today's Series B announcement, Hume is launching its Empathic Voice Interface (EVI), a first-of-its-kind conversational AI with emotional intelligence. EVI is trained on data from millions of human interactions and uses vocal tones to understand when users finish speaking, predict their preferences, and optimize responses for satisfaction over time.
The future of voice AI
AI voice products have the potential to revolutionize how we interact with technology. However, their often stilted and mechanical responses are a barrier to truly immersive conversations. The goal with EVI is to provide the basis for engaging voice-first experiences that emulate the natural speech patterns of human conversation.
EVI uses a new form of multimodal generative AI that integrates large language models (LLMs) with expression measures, which Hume refers to as an empathic large language model (eLLM). Our eLLM enables EVI to adjust the words it uses and its tone of voice based on the context and the user’s emotional expressions. Developers will be able to integrate voice-first experiences into any application with a few lines of code. EVI will be publicly available in April; sign up here for updates: link.hume.ai/evi-waitlist
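To make the "few lines of code" claim concrete, here is a minimal Python sketch of what a WebSocket-based integration could look like. This post does not document the API surface, so the endpoint URL, the api_key query parameter, and the JSON message shapes (audio_input, audio_output, assistant_end) are illustrative assumptions, not a confirmed schema.

```python
import asyncio
import base64
import json

import websockets  # pip install websockets

# Assumptions for illustration: the endpoint, the auth scheme, and the
# message fields below are placeholders, not a documented API.
EVI_URL = "wss://api.hume.ai/v0/evi/chat"
API_KEY = "your-api-key"

async def chat() -> None:
    async with websockets.connect(f"{EVI_URL}?api_key={API_KEY}") as ws:
        # Send one utterance of user audio as a base64-encoded payload.
        with open("utterance.wav", "rb") as f:
            audio_b64 = base64.b64encode(f.read()).decode()
        await ws.send(json.dumps({"type": "audio_input", "data": audio_b64}))

        # Read events until EVI signals the end of its response.
        async for raw in ws:
            msg = json.loads(raw)
            if msg.get("type") == "audio_output":
                audio = base64.b64decode(msg["data"])  # expressive TTS chunk
                # ...play `audio` through your output device...
            elif msg.get("type") == "assistant_end":
                break

asyncio.run(chat())
```

In a real application you would stream microphone audio in chunks rather than sending a whole file; EVI's end-of-turn detection, described below, is what lets the server decide when the user has finished speaking.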
Learn about the features that make EVI a human-like conversationalist:
- Universal voice interface: a single API for transcription, frontier LLMs, and text-to-speech.
- End-of-turn detection: uses your tone of voice for state-of-the-art end-of-turn detection, eliminating awkward overlaps.
- Interruptibility: stops speaking when interrupted and starts listening, just like a human (see the client-side sketch after this list).
- Responds to expression: understands the natural ups and downs in pitch and tone used to convey meaning beyond words.
- Expressive TTS: generates the right tone of voice to respond with natural, expressive speech.
- Aligned with your application: learns from users' reactions to improve itself, optimizing for happiness and satisfaction.
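To make the interruptibility and end-of-turn behavior concrete, here is a sketch of a client-side event handler, continuing the hypothetical message schema from the earlier example. The Player sink and the message type names are assumptions for illustration, not Hume's confirmed protocol.

```python
import json

class Player:
    """Hypothetical audio sink; swap in your platform's audio output."""
    def enqueue(self, audio_b64: str) -> None: ...
    def stop(self) -> None: ...

def handle_event(raw: str, player: Player) -> None:
    # Message type names are assumptions, continuing the earlier sketch.
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "audio_output":
        # Expressive TTS: queue each synthesized chunk for playback.
        player.enqueue(msg["data"])
    elif kind == "user_interruption":
        # Interruptibility: the user barged in, so cut playback at once
        # while the microphone stream keeps flowing to the server.
        player.stop()
    elif kind == "assistant_end":
        # EVI finished its turn; the client simply waits for the next
        # user utterance, since end-of-turn detection happens server-side.
        pass
```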
Hume remains dedicated to developing safe and responsible AI through The Hume Initiative, a nonprofit that brings together AI researchers, ethicists, social scientists, and legal scholars to develop ethical guidelines for empathic AI.
Alan Cowen, CEO and Chief Scientist, sees empathic AI as essential to aligning AI with human well-being:
“The main limitation of current AI systems is that they’re guided by superficial human ratings and instructions, which are error-prone and fail to tap into AI’s vast potential to come up with new ways to make people happy. By building AI that learns directly from proxies of human happiness, we’re effectively teaching it to reconstruct human preferences from first principles and then update that knowledge with every new person it talks to and every new application it’s embedded in.”
Shaping the future of voice AI and empathic technology
Today’s announcement follows a period of exciting growth for Hume. Over the past year, Hume launched two key products: the Expression Measurement API, an advanced toolkit for measuring human emotional expression, and Custom Models, which uses transfer learning on those measurements to predict human preferences. Additionally, Hume grew its foundational databases to include naturalistic data from over a million diverse participants, doubled its headcount from 15 to 30 employees, and published more than eight academic articles in top journals.
The funding will accelerate Hume’s growth into a global player in the generative AI space and cement empathic AI as an industry standard. The capital will be used to scale Hume’s team, accelerate its AI research, and continue development of its Empathic Voice Interface. Interested in helping us build the future of empathic AI? Apply here: hume.ai/careers