Introducing EVI 2, our new foundational voice-to-voice model
By Alan Cowen on Sep 11, 2024
EVI 2 is our new voice-to-voice foundation model, and one of the first AI models with which you can have remarkably human-like voice conversations. It converses rapidly and fluently, with subsecond response times; understands a user's tone of voice; can generate any tone of voice; and even handles more niche requests, like changing its speaking rate or rapping. It can emulate a wide range of personalities, accents, and speaking styles, and possesses emergent multilingual capabilities.
At a higher level, EVI 2 excels at anticipating and adapting to your preferences, a capability made possible by its specialized training for emotional intelligence. It is trained to maintain characters and personalities that are fun and interesting to interact with. Taken together, these traits allow EVI 2 to emulate the ideal AI personality for each application it is built into and for each user.
Getting started with EVI 2
Today, EVI 2 is available in beta for anyone to use. You can talk to it in our app, and developers can build it into applications via our API (in keeping with our guidelines).
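To give a sense of what integration looks like, here is a minimal TypeScript sketch of a streaming conversation over a WebSocket. The endpoint URL, query parameter, and message shapes below are assumptions for illustration rather than the documented contract, so refer to the API reference for exact details.

```typescript
// Minimal sketch of a conversation with EVI 2 over a WebSocket.
// The endpoint, query parameter, and message schema are assumptions
// for illustration; consult the official API reference for the
// actual contract.
const API_KEY = process.env.HUME_API_KEY ?? ""; // your API key

const socket = new WebSocket(
  `wss://api.hume.ai/v0/evi/chat?api_key=${API_KEY}` // assumed endpoint
);

socket.addEventListener("open", () => {
  // Send a short text turn; a real app would stream microphone audio.
  socket.send(JSON.stringify({ type: "user_input", text: "Hello, EVI!" }));
});

socket.addEventListener("message", (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "audio_output") {
    // msg.data is assumed to be base64-encoded audio ready for playback.
    playAudioChunk(msg.data);
  } else if (msg.type === "assistant_message") {
    console.log("EVI:", msg.message?.content);
  }
});

// Hypothetical helper: decode a base64 audio chunk for playback.
function playAudioChunk(base64Audio: string): void {
  const bytes = Uint8Array.from(atob(base64Audio), (c) => c.charCodeAt(0));
  // ...feed `bytes` into a Web Audio / MediaSource playback pipeline...
  void bytes;
}
```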
Importantly, EVI 2 is incapable of cloning voices without modifications to its code. This is by design: we believe voice cloning carries unique risks. By controlling identity-related voice characteristics at the level of the model architecture, we constrain the model to adopt one voice identity at a time and maintain it across sessions.
But we still wanted to give users and developers the ability to adapt EVI 2's voice to their unique preferences and requirements. To that end, we developed an experimental voice modulation approach that allows anyone to create synthetic voices and personalities. Developers can adjust EVI 2's base voices along a number of continuous scales, including gender, nasality, and pitch. This first-of-its-kind feature lets you create tailored voices for specific apps and users without the risks of voice cloning, as sketched below.
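As an illustration, a developer-side configuration for this modulation might look like the following. The field names, value ranges, base-voice identifier, and settings-message shape are all hypothetical: the post names the continuous scales but not the API surface.

```typescript
// Hypothetical sketch of adjusting an EVI 2 base voice along continuous
// scales. Field names, ranges, and message shape are assumptions for
// illustration only.
interface VoiceModulation {
  gender: number;   // e.g. -1 (more masculine) .. +1 (more feminine)
  nasality: number; // e.g. -1 (less nasal) .. +1 (more nasal)
  pitch: number;    // e.g. -1 (lower) .. +1 (higher)
}

interface VoiceConfig {
  baseVoice: string;          // one of EVI 2's built-in base voices
  modulation: VoiceModulation;
}

// A tailored voice for a hypothetical support app: slightly
// lower-pitched and less nasal than the base voice.
const supportAgentVoice: VoiceConfig = {
  baseVoice: "ITO", // assumed base-voice identifier
  modulation: { gender: 0.0, nasality: -0.3, pitch: -0.2 },
};

// The config might then be attached to a session, e.g. sent as a
// settings message after the WebSocket opens (shape assumed):
const sessionSettings = JSON.stringify({
  type: "session_settings",
  voice: supportAgentVoice,
});
```

Because the scales are continuous, a small set of base voices can cover a wide space of distinct, stable voice identities without ever reproducing a specific person's voice.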
What's next?
The model that we’re releasing today is EVI-2-small. We are still making improvements to this model—in the coming weeks, it will become more reliable, learn more languages, follow more complex instructions, and use a wider range of tools. We’re also fine-tuning EVI-2-large, which we will be announcing soon.
EVI 2 represents a critical step forward in our mission to optimize AI for human well-being. We focused on making its voice and personality highly adaptable, giving it more ways to optimize for users' happiness and satisfaction. After all, a personality is the amalgamation of many subtle, subsecond decisions made during an interaction. EVI 2 demonstrates that an AI optimized for well-being, because it is more deeply aligned with your goals, ends up with a particularly pleasant and fun personality. Our ongoing research focuses on optimizing for each user's preferences automatically, using methods that fine-tune the model to generate responses aligned with signs of happiness and satisfaction during everyday use of an application.