Introducing EVI 2, our new foundational voice-to-voice model
By Alan Cowen on Sep 11, 2024
EVI 2 is our new voice-to-voice foundation model, and one of the first AI models you can have remarkably human-like voice conversations with. It converses rapidly and fluently, with subsecond response times; it understands a user’s tone of voice, can generate any tone of voice, and even handles niche requests like changing its speaking rate or rapping. It can emulate a wide range of personalities, accents, and speaking styles, and possesses emergent multilingual capabilities.
At a higher level, EVI 2 excels at anticipating and adapting to your preferences, made possible by its special training for emotional intelligence. It is trained to maintain characters and personalities that are fun and interesting to interact with. Taken together, these capabilities let EVI 2 emulate the ideal AI personality for each application it is built into and each user it serves.
Getting started with EVI 2
Today, EVI 2 is available in beta. Anyone can talk to it through our app, and developers can build it into their own applications via our API (in keeping with our guidelines).
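For developers, a session with EVI is a bidirectional streaming connection. As a minimal sketch, here is how a simple text-in, text-out exchange might look in Python with the `websockets` library; the endpoint URL, query-string authentication, and message/event field names below are assumptions for illustration, so consult the API reference for the authoritative schema.

```python
# A minimal sketch of a text-in, text-out exchange with EVI over WebSocket.
# The endpoint URL, api_key query parameter, and message/event field names
# are assumptions for illustration; see the API reference for the real schema.
import asyncio
import json
import os

import websockets  # pip install websockets

EVI_URL = "wss://api.hume.ai/v0/evi/chat"  # assumed endpoint


async def chat() -> None:
    url = f"{EVI_URL}?api_key={os.environ['HUME_API_KEY']}"  # assumed auth scheme
    async with websockets.connect(url) as ws:
        # Send one text turn; a real client would stream microphone audio instead.
        await ws.send(json.dumps({"type": "user_input", "text": "Hello, EVI!"}))
        async for raw in ws:
            event = json.loads(raw)
            # Surface the assistant's transcript and stop when its turn ends.
            if event.get("type") == "assistant_message":
                print(event.get("message", {}).get("content"))
            elif event.get("type") == "assistant_end":
                break


asyncio.run(chat())
```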
Importantly, EVI 2 is incapable of cloning voices without modifications to its code. This is by design: we believe voice cloning carries unique risks. By controlling identity-related voice characteristics at the model architecture level, we constrain the model to adopt one voice identity at a time and maintain it across sessions.
But we still wanted to give users and developers the ability to adapt EVI 2’s voice to their unique preferences and requirements. To that end, we developed an experimental voice modulation approach that allows anyone to create synthetic voices and personalities. Developers can adjust EVI 2’s base voices along a number of continuous scales, including gender, nasality, and pitch. This first-of-its-kind feature lets you create tailored voices for specific apps and users without the risks of voice cloning, as in the sketch below.
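To make that concrete, here is a hypothetical sketch of creating a configuration that nudges a base voice along a few of those continuous scales. The endpoint path, auth header, base-voice name, parameter names, and value ranges are illustrative assumptions, not a documented schema.

```python
# A hypothetical sketch of tuning a base voice along continuous scales.
# Endpoint path, header, base-voice name, scale names, and value ranges
# are assumptions for illustration only.
import os

import requests  # pip install requests

resp = requests.post(
    "https://api.hume.ai/v0/evi/configs",  # assumed configs endpoint
    headers={"X-Hume-Api-Key": os.environ["HUME_API_KEY"]},  # assumed auth header
    json={
        "name": "warm-support-agent",
        "voice": {
            "base_voice": "ITO",  # assumed base-voice identifier
            "parameters": {       # assumed scales, e.g. on a -1.0..1.0 range
                "gender": -0.3,   # shift slightly toward feminine
                "nasality": -0.5, # less nasal
                "pitch": 0.2,     # slightly higher pitch
            },
        },
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["id"])  # config id to reference when opening an EVI session
```

Because the scales are continuous rather than prompt-based, the same parameter values reproduce the same voice across sessions.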
What's next?
The model we’re releasing today is EVI-2-small. We are still improving it: in the coming weeks, it will become more reliable, learn more languages, follow more complex instructions, and use a wider range of tools. We’re also fine-tuning EVI-2-large, which we will announce soon.
EVI 2 represents a critical step forward in our mission to optimize AI for human well-being. We focused on making its voice and personality highly adaptable, giving it more affordances to optimize for users’ happiness and satisfaction. After all, personalities are the amalgamation of many subtle, subsecond decisions made during our interactions, and EVI 2 demonstrates that AI optimized for well-being will have a particularly pleasant and fun personality as a result of its deeper alignment with your goals. Our ongoing research focuses on optimizing for each user’s preferences automatically, using methods that fine-tune the model to generate responses aligned with signs of happiness and satisfaction during everyday use of an application.