Tutorial: Hands-on with Hume AI’s API
By Vineet Tiruvadi, MD, PhD on Sep 9, 2022
Welcome to our first Hume AI Platform product walk-through.
A major goal of our platform is to provide developers with new tools to understand users' expressions and wellness, beyond engagement (read more). One of the important ways we do this is through our platform's application programming interface (API).
In this post we’re going to walk you through the main steps of working with our API so you can start integrating our models into creative, scientific, and empathic applications of your own.
We'll cover:
1. Finding Your API Access Key
2. Deciding Between Batch and Streaming APIs
3. The API Call
4. Checking Out Our Results
Ready? Let's get into it.
This is Dr. Dacher Keltner. He’s written extensively on happiness and compassion, and he’s our Chief Scientific Advisor.
Let’s use the Hume AI Platform to measure the facial expression Dr. Keltner is forming in the image here.
To do so, we’ll need to find our API access key.
Step 1: Finding Your API Access Key
Your API key is a code specific to your account that authenticates you with our platform. To retrieve your API key, visit beta.hume.ai, click on your profile icon in the upper right corner, and choose Settings. Your key is listed as part of your Profile. For a more detailed tutorial on accessing the API, check out our help page.
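Since the key authenticates your account, it is usually safer to keep it out of source code. The sketch below reads it from an environment variable; the variable name HUME_API_KEY and the helper read_api_key are assumptions made for this example, not names defined by the platform.

```python
import os


def read_api_key(env=None, var_name="HUME_API_KEY"):
    """Return the API key stored in an environment variable.

    HUME_API_KEY is a hypothetical variable name chosen for this sketch;
    the platform does not prescribe one.
    """
    # Allow a plain dict to be passed in so the helper is easy to test.
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set {var_name} to the API key shown in your profile settings"
        )
    return key
```

With the key exported in your shell, read_api_key() returns it as a string that can be substituted for <YOUR_API_KEY> in the requests later in this tutorial.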
Next, we need to decide how we want to feed our data into the platform.
Step 2: Deciding Between Batch and Streaming APIs
There are two APIs we can choose from: a batch API and a streaming API.
The batch API can process a single media file or multiple files in parallel. It measures all of the expressions found in each file and notifies you when the results are ready, usually within a few minutes.
The streaming API can be connected to a live webcam or microphone input and returns measures of expressive behavior in real time.
For a saved image, the batch API is the best fit. We’ll work with the batch API in this tutorial and cover the streaming API in future tutorials.
So, now that we’ve got an API key and we’ve decided to go with the batch API, we can tell the Hume AI Platform where our data is and how we want to analyze it.
Step 3: The API Call
We’ll be using curl to call the API. The request needs three things: a URL for the data we want to analyze, which must be accessible to our API (our example uses a publicly available URL); the model(s) we want to use (to explore the models we have available, see our Products page); and our API key.
Once we have those ready, we can package up the information into a format our APIs can understand:
curl --header "Content-Type: application/json" --request POST --data '{"urls": ["<YOUR_URL>"],"models": {"<THE_MODEL_TO_USE>": {}}}' "https://api.hume.ai/v0/batch/jobs?apikey=<YOUR_API_KEY>"
To measure the facial expression Dr. Keltner is forming, we’ll send the URL of the picture above to our facial expression model (“face”), using our API key for authorization:
curl --header "Content-Type: application/json" --request POST --data '{"urls": ["https://assets.nationbuilder.com/mysticartists/pages/315/attachments/original/1462271161/DacherKeltner.jpg"],"models": {"face": {}}}' "https://api.hume.ai/v0/batch/jobs?apikey=<YOUR_API_KEY>"
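For readers who would rather call the API from a script, here is a minimal Python sketch of the same request using only the standard library. The endpoint, the apikey query parameter, and the payload shape are taken from the curl command above. The helper names build_job_request and submit_job are hypothetical, and decoding the response as JSON is an assumption, since this post does not specify the response format.

```python
import json
import urllib.parse
import urllib.request

# Endpoint copied from the curl command in this tutorial.
BATCH_JOBS_URL = "https://api.hume.ai/v0/batch/jobs"


def build_job_request(media_urls, model="face", api_key="<YOUR_API_KEY>"):
    """Build (but do not send) the POST request from the curl example."""
    payload = {"urls": list(media_urls), "models": {model: {}}}
    query = urllib.parse.urlencode({"apikey": api_key})
    return urllib.request.Request(
        f"{BATCH_JOBS_URL}?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_job(request, timeout=120):
    """Send the request and decode the body.

    Assumption: the response body is JSON. The timeout is an arbitrary
    choice for this sketch.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the request construction separate from the network call makes it possible to inspect the URL and payload before anything is sent, for example with request.full_url and request.data.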
This command typically takes just a second or two to process, but in rare cases may take up to two minutes as we scale up our platform access.
Step 4: Checking Out Our Results
The response to our curl request includes a URL where we can find our results once the models are done processing our data. By default, as soon as that URL is populated, we’ll receive an email from the platform with a link to the results in a JSON-formatted file similar to the example shown below. The JSON contains some additional metadata, as well as a breakdown of the scores for dimensions of expression. These dimensions are labeled with emotion categories that we’ve found are largely consistent in meaning across cultures (read more here).
Once the models are done, we're provided with measures of the facial expression that Dr. Keltner is forming. The output indicates that Dr. Keltner’s expression loads highly on dimensions of “Joy” and “Amusement.” Note that these labels are just shorthands for underlying patterns of facial movement, not readouts of what Dr. Keltner is feeling (which would be impossible). We use emotion labels because they capture the subtlety of human expression; unfortunately, coarse descriptors like “smile” and “scowl” do not.
{
  "bbox": { "x": 94.045, "y": 38.421, "w": 66.237, "h": 86.245 },
  "emotions": [
    { "name": "Calmness", "score": 0.220 },
    { "name": "Boredom", "score": 0.198 },
    { "name": "Interest", "score": 0.185 }
    # ... More emotions
  ]
}
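To work with these scores programmatically, one option is to rank the emotions by score. The sketch below uses the example output above with the elided entries left out. It assumes a single prediction has exactly that shape (a "bbox" object and an "emotions" list); the full results file contains additional metadata that is not shown in this post. The helper name top_emotions is hypothetical.

```python
import json

# One face prediction, copied from the example output (elided entries omitted).
EXAMPLE_PREDICTION = """
{
  "bbox": {"x": 94.045, "y": 38.421, "w": 66.237, "h": 86.245},
  "emotions": [
    {"name": "Calmness", "score": 0.220},
    {"name": "Boredom", "score": 0.198},
    {"name": "Interest", "score": 0.185}
  ]
}
"""


def top_emotions(prediction, n=3):
    """Return the n highest-scoring (name, score) pairs, best first."""
    ranked = sorted(prediction["emotions"], key=lambda e: e["score"], reverse=True)
    return [(e["name"], e["score"]) for e in ranked[:n]]


prediction = json.loads(EXAMPLE_PREDICTION)
for name, score in top_emotions(prediction, n=2):
    print(f"{name}: {score:.3f}")
# prints:
# Calmness: 0.220
# Boredom: 0.198
```

Sorting explicitly, rather than relying on the order in the file, keeps the ranking correct even if the emotions are not listed by score.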
And that’s it! We’ve gotten ourselves set up with the Hume AI Platform. Now that we can use Hume’s APIs, we have everything we need to start integrating cutting-edge models of expressive communication into bigger projects.
In Closing
The Hume AI Platform strives to be the only toolkit developers need to measure verbal and nonverbal cues in audio, video, or images, based on rigorous scientific studies of human expressive behavior. Our API is the simplest programmatic endpoint for working with our models. You can also explore the outputs of our models interactively on our Playground.
In this post we walked through the basics of using our API to measure expressive behavior in an example file. For more details and the latest documentation, be sure to bookmark our tutorials. And if you plan to develop an application using our API, note that it will need to adhere to the ethical guidelines of The Hume Initiative.
We’re excited to expand our private beta over the next few months, and look forward to what you’ll build with it! If you have any questions, please reach out to [email protected].
Connect With Us
Follow us on Twitter @hume_ai or reach out to us directly. If you’re interested in beta access, feel free to sign up.