How it Works

Given an audio or video file, most services apply Automatic Speech Recognition (ASR), and possibly Natural Language Processing (NLP), to produce a transcript and call the job complete.

We take a radically different approach.

1. You send us audio and video files.
2. We process them.
3. You search the contents or ask for reports about what we found.
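The three steps above can be sketched in code. This is not the real Clarify API: the class and method names below are illustrative stand-ins, and pre-extracted words stand in for raw media, just to show the upload-then-search shape of the workflow.

```python
# Toy in-memory stand-in for the service (hypothetical names, not the
# real Clarify API): store media -> process it -> search the contents.

class MediaIndex:
    def __init__(self):
        self.bundles = {}

    def upload(self, name, transcript_words):
        # Step 1: "you send us audio and video files."
        # Step 2: "we process them" -- here, processing is just
        # normalizing the words that would be extracted from the media.
        self.bundles[name] = [w.lower() for w in transcript_words]
        return name

    def search(self, term):
        # Step 3: "you search the contents."
        term = term.lower()
        return [name for name, words in self.bundles.items() if term in words]

index = MediaIndex()
index.upload("call-001", ["thanks", "for", "calling", "support"])
index.upload("call-002", ["sales", "inquiry"])
print(index.search("support"))  # → ['call-001']
```

In the real service the upload and search steps are HTTP calls against the API; the point here is only the division of labor between you and the service.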

"OK, but I want to know about that processing. What’s inside your black box?"

How it really works...

Clarify is bringing technology from the lab to the real world to give people access to the power of the data hidden in their audio and video. All of the signal processing, language analysis, and complex math sits behind an easy-to-use API, with helper libraries and Quickstart guides to get you started in minutes. You can see it in action on our demo page.

Audio and video files are full of data. The words spoken are important, but it’s the other data -- environment, subject matter, emotions, and identity -- that gives us the context we need to understand the speaker’s intention. We pay attention to all of it.

We take note of media format characteristics like sampling frequency, bit rate, and file duration.

We break down the media and analyze it. We figure out where and how it was recorded, whether or not it contains any speech, and, if it does, what language it’s in.

Using this information, our system designs and configures the best pipeline to extract any words the media might contain, and then organizes those words in meaningful ways.

Because the components in the pipeline are built using machine learning techniques, they improve over time.
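The pipeline-configuration idea above can be made concrete with a small sketch. The stage names and decision rules here are made up for illustration; the real system's components and logic are not public. The point is only that detected media properties drive which stages get assembled.

```python
# Illustrative only: a toy version of "designs and configures the best
# pipeline" based on detected media properties. Stage names are invented.

def build_pipeline(has_speech, language=None, noisy=False):
    # Every file gets its format characteristics probed
    # (sampling frequency, bit rate, duration).
    stages = ["format_probe"]
    if noisy:
        # Recording environment influences preprocessing.
        stages.append("denoise")
    if has_speech:
        # Language detection selects a language-specific recognizer,
        # then the extracted words are organized for search.
        stages.append(f"asr_{language}")
        stages.append("keyword_index")
    return stages

print(build_pipeline(has_speech=True, language="en", noisy=True))
# → ['format_probe', 'denoise', 'asr_en', 'keyword_index']
```

A file with no speech would skip the recognizer entirely, which is the payoff of analyzing the media before committing to a pipeline.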

Once we’ve extracted all the data we can, we give you access to it in two ways:
1. We let you search it.
2. We let you pull reports about it.
This is data you can compute on, data that is actionable.
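A short sketch of those two access paths, using toy in-memory data (the variable and function names are hypothetical, not the service's API): the same extracted words support both keyword search and aggregate reports.

```python
# Hedged sketch: once words are extracted from media, one dataset
# serves both access paths -- search and reporting.

from collections import Counter

# Stand-in for words extracted from two processed media files.
extracted = {
    "ep-01": ["pricing", "pricing", "refund"],
    "ep-02": ["refund", "shipping"],
}

def search(term):
    # Path 1: which files mention this term?
    return sorted(name for name, words in extracted.items() if term in words)

def report_top_terms(n=2):
    # Path 2: an aggregate report across everything we found.
    counts = Counter(w for words in extracted.values() for w in words)
    return counts.most_common(n)

print(search("refund"))    # → ['ep-01', 'ep-02']
print(report_top_terms())  # → [('pricing', 2), ('refund', 2)]
```

Because the output is structured data rather than a flat transcript, both functions are ordinary computations over it, which is what "data you can compute on" means in practice.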