I've searched for years for a reliable way to automatically transcribe natural speech. I'm a journalist, and I often have hours of taped interviews with sources around the globe to transcribe. For now, I'm still paying for people-powered transcription services.
Speech-to-text has been a huge challenge for AI developers, and it's a puzzle that's being closely watched in a variety of industries. The technology has implications far beyond quoting sources: human-machine interfaces in fields like robotics, autonomous vehicles, and personal computing will benefit from computers that can accurately interpret natural speech.
Transcription, then, is a kind of technological entry point, a straightforward market need that can help spur development of a technology that will have broad resonance and incalculable implications for how we interact with machines.
"Like nearly every market segment, the education, legal, and media and entertainment industries have had to quickly move to a remote environment," says Jai Das[1], Managing Director and President at Sapphire Ventures[2]. "As a result, the need for AI-driven, real-time and accurate transcription services has skyrocketed."
The problem is that natural, contextual speech, along with accents and dialects, has made the quest for AI-driven transcription quixotic to date. So what do you do when there's a ripe market for a technology but the capability just isn't there yet?
Well, you improvise and use the tools at your disposal while pouring money into technology development.
That's the strategy of an innovative transcription and captioning solution called Verbit[3], which utilizes an in-house, AI-based technology, along with an army of human overseers, to transform live and recorded video and audio into nearly perfect captions and transcripts for the higher education, legal, media, and enterprise industries.
"Verbit combines the speed and low cost of Automatic Speech Recognition technology with the