Whisper – OpenAI’s latest speech transcription package

Speech transcription is the process of converting speech audio into text. Once transcribed, the text becomes searchable, and there is a variety of Natural Language Processing (NLP) tools that can make sense of it. Traditionally, transcription was done by humans. Early technology was less accurate (<70%), so NLP tools did not work effectively on its output. Machine Learning has made great strides and increased the accuracy to more than 90%. However, this technology has been largely inaccessible to the average person or app developer. Training your own model requires technical knowledge, and cloud solutions from Google, AWS, or Microsoft Azure are relatively expensive for large quantities of audio.

With Whisper, developers can use their own GPU-capable hardware to transcribe large volumes of speech. In theory, this will enable more exciting solutions built on speech transcription technology. Personally, I would like to see some competition in the personal assistant field on wearable technology.

Here is a tutorial on how to set it up.

https://www.assemblyai.com/blog/how-to-run-openais-whisper-speech-recognition-model/
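To give a sense of what a working setup looks like, here is a minimal Python sketch. It assumes the open-source package is installed with `pip install -U openai-whisper` and that `ffmpeg` is available on the system. The `load_model` and `transcribe` calls come from that package; the helper names (`format_timestamp`, `segments_to_lines`, `transcribe_file`) and the `"base"` model choice are my own illustration, not part of Whisper.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_lines(segments):
    """Turn Whisper-style segments into timestamped, searchable text lines.

    Each segment is expected to be a dict with "start", "end" (seconds)
    and "text" keys, as found in the "segments" list of a transcribe() result.
    """
    return [
        f"[{format_timestamp(s['start'])} - {format_timestamp(s['end'])}] "
        f"{s['text'].strip()}"
        for s in segments
    ]


def transcribe_file(path: str, model_name: str = "base"):
    """Transcribe one audio file (hypothetical helper, not a Whisper API).

    The import is done lazily so the formatting helpers above can be used
    without Whisper installed.
    """
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)  # larger models are slower but more accurate
    result = model.transcribe(path)
    return result["text"], segments_to_lines(result["segments"])
```

Calling `transcribe_file("interview.mp3")` (the file name is just a placeholder) should return the full transcript plus a list of timestamped lines that can be fed into a search index or an NLP tool. The package should use a CUDA GPU when one is available and fall back to the CPU otherwise, which works but is much slower.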

And here are the original paper and code, if you are interested.

https://openai.com/blog/whisper/