Whisper – OpenAI’s latest speech transcription package

October 5, 2022 by yangliu3456


Speech transcription is the process of converting speech audio into text. Once transcribed, the text becomes searchable, and there are a variety of Natural Language Processing (NLP) tools that can make sense of it. Traditionally, transcription was done by humans. Early automatic transcription technology was not accurate enough (<70%) for NLP tools to work effectively on its output. Machine learning has made great strides and increased accuracy to more than 90%. However, this technology has been largely inaccessible to the average person or app developer. Training your own model requires technical knowledge, and cloud solutions from Google, AWS, or Microsoft Azure are relatively expensive for large quantities of audio.

With Whisper, developers can use their own GPU-capable hardware to transcribe large amounts of speech. In theory, this will enable more exciting solutions built on speech transcription technology. Personally, I would like to see more competition in the field of personal assistants on wearable technology.
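To give a rough idea of what transcribing many files on your own hardware could look like, here is a minimal sketch in Python. It assumes the open-source `openai-whisper` package and `ffmpeg` are installed, and it uses that package's `whisper.load_model` and `model.transcribe` calls. The helper names (`find_audio_files`, `load_whisper_transcriber`, `transcribe_folder`), the list of file extensions, and the `"base"` model size are my own choices for illustration and are not part of Whisper itself.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Assumption: the audio formats we care about. Whisper decodes audio through
# ffmpeg, so other formats may work as well.
AUDIO_SUFFIXES = {".mp3", ".wav", ".m4a", ".flac"}


def find_audio_files(folder: str) -> List[Path]:
    """Return the audio files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )


def load_whisper_transcriber(model_name: str = "base") -> Callable[[Path], str]:
    """Load a Whisper model once and return a function that transcribes one file."""
    # Imported here so the rest of this sketch works without Whisper installed.
    import whisper

    # load_model picks a CUDA GPU when one is available, otherwise the CPU.
    model = whisper.load_model(model_name)

    def transcribe(path: Path) -> str:
        # transcribe() returns a dict; the full transcript is under "text".
        return model.transcribe(str(path))["text"].strip()

    return transcribe


def transcribe_folder(folder: str, transcribe: Callable[[Path], str]) -> Dict[str, str]:
    """Map each audio file name in `folder` to its transcript."""
    return {p.name: transcribe(p) for p in find_audio_files(folder)}


# Example usage (hypothetical folder name):
#   transcripts = transcribe_folder("recordings", load_whisper_transcriber("base"))
#   for name, text in transcripts.items():
#       print(name, "->", text)
```

The model is loaded once and reused for every file, since loading is the slow part. Larger model sizes are generally more accurate but need more GPU memory and time.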

Here is a tutorial on how to set it up.

https://www.assemblyai.com/blog/how-to-run-openais-whisper-speech-recognition-model/

The original paper and code are linked from OpenAI's announcement, if you are interested.

https://openai.com/blog/whisper/
