How do I convert speech to text?

Question

How can I take an MP3 file and convert the speech in it to text?

I've got some recorded notes from a conference and from meetings (there is a single voice on the recordings, which is mine). I thought it would be easier, and more intellectually interesting, to convert them to text using speech-to-text tools rather than simply transcribing by hand. I know there are technologies out there, especially for VoIP applications using Asterisk and for podcasts, but what are they and how can I use them?

Maybe pass this on to Joel and Jeff so they can get the text for their wiki transcriptions of the SO podcasts. – Sam Meldrum, Jan 29 '09 at 14:32
As a workaround, one could upload the media to YouTube as a video, since YouTube generates closed captions when a video is uploaded. It's not a developer's solution, but it may get one by in a pinch. https://www.youtube.com/watch?v=yxmfJuC2Uno – iamtoc, Mar 19 '15 at 08:19

score 31 · Accepted Answer · answered Jan 29 '09 at 14:02

Open Source: CMU Sphinx

Shareware: http://www.e-speaking.com/ (Windows)

Commercial: Dragon NaturallySpeaking (Windows)
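
For the open source route, a minimal Python sketch might look like the following. It assumes the third-party SpeechRecognition and pocketsphinx packages are installed (neither is named above; they are just one convenient way to reach CMU Sphinx from Python) and that the recording has already been converted to WAV. The function name transcribe_wav is made up for illustration.

```python
from pathlib import Path


def transcribe_wav(wav_path: str) -> str:
    """Transcribe a WAV file offline with CMU Sphinx (hypothetical helper)."""
    path = Path(wav_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such audio file: {path}")

    # Imported lazily so this module still loads where the packages are missing.
    # Assumed install: pip install SpeechRecognition pocketsphinx
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(str(path)) as source:
        audio = recognizer.record(source)  # read the whole file into memory

    try:
        return recognizer.recognize_sphinx(audio)
    except sr.UnknownValueError:
        return ""  # Sphinx could not make out any speech


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(transcribe_wav(sys.argv[1]))
```

Expect rough results on long, unscripted recordings with the default acoustic model; splitting the audio into shorter chunks before transcribing may help.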

Jeff Bauer

I think the three above are good options to give you what you need to get going, but there'll be some coding (no cut 'n' paste hack) to get speech to text working. From my very limited experience of using Sphinx with Asterisk PBX, I'd go for that on the free (beer and speech) vote for a small personal project. – Johnny Maelstrom May 01 '09 at 14:59

score 5 · Answer 2 · answered Mar 23 '12 at 20:24

.NET can do it with its System.Speech namespace.

You would have to convert to .wav first or capture the audio live from the mic.

Details on implementation can be found here: Transcribing Audio with .NET
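
The MP3-to-WAV conversion step can be scripted. Here is a rough sketch in Python, assuming the ffmpeg command-line tool is installed and on PATH (ffmpeg is an assumption here; any converter would do). The function names and the 16 kHz mono default are illustrative choices, not requirements of System.Speech.

```python
import shutil
import subprocess
from typing import List


def build_ffmpeg_command(mp3_path: str, wav_path: str,
                         sample_rate: int = 16000) -> List[str]:
    """Return an ffmpeg argument list that converts MP3 to mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", mp3_path,            # input file
        "-ac", "1",                # downmix to a single channel
        "-ar", str(sample_rate),   # resample to the requested rate
        "-c:a", "pcm_s16le",       # 16-bit little-endian PCM
        wav_path,
    ]


def mp3_to_wav(mp3_path: str, wav_path: str, sample_rate: int = 16000) -> None:
    """Run ffmpeg to do the conversion; raises if ffmpeg is missing or fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_ffmpeg_command(mp3_path, wav_path, sample_rate),
                   check=True)


if __name__ == "__main__":
    # Only prints the command; call mp3_to_wav(...) to actually convert.
    print(" ".join(build_ffmpeg_command("notes.mp3", "notes.wav")))
```

Keeping the command construction separate from the call that runs it makes it easy to inspect or log the exact ffmpeg invocation before converting a batch of recordings.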

bulltorious

score 4 · Answer 3 · answered Jan 29 '09 at 13:56

Dragon NaturallySpeaking seems to support MP3 input.

If you want an open source version (I think there are some Asterisk integration projects based on this one).

diciu

The first link is broken. I imagine it used to go to this: https://www.nuance.com/dragon.html – evaristegd Nov 13 '19 at 03:33

score 3 · Answer 4 · answered Nov 30 '17 at 13:16

Late to the party, so answering more for future reference.

Advances in the field + Mozilla's mindset and agenda led to these two projects towards that end:

The latter has a 12 GB dataset available for download. The former, to my understanding, allows you to train a model with your own audio files.

score 0 · Answer 5 · answered Apr 01 '22 at 17:54

You can also try Leopard. This article has an overview. But your code essentially looks like this:

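import pvleopard  # assumed PyPI package name: pip install pvleopard

# Substitute your own Picovoice AccessKey and audio file path below.
leopard = pvleopard.create(access_key="YOUR_ACCESS_KEY")
print(leopard.process_file("YOUR_AUDIO_FILE_PATH"))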

user2316711
