11

I need to automatically transcribe some short MP3s as part of a proof of concept I am working on. I am currently looking into cloud solutions or web API services to send the MP3 as a simple HTTP request and receive a transcription back.

The only free/open source solution I have found is here, but the demos don't seem to work (at least not on the files I need to transcribe). I have found some enterprise solutions for call centers, but so far nothing I can simply integrate into a project.

Are there any web-based speech recognition services available? One that can filter out minor background noise would be a plus.

itzmebibin
MrGlass

3 Answers

5

Here is an unofficial method to access Google's ASR capability. I tested it yesterday and it still works: you can get JSON-style ASR output, with words and their associated confidence scores, from FLAC audio sampled at 16 kHz.
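Since the input has to be FLAC at 16 kHz and the question starts from MP3s, a conversion step is needed first. Below is a minimal sketch of one way to do it, assuming `ffmpeg` is installed and on the PATH. The helper names `flac_command` and `convert_to_flac` are made up for illustration, and the downmix to mono is my own assumption, not a documented requirement of the service.

```python
import shutil
import subprocess
from pathlib import Path


def flac_command(src, dst, sample_rate=16000):
    """Build the ffmpeg argument list that converts src to FLAC."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),           # input file, e.g. an MP3
        "-ac", "1",               # downmix to one channel (assumption)
        "-ar", str(sample_rate),  # resample, 16 kHz by default
        "-c:a", "flac",           # encode the audio stream as FLAC
        str(dst),
    ]


def convert_to_flac(src, dst=None, sample_rate=16000):
    """Run ffmpeg and return the path of the FLAC file it wrote."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    dst = Path(dst) if dst else Path(src).with_suffix(".flac")
    subprocess.run(flac_command(src, dst, sample_rate), check=True)
    return dst
```

Calling `convert_to_flac("message.mp3")` would write `message.flac` next to the input; the file name here is only an example.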

Leo5188
  • This is a really cool find. Is there any info on a rate limit? – MrGlass Apr 24 '13 at 14:08
  • Please convert your audio files to 16 kHz FLAC. Since this is not an official solution from Google, there are many unknowns :) – Leo5188 May 16 '13 at 16:46
  • 3
    Verified: this method no longer works. However, Google has published a V2 version of it, which requires an API key and has a very low quota. An implementation can be found here: https://github.com/gillesdemey/google-speech-v2 – Jerry Tian Mar 06 '15 at 03:25
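For readers who want to try the key-based V2 variant mentioned in the last comment, here is a rough Python sketch of what a client might look like. Everything specific in it is an assumption drawn from how third-party clients have called the service, not from official documentation: the endpoint URL, the `output`, `lang` and `key` query parameters, the `audio/x-flac; rate=...` content type, and the newline-delimited JSON reply with `result`, `alternative`, `transcript` and `confidence` fields. The service is unofficial and may not respond at all.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint, as used by unofficial V2 clients; it may be gone.
V2_URL = "https://www.google.com/speech-api/v2/recognize"


def build_request(flac_bytes, api_key, lang="en-us", rate=16000):
    """Prepare (but do not send) a POST request carrying FLAC audio."""
    query = urllib.parse.urlencode(
        {"output": "json", "lang": lang, "key": api_key}
    )
    return urllib.request.Request(
        f"{V2_URL}?{query}",
        data=flac_bytes,
        headers={"Content-Type": f"audio/x-flac; rate={rate}"},
        method="POST",
    )


def parse_response(body):
    """Collect (transcript, confidence) pairs from the reply text.

    Assumes one JSON object per line; lines with an empty "result"
    list contribute nothing.
    """
    hypotheses = []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        for result in json.loads(line).get("result", []):
            for alt in result.get("alternative", []):
                hypotheses.append(
                    (alt.get("transcript", ""), alt.get("confidence"))
                )
    return hypotheses
```

To actually send the audio, one would pass the prepared request to `urllib.request.urlopen`, decode the response body as UTF-8, and hand the text to `parse_response`. Given the low quota mentioned above, this is probably only suitable for a handful of test files.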
1

This may be a good match. Also, their TechCrunch profile (see this) lists these competitors: SimulScribe, SpinVox, Vlingo, Nuance, Microsoft, and Google. Some of these links may be helpful.

Vlingo, Bing and Google have recognizers in the cloud, but I don't think they make them publicly programmable. I believe they are accessible only from their authorized clients.

For a proof of concept (and low volume), have you considered just using the desktop speech engines that come with Windows 7? The question "What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?" may be helpful. The MS desktop recognizers ship with a dictation grammar, and it sounds like that is what you will need.

Community
Michael Levy
  • Yapme, and a couple of other services I found after I posted, cater to large clients. I have emailed them (which is the only way to get any API information, pricing, or access) but haven't heard back. The competitors listed provide call center solutions, like I mentioned in my post. I haven't looked into the Microsoft speech engines, because my project hinges on me being able to script this, and I work in PHP/Python on a Linux server. I might do some basic tests using them, but I would need a different solution. – MrGlass Nov 10 '10 at 21:10
  • 2
    Actually, they are discontinuing their voicemail transcription service, but it isn't clear what is happening with their cloud recognition APIs. They appear to have been purchased by Amazon, so folks are speculating that Amazon may add their reco services to Amazon's cloud services - http://www.theatlantic.com/technology/archive/2011/11/i-see-your-siri-and-raise-you-a-yap-amazon-quietly-snaps-up-speech-recognition-startup/248165/ (you do realize that the post of mine you say "-1" about is over a year old...) – Michael Levy Dec 19 '11 at 23:16
  • @MichaelLevy Any update on what happened to YapMe's speech API? Should I edit your answer? – David J. Liszewski Jan 23 '14 at 20:12
  • I have no idea; weren't they acquired by Amazon? Please update. – Michael Levy Jan 24 '14 at 01:51
1

You can also try the speech recognition engine in Windows 7 to produce subtitles. Here is the tool for that.

itzmebibin
VahidN