
I have a buffer of audio and I'd like to perform speech recognition/transcription on it. I have limited CPU and RAM locally so I want to perform recognition on a server.

Are there any (web) services that allow me to do this?

My searches so far have led nowhere...

Dave Peck

2 Answers


Google has just introduced browser-based access to its speech engine through HTML5.

http://slides.html5rocks.com/#speech-input

To get this page to work, I launched the Chromium browser as follows in Ubuntu:

$ chromium-browser --enable-speech-input

I believe that the idea is to be able to build applications that use Google's speech recognizer, but I haven't had a chance to look deeply into it.

Another interesting project is WAMI from MIT: http://wami.csail.mit.edu

wwwilliam
  • And... since Chromium is OSS, I just spent some time and discovered that yes, indeed, there is a RESTful service endpoint that it talks to. It shouldn't be too hard to build a separate library for invoking recognition... – Dave Peck Feb 13 '11 at 04:18
  • I didn't work on it, although it should be fairly straightforward to implement an API in Python/Ruby/etc that does what Chromium does... assuming you can find a Speex codec API for your language of choice. – Dave Peck Jun 06 '12 at 04:09
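
The approach described in these comments (POST an encoded audio buffer to a RESTful recognition endpoint) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the endpoint URL is a placeholder, the `Content-Type` value and the JSON response format are guesses rather than the documented behavior of any real service, and the audio is assumed to be already encoded (for example with a Speex encoder). The names `RECOGNIZE_URL`, `build_recognition_request`, and `recognize` are hypothetical.

```python
import json
import urllib.request

# Hypothetical endpoint: a placeholder, not the URL Chromium actually talks to.
RECOGNIZE_URL = "https://speech.example.com/recognize"


def build_recognition_request(audio: bytes, sample_rate: int = 16000,
                              url: str = RECOGNIZE_URL) -> urllib.request.Request:
    """Wrap an already-encoded audio buffer in an HTTP POST request."""
    if not audio:
        raise ValueError("audio buffer is empty")
    headers = {
        # Assumed MIME type for Speex audio; a real service may expect another.
        "Content-Type": f"audio/x-speex-with-header-byte; rate={sample_rate}",
    }
    return urllib.request.Request(url, data=audio, headers=headers, method="POST")


def recognize(audio: bytes, sample_rate: int = 16000, timeout: float = 10.0) -> dict:
    """Send the buffer and decode the reply, assuming the service returns JSON."""
    request = build_recognition_request(audio, sample_rate)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it possible to inspect the headers and body without network access; `recognize(encoded_audio)` would perform the actual upload once a real endpoint is substituted.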

LumenVox offers such a service, but it seems expensive for your needs.

clyfe
  • This is a good find, though their programmer documentation is nonexistent. Looks like it is "buy first, understand later." I also found Spinvox Create, for which the docs are available, but it is a terrible bunch of web API cruft: it requires custom headers, digest authentication, and multipart posts containing XML and Base64-encoded audio in a format that's not outrageous but that I can't easily produce on my device... – Dave Peck Apr 22 '10 at 18:56