15

I would like to see if it's possible to have direct access to Opus using getUserMedia or anything similar from the latest browsers.

I've researched this a lot, but with no good results.

I'm aware that either Opus or Speex is actually used by the webkitSpeechRecognition API. I would like to do speech recognition, but using my own server rather than Google's.

Omar Al-Ithawi

4 Answers

20

There were a lot of suggestions to use Emscripten, but nobody had actually done it, so I ported the encoder from opus-tools to JavaScript using Emscripten. Depending on what you have in mind, there are now the following options:

Rainer Rillke
  • Wow! This is amazing. Marking this as the accepted answer, although I haven't tested it! – Omar Al-Ithawi Dec 24 '14 at 09:50
  • @OmarIthawi Thank you. Check out [this demo](https://blog.rillke.com/opusenc.js/) and [report bugs](https://github.com/Rillke/opusenc.js/issues) or tell me how to make it more awesome. – Rainer Rillke Dec 31 '14 at 05:20
5

We're using Emscripten to encode and decode gsm610 audio captured with getUserMedia, and it works incredibly well, even on mobile devices. These days JavaScript gives almost native performance, so Emscripten is a viable way to compile codecs. The only issue is that the resulting .js files can be very large, so you want to compile only the parts you are actually using.

CpnCrunch
3

Unfortunately, it isn't currently possible to access browser codecs directly from JavaScript for encoding. The only way to do it would be to utilize WebRTC and set up recording on the server. I've tried this by compiling libjingle with some other code out of Chromium to get it to run on a Node.js server... it's almost impossible.

The only thing you can do currently is send raw PCM data to your server. This takes up quite a bit of bandwidth, but you can reduce it by converting the float32 samples down to 16-bit (or 8-bit, if your speech recognition can handle it).
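As a rough illustration of that conversion, here is a minimal sketch. It assumes the samples arrive as a `Float32Array` in the nominal range [-1, 1], which is what the Web Audio API hands you; the function name `floatTo16BitPCM` is made up for this example and is not part of any browser API.

```typescript
// Hypothetical helper: convert float32 samples (nominally in [-1, 1])
// to 16-bit signed PCM, halving the number of bytes per sample.
function floatTo16BitPCM(input: Float32Array): Int16Array {
  const output = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    // Clamp first, because samples can overshoot the nominal range.
    const s = Math.max(-1, Math.min(1, input[i]));
    // Scale asymmetrically so that -1 maps to -32768 and +1 maps to 32767.
    output[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return output;
}
```

The `buffer` of the returned `Int16Array` is a plain `ArrayBuffer`, so it can be sent to the server as binary data. Each sample takes 2 bytes instead of 4.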

Hopefully the Media Recorder API will show up soon so we can use browser codecs.

Brad
  • Thanks a lot. I think I've reached the edge of HTML5. Sadly, I will go back to a Flash-based solution using `rtmp`. – Omar Al-Ithawi Dec 15 '13 at 12:41
  • The sad thing is that Google already has this in two components, `x-webkit-speech` and `webkitSpeechRecognition`. I wish they would just allow changing the server. This would really solve my problem. – Omar Al-Ithawi Dec 15 '13 at 12:43
  • @OmarIthawi I actually disagree that the speech recognition API is where this should be done. I can imagine a case where speech recognition could be done by the browser itself, without sending it off to some server somewhere. The Media Recorder API is where your immediate need should be met. Otherwise, it would be helpful if you could override the speech recognition via browser plugin. – Brad Dec 15 '13 at 18:59
  • I need to do speech recognition actually :), this is the purpose of this question. Of course Media Recorder would fit more in generic recording and codec issues. – Omar Al-Ithawi Dec 16 '13 at 07:45
  • @OmarIthawi Yes, I understand you are looking for speech recognition. What I'm saying is that I don't agree that it should be possible to override a specific part of functionality in the speech recognition API. Browsers shouldn't always have to connect to a server to provide speech recognition... they could have the ability to do speech recognition with local software. What *could* be done is overriding speech recognition altogether, providing whatever speech recognition you want to do, which solves your problem. – Brad Dec 16 '13 at 14:22
3

This is not a complete solution; @Brad's answer is actually the correct one at this time.

One way to do it is to compile Opus to JavaScript with Emscripten and hope that your PC can handle the encoding in JavaScript. Another alternative is to use speex.js.

Omar Al-Ithawi