I'm trying to feed an audio stream (from a source file, another stream, etc.) into the Microsoft speech recognition engine.

So far I've got:

ffmpeg.exe -rtsp_transport tcp -i rtsp://%_return1%/audio -acodec pcm_u16le -f rtp rtp://localhost:2222

Then I have inside my code:

SpeechRecognitionEngine _engine = new SpeechRecognitionEngine(CultureInfo.CurrentCulture);    
this._engine.SetInputToAudioStream(this._rtpClient.AudioStream, new SpeechAudioFormatInfo(16000, AudioBitsPerSample.Sixteen, AudioChannel.Mono));

Then I have the events registered:

this._engine.SpeechRecognized += this.SpeechRegocnized;

this._engine.SpeechDetected += this.EngineOnSpeechDetected;

I'm not sure about the codec settings. I've tried other codecs, but none of them work.

MrH40XX
  • It should be s16le, not u16le, you need signed PCM. – Nikolay Shmyrev Apr 19 '16 at 08:47
  • Thanks, when I'm home I'll try! – MrH40XX Apr 21 '16 at 07:36
  • Stream #0:0 -> #0:0 (pcm_mulaw (native) -> pcm_s16le (native)). It's not working. Nothing happens: no detection event, nothing. When I connect the SpeechRecognitionEngine to my laptop mic, it does work. When I play the stream with VLC (RTSP), I do hear the audio stream. – MrH40XX Apr 21 '16 at 19:42
  • Well, are you using the code from this answer? http://stackoverflow.com/a/15934124/432021 In that code, do you start the client? – Nikolay Shmyrev Apr 21 '16 at 20:50
  • Hi, yes, I'm using that code! The only difference is that my source is an RTSP audio stream. He says he has that too, but the commands he supplies to ffmpeg suggest otherwise. I assume that in recognizer.SetInputToAudioStream(rtpClient.AudioStream, new SpeechAudioFormatInfo(WAVFile.SAMPLE_RATE, AudioBitsPerSample.Sixteen, AudioChannel.Mono)); WAVFile.SAMPLE_RATE would be 8000 in my case. Furthermore: ffmpeg -rtsp_transport tcp -i rtsp://%_return1%/audio -ac 1 -ar 16000 -acodec pcm_s16le -f rtp rtp://127.0.0.1:2222 – MrH40XX Apr 22 '16 at 15:26
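
To illustrate the signed/unsigned point raised in the comments: pcm_u16le and pcm_s16le carry the same samples, offset by 32768, so silence is 0x8000 in the unsigned form and 0x0000 in the signed form. The following is only a minimal sketch of that relationship, not code from the question or from the linked answer; PcmConvert and UnsignedToSigned16Le are hypothetical names made up for this example and are not part of System.Speech or ffmpeg.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class PcmConvert
{
    // Converts unsigned 16-bit little-endian PCM bytes to signed 16-bit
    // little-endian PCM bytes. Subtracting 32768 from each sample is the
    // same as flipping the top bit of its high byte.
    public static byte[] UnsignedToSigned16Le(byte[] input)
    {
        if (input == null)
            throw new ArgumentNullException(nameof(input));
        if (input.Length % 2 != 0)
            throw new ArgumentException("Expected whole 16-bit samples.", nameof(input));

        var output = new byte[input.Length];
        for (int i = 0; i < input.Length; i += 2)
        {
            output[i] = input[i];                         // low byte is unchanged
            output[i + 1] = (byte)(input[i + 1] ^ 0x80);  // flip the sign bit in the high byte
        }
        return output;
    }
}

public static class Program
{
    public static void Main()
    {
        // One sample of unsigned "silence": 0x8000, stored little-endian.
        byte[] unsignedSilence = { 0x00, 0x80 };
        byte[] signed = PcmConvert.UnsignedToSigned16Le(unsignedSilence);

        // Reassemble the little-endian sample by hand, independent of host byte order.
        short sample = (short)(signed[0] | (signed[1] << 8));
        Console.WriteLine(sample); // prints 0
    }
}
```

In practice it is simpler to let ffmpeg emit pcm_s16le directly, as in the last command above. The sketch only shows why a stream encoded as u16le looks like heavily offset audio to a consumer that expects signed samples.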