
I want to use the Microsoft.Speech namespace in VB.NET to create a telephony application. I need to be able to set the recognizer input to any audio device installed on the system. Microsoft has the recognizer.SetInputToDefaultAudioDevice() method, but I need something like .SetInputToAudioDeviceID. How can I choose another wave audio input from the list of devices installed on my system? In SAPI, I would use MMSystem and SpVoice:

Set MMSysAudioIn1 = New SpMMAudioIn   
MMSysAudioIn1.DeviceId = WindowsAudioDeviceID  'set audio input to audio device Id
MMSysAudioIn1.Format.Type = SAFT11kHz8BitMono  'set wave format, change to 8kHz, 16bit mono for other devices
Dim fmt As New SpeechAudioFormatInfo(1000, AudioBitsPerSample.Eight, AudioChannel.Mono)
Recognizer.SetInputToAudioStream(MMSysAudioIN1, fmt)

How do I do this with Microsoft.Speech?

MORE INFO: I want to take any wave input device in the Windows list of wave drivers and use that as input to speech recognition. Specifically, I may have a Dialogic card with wave input reported by TAPI as device IDs 1-4. In SAPI, I can use the SpMMAudioIn class to create a stream and set which device ID is associated with that stream. You can see some of that code above. Can I directly set Recognizer1.SetInputToAudioStream by the device ID of the device, like I can in SAPI? Or do I have to write code that reads bytes and manages buffers myself? Do I have to create a MemoryStream object? I can't find any example code anywhere. What do I have to reference in .NET to get access to ISpeechMMSysAudio/SpMMAudioIn, in case something like this would work? Ideally, there is a way to use MemoryStream or something like it that takes a device ID and lets me pass that stream to the recognizer.

NOTE 2: I added "imports Speechlib" to the VB project and then tried to run the following code. It gives the error listed in the comments below about not being able to set the audio stream to a COM object.

Dim sre As New SpeechRecognitionEngine
Dim fmt As New SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono)
Dim audiosource As ISpeechMMSysAudio
audiosource = New SpMMAudioIn
audiosource.DeviceId = WindowsAudioDeviceID  'set audio input to audio device Id
' audiosource.Format.Type = SpeechAudioFormatType.SAFT11kHz16BitMono 
sre.SetInputToAudioStream(audiosource, fmt)  ' <----- Invalid cast with COM here

It also appears the SpeechAudioFormatType does not support 8kHz formats. This just gets more and more complicated.

FDecker

1 Answer


You would use SpeechRecognitionEngine.SetInputToAudioStream. Note that if you're having problems with streaming input, you may need to wrap the stream, as illustrated here.
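To make the "wrap the stream" idea concrete, here is a minimal sketch of one possible wrapper. SetInputToAudioStream expects a System.IO.Stream, so the sketch defines a small Stream subclass that a capture callback writes into and the recognizer reads from. The class name BlockingAudioStream and its Complete method are hypothetical names introduced for illustration. Returning placeholder values from Length, Position, and Seek instead of throwing is an assumption, made in case the consumer probes those members.

```vbnet
Imports System.Collections.Concurrent
Imports System.IO
Imports System.Threading

' Hypothetical helper (not part of Microsoft.Speech): a one-way pipe that a
' capture callback writes into and the recognizer reads from.
Public Class BlockingAudioStream
    Inherits Stream

    Private ReadOnly _chunks As New BlockingCollection(Of Byte())()
    Private _current As Byte() = Nothing   ' chunk currently being drained
    Private _offset As Integer = 0         ' read position inside _current

    Public Overrides ReadOnly Property CanRead As Boolean
        Get
            Return True
        End Get
    End Property

    Public Overrides ReadOnly Property CanSeek As Boolean
        Get
            Return False
        End Get
    End Property

    Public Overrides ReadOnly Property CanWrite As Boolean
        Get
            Return True
        End Get
    End Property

    ' Assumption: placeholder values are safer than exceptions for a live stream.
    Public Overrides ReadOnly Property Length As Long
        Get
            Return -1L
        End Get
    End Property

    Public Overrides Property Position As Long
        Get
            Return 0L
        End Get
        Set(value As Long)
        End Set
    End Property

    Public Overrides Sub Flush()
    End Sub

    Public Overrides Function Seek(offset As Long, origin As SeekOrigin) As Long
        Return 0L
    End Function

    Public Overrides Sub SetLength(value As Long)
    End Sub

    ' Called by the capture side: copy the bytes and queue them.
    Public Overrides Sub Write(buffer As Byte(), offset As Integer, count As Integer)
        If count <= 0 Then Return
        Dim chunk(count - 1) As Byte
        Array.Copy(buffer, offset, chunk, 0, count)
        _chunks.Add(chunk)
    End Sub

    ' Called by the recognizer: blocks until audio is available.
    ' Returns 0 (end of stream) only after Complete() and once the queue is empty.
    Public Overrides Function Read(buffer As Byte(), offset As Integer, count As Integer) As Integer
        If _current Is Nothing Then
            If Not _chunks.TryTake(_current, Timeout.Infinite) Then Return 0
            _offset = 0
        End If
        Dim n As Integer = Math.Min(count, _current.Length - _offset)
        Array.Copy(_current, _offset, buffer, offset, n)
        _offset += n
        If _offset >= _current.Length Then _current = Nothing
        Return n
    End Function

    ' Signal that no more audio will arrive (for example, when the call ends).
    Public Sub Complete()
        _chunks.CompleteAdding()
    End Sub
End Class
```

A possible way to feed it from a specific wave input device is sketched below. It assumes the third-party NAudio library is referenced for capture (any code that calls waveInOpen and delivers PCM buffers could take its place), that device ID 1 is the wave/in ID obtained from TAPI, and that the device delivers 8 kHz, 16-bit mono PCM. The two-word grammar is only a placeholder.

```vbnet
Imports Microsoft.Speech.AudioFormat
Imports Microsoft.Speech.Recognition
Imports NAudio.Wave   ' assumption: NAudio is referenced for waveIn capture

Module Demo
    Sub Main()
        Dim deviceId As Integer = 1   ' hypothetical wave/in device ID from TAPI
        Dim pipe As New BlockingAudioStream()

        Dim capture As New WaveInEvent()
        capture.DeviceNumber = deviceId
        capture.WaveFormat = New WaveFormat(8000, 16, 1)   ' 8 kHz, 16-bit, mono
        AddHandler capture.DataAvailable,
            Sub(sender As Object, e As WaveInEventArgs) pipe.Write(e.Buffer, 0, e.BytesRecorded)

        Dim sre As New SpeechRecognitionEngine()
        sre.LoadGrammar(New Grammar(New GrammarBuilder(New Choices("yes", "no"))))
        AddHandler sre.SpeechRecognized,
            Sub(sender As Object, e As SpeechRecognizedEventArgs) Console.WriteLine(e.Result.Text)

        Dim fmt As New SpeechAudioFormatInfo(8000, AudioBitsPerSample.Sixteen, AudioChannel.Mono)
        sre.SetInputToAudioStream(pipe, fmt)

        capture.StartRecording()
        sre.RecognizeAsync(RecognizeMode.Multiple)
        Console.ReadLine()            ' keep the process alive while recognizing

        capture.StopRecording()
        pipe.Complete()
    End Sub
End Module
```

The blocking Read is the important design choice: a stream that returns 0 whenever no audio is buffered yet may be treated as ended, whereas blocking keeps the input open for the length of the call.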

Eric Brown
  • Sorry Eric. I am speaking to you in two threads. If I can set it to the default audio device with one line of code, why can't I set it to the input of ANY audio device? I am still unsure how to do this. I can't find any VB.NET code (I may be able to convert C# if I could find that) that shows how to take the device ID of the wave input of a modem, Dialogic card, PBX line, etc. and have speech recognition use that input. There are no examples of how to use .SetInputToAudioStream that I can use. – FDecker Apr 07 '17 at 12:33
  • @Fred In System.Speech, you can set to the default audio device (and that's often what you want), but in Microsoft.Speech, you rarely want to do that; typically you want to connect to some arbitrary device (modem, dialogic card, network), and the only real common denominator for these arbitrary devices is a byte stream, which is best described by System.IO.Stream. – Eric Brown Apr 07 '17 at 16:09
  • Exactly! In SAPI, I can directly set the speech recognition input to the device ID of the wave device with SpMMAudioIn. But I still can't figure out how to do it with a device like a Dialogic card and Microsoft.Speech. TAPI will report the device ID for the line I want (lineGetID gives the wave/in/out class and fills DeviceInID/DeviceOutID). A control I have uses MMSYSTEM's waveInOpen() and sets uDeviceID to the DeviceInID I get from TAPI for recording. So when I am on a call, I want to send the audio input from the call to the recognizer. How? I am reading up on System.IO.Stream, but don't get it. – FDecker Apr 12 '17 at 20:14
  • I changed the question a bit above, including different code and more detail to my question. Thanks! – FDecker Apr 12 '17 at 20:38
  • I did some research and if you can create an SpMMAudioIn object, you can pass that object to SetInputToAudioStream. – Eric Brown Apr 12 '17 at 20:42
  • I changed the code as listed above and now get: Unable to cast COM object of type 'System.__ComObject' to class type 'System.IO.Stream'. Instances of types that represent COM components cannot be cast to types that do not represent COM components; however they can be cast to interfaces as long as the underlying COM component supports QueryInterface calls for the IID of the interface. – FDecker Apr 13 '17 at 18:30
  • Is there any more help you can provide? Anyone? If speech is designed for telephony, then why aren't there examples connecting to a multi-line telephony device like a voice modem, Dialogic card, SIP, etc.? What is wrong with my stream? – FDecker Jun 22 '17 at 16:44
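
Since the thread keeps coming back to wave input device IDs, here is a small sketch of listing the IDs that waveInOpen() accepts, using P/Invoke against winmm.dll. The module name WaveInDevices is a hypothetical name for illustration. It assumes that the wave/in ID returned by TAPI's lineGetID refers to the same numbering, which is how the waveInOpen-based control described in the comments uses it.

```vbnet
Imports System.Runtime.InteropServices

Module WaveInDevices
    ' Managed layout of the Win32 WAVEINCAPS structure (MAXPNAMELEN = 32).
    <StructLayout(LayoutKind.Sequential, CharSet:=CharSet.Auto)>
    Private Structure WAVEINCAPS
        Public wMid As UShort
        Public wPid As UShort
        Public vDriverVersion As UInteger
        <MarshalAs(UnmanagedType.ByValTStr, SizeConst:=32)>
        Public szPname As String
        Public dwFormats As UInteger
        Public wChannels As UShort
        Public wReserved1 As UShort
    End Structure

    <DllImport("winmm.dll")>
    Private Function waveInGetNumDevs() As UInteger
    End Function

    <DllImport("winmm.dll", CharSet:=CharSet.Auto)>
    Private Function waveInGetDevCaps(uDeviceID As UIntPtr, ByRef caps As WAVEINCAPS, cbSize As UInteger) As UInteger
    End Function

    Sub Main()
        Dim count As Integer = CInt(waveInGetNumDevs())
        Dim size As UInteger = CUInt(Marshal.SizeOf(GetType(WAVEINCAPS)))

        For id As Integer = 0 To count - 1
            Dim caps As New WAVEINCAPS()
            ' A return value of 0 is MMSYSERR_NOERROR.
            If waveInGetDevCaps(New UIntPtr(CUInt(id)), caps, size) = 0UI Then
                Console.WriteLine("{0}: {1} ({2} channel(s))", id, caps.szPname, caps.wChannels)
            End If
        Next
    End Sub
End Module
```

Comparing this list with the ID that TAPI reports for a line is a quick way to check which device the recognizer input should be captured from.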