Transcribe System Audio using SpeechRecognition

Question

I'm trying to transcribe system audio (a YouTube stream, a Twitch stream, Spotify, etc.) with SpeechRecognition.

The code is simple and works fine with my external microphone, but I can't get it to capture the system audio instead. I picked the device index by matching the device against the one Audacity uses, but when I run the code with that device I get the error "OSError: [Errno -9998] Invalid number of channels".

My regular mic works: 2 Microphone Array (Realtek Audio, MME (2 in, 0 out)

What I thought was system audio does not: 13 Speakers (DisplayLink Audio), Windows WASAPI (0 in, 2 out)

I'm guessing I'm just on the wrong track. If anybody could nudge me in the right direction, I'd appreciate it.

import speech_recognition as sr

r = sr.Recognizer()
# device_index selects which PyAudio device to record from
with sr.Microphone(device_index=25) as source:
    print("Speak:")
    audio = r.listen(source)

try:
    # Send the captured audio to the Google Web Speech API
    print('"' + r.recognize_google(audio) + '"')
except sr.UnknownValueError:
    print("Could not understand audio")
except sr.RequestError as e:
    print("Could not request results; {0}".format(e))

Entire Device Index:

   0 Microsoft Sound Mapper - Input, MME (2 in, 0 out)
>  1 Stereo Mix (Realtek Audio), MME (2 in, 0 out)
   2 Microphone Array (Realtek Audio, MME (2 in, 0 out) ---WORKS---
   3 Microsoft Sound Mapper - Output, MME (0 in, 2 out)
<  4 Speakers (DisplayLink Audio), MME (0 in, 2 out)
   5 Speakers / Headphones (Realtek , MME (0 in, 2 out)
   6 Primary Sound Capture Driver, Windows DirectSound (2 in, 0 out)
   7 Stereo Mix (Realtek Audio), Windows DirectSound (2 in, 0 out)
   8 Microphone Array (Realtek Audio), Windows DirectSound (2 in, 0 out)
   9 Primary Sound Driver, Windows DirectSound (0 in, 2 out)
  10 Speakers (DisplayLink Audio), Windows DirectSound (0 in, 2 out)
  11 Speakers / Headphones (Realtek Audio), Windows DirectSound (0 in, 2 out)
  12 Realtek ASIO, ASIO (2 in, 2 out)
  13 Speakers (DisplayLink Audio), Windows WASAPI (0 in, 2 out)
  14 Speakers / Headphones (Realtek Audio), Windows WASAPI (0 in, 2 out)
  15 Stereo Mix (Realtek Audio), Windows WASAPI (2 in, 0 out) ---COULD NOT UNDERSTAND AUDIO---
  16 Microphone Array (Realtek Audio), Windows WASAPI (2 in, 0 out) ---WORKS---
  17 Headphones (), Windows WDM-KS (0 in, 2 out)
  18 Microphone Array (Realtek HD Audio Mic Array input), Windows WDM-KS (2 in, 0 out)
  19 Stereo Mix (Realtek HD Audio Stereo input), Windows WDM-KS (2 in, 0 out)
  20 Speakers (Realtek HD Audio output with SST), Windows WDM-KS (0 in, 2 out)
  21 Jack Mic (Realtek HD Audio Front Mic input), Windows WDM-KS (2 in, 0 out)
  22 Speakers (DisplayLink Audio), Windows WDM-KS (0 in, 6 out)
  23 Microphone (DisplayLink Audio), Windows WDM-KS (2 in, 0 out)
  24 Headset (@System32\drivers\bthhfenum.sys,#2;%1 Hands-Free AG Audio%0
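One thing the list above suggests: the entries marked "0 in" (such as #13) expose no input channels, and PortAudio's error -9998 is its "invalid channel count" code, so opening one of them as a microphone would be expected to fail that way. Below is a minimal sketch, not a definitive fix, that filters a device list down to the entries that can be recorded from. The helper names `input_capable` and `list_pyaudio_devices` are made up for illustration; the dictionary keys are the ones PyAudio's `get_device_info_by_index` returns.

```python
from typing import Dict, Iterable, List


def input_capable(devices: Iterable[Dict]) -> List[Dict]:
    """Keep only devices that expose at least one input channel."""
    return [d for d in devices if d.get("maxInputChannels", 0) > 0]


def list_pyaudio_devices() -> List[Dict]:
    """Query all devices through PyAudio (hypothetical helper).

    PyAudio is imported lazily so input_capable() can be used without it.
    """
    import pyaudio

    pa = pyaudio.PyAudio()
    try:
        return [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()


# Three entries copied by hand from the list above, for illustration only.
sample = [
    {"index": 13, "name": "Speakers (DisplayLink Audio)",
     "maxInputChannels": 0, "maxOutputChannels": 2},
    {"index": 15, "name": "Stereo Mix (Realtek Audio)",
     "maxInputChannels": 2, "maxOutputChannels": 0},
    {"index": 16, "name": "Microphone Array (Realtek Audio)",
     "maxInputChannels": 2, "maxOutputChannels": 0},
]

usable = [d["index"] for d in input_capable(sample)]
print(usable)
```

On a real machine, `input_capable(list_pyaudio_devices())` would give the candidates for `device_index`. Whether any of them actually carries the playback audio still depends on the driver (for example, whether Stereo Mix is enabled).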

25 is probably not the right device index (it is usually 1-2-3-4). You'd better list the pyaudio devices first as in https://stackoverflow.com/questions/36894315/how-to-select-a-specific-input-device-with-pyaudio — Nikolay Shmyrev, Jul 27 '20 at 18:57
Main question https://stackoverflow.com/questions/50952667/python-speech-recognition-error-invalid-number-of-channels — Nikolay Shmyrev, Jul 27 '20 at 19:25
Yes, I have checked all my available devices. #2 is for my headset mic and #16 is for my integrated laptop mic. I thought #13 would be the correct device since that is what Audacity also uses to pick up playback audio. I edited the post to show the entire index. — Phil, Jul 27 '20 at 20:46
Stereo Mix (#1, 7, 15, 19) should work if you enabled it in drivers. — Nikolay Shmyrev, Jul 27 '20 at 23:05
None of those work unfortunately, only my integrated laptop mic and headset will pick up audio. — Phil, Jul 29 '20 at 14:15
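To narrow down why Stereo Mix gives "Could not understand audio", it may help to check whether the captured buffer is silent before blaming the recognizer. This is a rough diagnostic sketch under assumptions: `rms_int16` is a hypothetical helper, and it assumes the buffer is signed 16-bit little-endian PCM, which is what `AudioData.get_raw_data(convert_width=2)` would be expected to return on a typical Windows machine.

```python
import math
import struct


def rms_int16(raw: bytes) -> float:
    """Root-mean-square level of signed 16-bit little-endian PCM samples."""
    count = len(raw) // 2  # ignore a trailing odd byte, if any
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, raw[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


# Synthetic buffers standing in for a recording.
silent = b"\x00\x00" * 100
loud = struct.pack("<4h", 1000, -1000, 1000, -1000)

print(rms_int16(silent))
print(rms_int16(loud))

# Assumed usage with the question's code:
#     level = rms_int16(audio.get_raw_data(convert_width=2))
```

A level at or near zero while something is playing would point to Stereo Mix being disabled or not routed from the active output device, rather than to a recognition problem.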
