
I have a Python speech recognition assistant that plays MP3 files it downloads. I have put the MP3 playback on a separate background thread.

The issue is that the speech recognition tries to detect what the MP3 audio is saying, and it responds to it.

How can I make the speech recognition stay silent until I say a specific phrase to wake it up?

Here is my function for retrieving and playing the MP3:

import json
import os
from urllib import request

import requests
import speech_recognition as sr

# speak() and recordAudio() are my own helpers, defined elsewhere


def play_quran():
    speak("Ready to play Quran. Tell me which Surah number you want to hear.")
    #qari_num = input("Enter Surah Number: ")
    qari_num = str(recordAudio())
    url = "https://api.quran.com/api/v4/chapter_recitations/9/" + qari_num
    print(url)
    my_dictionary = requests.get(url).json()  # one request is enough
    print(json.dumps(my_dictionary, indent=4))
    surah_to_play = my_dictionary['audio_file']['audio_url']
    print(surah_to_play)
    request.urlretrieve(surah_to_play, qari_num + ".mp3")  # download the MP3
    os.system("mpg123 -q " + qari_num + ".mp3")  # blocks until playback ends
    # listen_in_background calls its callback as callback(recognizer, audio)
    stop_listening = sr.Recognizer().listen_in_background(sr.Microphone(), recordAudio)
#    time.sleep(2)
#    exit()

Here is the code that calls the function above:

import threading

if "play Quran" in data:
    speak("opening Quran. One moment please")
    # pass the function itself as the target; don't call it here
    t = threading.Thread(target=play_quran)
    t.daemon = True  # don't keep the program alive just for this thread
    t.start()  # starts play_quran in the background

Thanks.

ironmantis7x
  • I suspect that your audio source is close to your microphone, i.e. the coupling happens acoustically (the mic hears the speaker), not through a shared audio stream inside the program. Is that correct? If so, how about the following mitigations: – Bilal Qandeel Sep 03 '22 at 20:32
    YOUR SOLUTION: keep the speech recognizer running, but have it ignore whatever it catches until it hears your activation phrase. SOME OTHER SOLUTION: lower the sensitivity of the mic so that it only picks up sound louder than the playing audio's average level. You can also ramp the playback volume down to, say, 25% within a second so the speech recognizer can do its job, then restore it. FANCY SOLUTION: use an echo-cancellation technique to subtract the `MP3` audio from the mic's input audio (there is usually a delay between the two). – Bilal Qandeel Sep 03 '22 at 20:38
  • Tell me which one you choose so we can brainstorm a solution. Taqaballahu (may Allah accept from you), brother ;-) – Bilal Qandeel Sep 03 '22 at 20:38
    FANCY solution seems the only robust option here – SystemSigma_ Sep 07 '22 at 09:41
    @BilalQandeel Jazak Allahu Kheiran (may Allah reward you with good). Let's try SOME OTHER SOLUTION first, with YOUR SOLUTION as a backup. – ironmantis7x Sep 08 '22 at 05:38
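A minimal sketch of the two options chosen in the comments, assuming the recognizer hands you plain text and that you can get at raw 16-bit mic samples recorded while the MP3 plays. `WakeWordGate`, `threshold_above_playback` and `duck_volume` are hypothetical helpers, not part of any library. The threshold is meant to be assigned to a `speech_recognition` `Recognizer`'s `energy_threshold` attribute (with `dynamic_energy_threshold` set to `False` so it is not re-adjusted automatically); `set_volume` could be something like `pygame.mixer.music.set_volume`, which assumes you switch playback from `mpg123` to pygame.

```python
import math
import time
from typing import Callable, Optional, Sequence


class WakeWordGate:
    """YOUR SOLUTION: keep listening, but ignore everything until the wake phrase."""

    def __init__(self, wake_phrase: str) -> None:
        self.wake_phrase = wake_phrase.lower()
        self.awake = False

    def feed(self, text: str) -> Optional[str]:
        """Return a command to act on, or None if the text should be ignored."""
        if self.awake:
            self.awake = False  # accept one command, then go back to sleep
            return text
        idx = text.lower().find(self.wake_phrase)
        if idx == -1:
            return None  # e.g. words from the MP3 that the mic picked up
        rest = text[idx + len(self.wake_phrase):].strip()
        if rest:
            return rest  # wake phrase and command in the same utterance
        self.awake = True  # wait for the command in the next utterance
        return None


def threshold_above_playback(samples: Sequence[int], margin: float = 1.5) -> float:
    """SOME OTHER SOLUTION: an energy threshold somewhat above the playback's RMS.

    `samples` are assumed to be 16-bit PCM values recorded from the mic while
    the MP3 is playing and nobody is talking.
    """
    if not samples:
        return 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms * margin


def duck_volume(set_volume: Callable[[float], None], start: float = 1.0,
                end: float = 0.25, duration: float = 1.0, steps: int = 10,
                sleep: Callable[[float], None] = time.sleep) -> None:
    """Ramp playback volume linearly from `start` to `end` over `duration` seconds."""
    for i in range(1, steps + 1):
        set_volume(start + (end - start) * i / steps)
        sleep(duration / steps)
```

In the recognizer loop you would pass each transcript through `gate.feed(data)` and only run the `"play Quran"` checks when it returns a command. Restoring the volume afterwards is the same `duck_volume` call with `start` and `end` swapped.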

1 Answer


TL;DR: Google's approach analyzes the audio before playing it and disables wake-word detection at the specific times where it finds the wake word.

Essentially, the fancy solution mentioned in the comments is the way to go. For example, this is the process Google uses (check the US patent here):

  1. Start
  2. Receive data representing audio content for playback
  3. Detect, in the audio content, one or more wake words
  4. Cause one or more NMDs (networked microphone devices) to disable their respective wake responses to the detected wake word(s) while the audio content plays
  5. Play back the audio content
  6. End
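A minimal sketch of step 3 (detecting the wake word in the content before playback), assuming you already have word-level timestamps for the audio, for example from running an offline transcriber over the MP3 once after downloading it. `wake_word_windows` and the `(word, start, end)` format are assumptions for illustration, not what the patent specifies. It returns padded, merged time windows during which wake-word detection would be disabled (step 4).

```python
from typing import List, Sequence, Tuple

Word = Tuple[str, float, float]  # (word, start_seconds, end_seconds)


def wake_word_windows(words: Sequence[Word], wake_phrase: str,
                      padding: float = 0.5) -> List[Tuple[float, float]]:
    """Return (start, end) windows, in seconds, where the content says the wake phrase."""
    target = wake_phrase.lower().split()
    n = len(target)
    if n == 0:
        return []
    windows: List[Tuple[float, float]] = []
    for i in range(len(words) - n + 1):
        chunk = words[i:i + n]
        if [w.lower() for w, _, _ in chunk] != target:
            continue
        start = max(0.0, chunk[0][1] - padding)
        end = chunk[-1][2] + padding
        if windows and start <= windows[-1][1]:
            # overlapping or touching the previous window: merge them
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows
```

Real transcripts usually carry punctuation and mixed casing, so you may want to normalize each word before comparing.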

All of this is done before the device starts playing the content, so that it knows when to disable wake-word detection. The algorithm is mainly intended for larger systems, where you would probably disable only one microphone.

So the final product would know when to turn off wake-word detection, based on the audio played from the same device.

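# check_if_microphone_enabled() is a helper you would write yourself: it should
# return False while the playing audio is at a point that contains the wake word
if check_if_microphone_enabled(time_variable) and ("play Quran" in data):
    ...  # handle the command as before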
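One possible shape for the hypothetical `check_if_microphone_enabled` helper, as a sketch: it assumes you note when playback starts and have a list of `(start, end)` windows (in seconds) where the content contains the wake word, as produced by the pre-scan in the steps above. Unlike the one-argument call shown above, this version takes the windows explicitly; `PlaybackClock` is likewise an illustrative name.

```python
import time
from typing import Callable, Optional, Sequence, Tuple


def check_if_microphone_enabled(elapsed: float,
                                windows: Sequence[Tuple[float, float]]) -> bool:
    """False while playback is inside a window that contains the wake word."""
    return not any(start <= elapsed < end for start, end in windows)


class PlaybackClock:
    """Tracks how far into the currently playing audio we are."""

    def __init__(self, windows: Sequence[Tuple[float, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.windows = list(windows)
        self.clock = clock
        self.started_at: Optional[float] = None

    def start(self) -> None:
        self.started_at = self.clock()  # call right before starting playback

    def stop(self) -> None:
        self.started_at = None

    def microphone_enabled(self) -> bool:
        if self.started_at is None:
            return True  # nothing is playing
        return check_if_microphone_enabled(self.clock() - self.started_at, self.windows)
```

The command check would then read `if clock.microphone_enabled() and ("play Quran" in data):`. Because `mpg123` started through `os.system` gives no exact start time, padding the windows a little is a sensible safety margin.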

There are a few questions about listening to other applications' audio streams, but I didn't find an easy solution, as it depends heavily on the platform (OS) and the software used for playback. The easiest solution I see is to control the audio playback yourself (with pygame or another library); that way you already have access to the audio being played.

Since your assistant plays the audio itself, I don't see getting access to it as a problem.
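With the played audio available as a reference signal, the echo-cancellation idea from the comments becomes possible. A minimal sketch that swaps in a textbook NLMS adaptive filter, assuming `mic` and `ref` are sample-aligned float sequences at the same sample rate (for example, the decoded samples you send to the speaker and the mic samples captured at the same moment). `nlms_echo_cancel`, `taps` and `mu` are illustrative; a production assistant would more likely rely on an existing echo canceller, such as the one in the WebRTC audio processing module.

```python
from collections import deque
from typing import List, Sequence


def nlms_echo_cancel(mic: Sequence[float], ref: Sequence[float],
                     taps: int = 32, mu: float = 0.5,
                     eps: float = 1e-6) -> List[float]:
    """Subtract an adaptively estimated echo of `ref` from `mic`.

    `taps` must cover the playback-to-mic delay (in samples).
    The returned residual is what you would feed to speech recognition.
    """
    weights = [0.0] * taps
    history = deque([0.0] * taps, maxlen=taps)  # newest reference sample first
    out: List[float] = []
    for m, r in zip(mic, ref):
        history.appendleft(r)
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = m - echo_estimate  # mic minus predicted echo: the user's voice plus residue
        norm = eps + sum(x * x for x in history)
        weights = [w + mu * error * x / norm for w, x in zip(weights, history)]
        out.append(error)
    return out
```

When the user talks over the playback ("double talk"), a real canceller freezes adaptation so the filter does not learn the user's voice; this sketch does not handle that.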

Warkaz