Problem with reading entire audio in SpeechRecognition because of silent moments

Question

I'm having trouble transcribing an entire audio file with SpeechRecognition using the Google recognizer API. The audio is read correctly, but only its first sentence is detected and transcribed. My audio file contains many "silent seconds", so I'm guessing the algorithm treats the first of them as the end of the audio and stops the transcription there.

To solve this, I've tried setting the energy_threshold and pause_threshold parameters, but they seem to make no difference (I've tried many different values for both).

Does anyone know how to adjust how long SpeechRecognition waits during silence before it treats the silence as the end of the audio?

import speech_recognition as sr

r = sr.Recognizer()
gravacao = sr.AudioFile('my_audio.wav')

with gravacao as source:
    # Minimum length of silence (in seconds) that registers as the end of a phrase.
    r.pause_threshold = 10
    # Energy level threshold for sounds; values below it are considered silence.
    r.energy_threshold = 40
    r.dynamic_energy_threshold = True

    # Read the whole file into a single AudioData object.
    audio = r.record(source)

lang = "pt-BR"

try:
    pre_frase = r.recognize_google(audio, language=lang)
    print(pre_frase)
except Exception as exp:
    print("Error: {}".format(exp))
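
A possible workaround, offered as a sketch rather than a confirmed fix: in the library, pause_threshold and energy_threshold are used by listen(), while record() simply reads the file up to the requested duration, which would explain why changing them makes no difference here. If the truncation happens on the recognition side (an assumption, not something verified in this post), one common approach is to send the file in shorter pieces using the offset and duration arguments of record(). The helper names chunk_offsets, wav_duration and transcribe_in_chunks, and the 30-second chunk length, are made up for this illustration.

```python
import wave


def chunk_offsets(total_seconds, chunk_seconds):
    """Return (offset, duration) pairs that cover total_seconds."""
    offsets = []
    start = 0.0
    while start < total_seconds:
        offsets.append((start, min(chunk_seconds, total_seconds - start)))
        start += chunk_seconds
    return offsets


def wav_duration(path):
    """Length of a WAV file in seconds, read from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def transcribe_in_chunks(path, language="pt-BR", chunk_seconds=30):
    """Transcribe a WAV file piece by piece and join the results."""
    # Imported here so the helpers above can be used without the package.
    import speech_recognition as sr

    r = sr.Recognizer()
    pieces = []
    for offset, duration in chunk_offsets(wav_duration(path), chunk_seconds):
        # Reopen the file for each piece so offset counts from the start.
        with sr.AudioFile(path) as source:
            audio = r.record(source, offset=offset, duration=duration)
        try:
            pieces.append(r.recognize_google(audio, language=language))
        except sr.UnknownValueError:
            # Nothing intelligible in this piece (for example, pure silence).
            continue
    return " ".join(pieces)


if __name__ == "__main__":
    # Only the offset arithmetic is shown here; no audio or network needed.
    print(chunk_offsets(70, 30))
```

Calling transcribe_in_chunks('my_audio.wav') would then return the joined text. Fixed-length pieces can cut a word in half at a boundary, so splitting on the silent stretches instead may give cleaner results; network failures (sr.RequestError) are deliberately left to propagate.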
