I use the Google Speech Recognition API. When I try to recognize relatively short words (like "yes" or "no") lasting 0.25-0.5 seconds, the API often returns NULL. I tried other input data formats and the solution posted here (16-bit PCM, mono input audio file), but that did not improve the response. Recognition of longer audio works correctly.
I tried to artificially increase the duration of the audio by adding silence before and after the word, so that the audio is at least 5 seconds long. The number of unrecognized samples dropped by a factor of 4, but it seems to me that it could still be reduced further.
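A minimal sketch of this kind of padding, assuming the input is a 16-bit PCM WAV file (where silence is zero-valued bytes). The function name `pad_with_silence` and its parameters are illustrative, not taken from my actual pipeline:

```python
import wave


def pad_with_silence(src_path, dst_path, min_duration_s=5.0):
    """Pad a 16-bit PCM WAV file with silence before and after the speech.

    The output is at least min_duration_s long; files that are already
    long enough are copied unchanged.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())

    frame_size = params.sampwidth * params.nchannels  # bytes per frame
    n_frames = len(frames) // frame_size
    target_frames = int(min_duration_s * params.framerate)

    # Split the missing frames between the start and the end of the clip
    missing = max(0, target_frames - n_frames)
    lead = missing // 2
    tail = missing - lead

    # Assumption: signed 16-bit PCM, so one silent frame is all zero bytes
    silent_frame = b"\x00" * frame_size

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(silent_frame * lead + frames + silent_frame * tail)
```

For example, `pad_with_silence("yes.wav", "yes_padded.wav")` would write a padded copy with the word roughly in the middle (both file names are placeholders).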
Is there something specific about how Google speech recognition handles words of short duration?
My code:
import io

from google.cloud import speech
from google.cloud.speech import types
from google.oauth2 import service_account

# nn (path to the audio file) and maxAlternatives are defined elsewhere
credentials = service_account.Credentials.from_service_account_file('credentials')
client = speech.SpeechClient(credentials=credentials)

# Load the audio into memory
with io.open(nn, 'rb') as audio_file:
    content = audio_file.read()
audio = types.RecognitionAudio(content=content)

config = types.RecognitionConfig(
    encoding='FLAC',
    language_code='ru-RU',
    sample_rate_hertz=16000,
    max_alternatives=maxAlternatives)

# Detect speech in the audio file
response = client.recognize(config, audio)
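To make "returns NULL" concrete: a rough sketch of how an empty response could be detected, assuming the response exposes `results`, each with `alternatives` that carry a `transcript`. The helper name `extract_transcripts` is made up for illustration:

```python
def extract_transcripts(response):
    """Collect every alternative transcript from a recognition response.

    An empty list means the API recognized nothing in the audio.
    """
    return [
        alternative.transcript
        for result in response.results
        for alternative in result.alternatives
    ]
```

With this, `extract_transcripts(response)` comes back as an empty list for the short words that fail.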
Thank you.