Google Speech-to-text API, InvalidArgument: 400 Must use single channel (mono)

Question

I keep getting the error InvalidArgument: 400 from Google Speech-to-Text. The problem seems to be that I am using 2-channel (stereo) audio, while the API expects a mono WAV file.

Converting the file in an audio editor might work, but I cannot use an audio editor to convert a whole batch of files. Is there a way to change the channel layout in either Python or Google Cloud?

Note: I already tried the "wave" module, but I kept getting an error #7 saying the file type was not recognized (I couldn't read the WAV file with Python's wave module).

-ERROR- InvalidArgument: 400 Must use single channel (mono) audio, but WAV header indicates 2 channels.

score 20 · Answer 1 · edited Aug 21 '21 at 10:07


Assuming you're using the google-cloud-speech library, you can set the audio_channel_count field in your RecognitionConfig to the number of channels in the input audio (it defaults to one channel, i.e. mono). Something like this:

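from google.cloud import speech

# Note: speech.types is the pre-2.0 layout of google-cloud-speech.
# In 2.x the classes are exposed directly, e.g. speech.RecognitionConfig.
client = speech.SpeechClient()
results = client.recognize(
    audio=speech.types.RecognitionAudio(
        uri='gs://your-bucket/recording.wav',
    ),
    config=speech.types.RecognitionConfig(
        encoding='LINEAR16',
        language_code='en-US',
        sample_rate_hertz=44100,
        audio_channel_count=2,  # must match the channel count in the WAV header
    ),
)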

See the API doc for further info.

edited Aug 21 '21 at 10:07 · Neuron

answered Mar 11 '19 at 20:11 · LundinCast

2

I already tried that and I get this error. "InvalidArgument: 400 Invalid recognition 'config': bad channel count." – Jose silvestre Rodriguez Ortiz Mar 12 '19 at 12:50
1

config = speech.types.RecognitionConfig(
    encoding=speech.enums.RecognitionConfig.AudioEncoding.MULAW,
    sample_rate_hertz=8000,
    language_code=first_lang,
    alternative_language_codes=[second_lang],
    enable_speaker_diarization=True,
    audio_channel_count=2,
    diarization_speaker_count=2,
    enable_separate_recognition_per_channel=True,
    enable_word_time_offsets=True,
    max_alternatives=2)
– Jose silvestre Rodriguez Ortiz Mar 12 '19 at 12:52

score 2 · Answer 2 · edited Apr 16 '22 at 01:11

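You can use the function below to read the channel count and frame rate from the file itself instead of hard-coding them.

It takes the path to a WAV file and returns the frame rate and the number of channels, which you can then pass as sample_rate_hertz and audio_channel_count.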

import wave


def frame_rate_channel(audio_file_name):
    """Return (frame_rate, channels) read from the WAV header."""
    print(audio_file_name)
    with wave.open(audio_file_name, "rb") as wave_file:
        frame_rate = wave_file.getframerate()
        channels = wave_file.getnchannels()
    return frame_rate, channels
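
If you would rather convert the files to mono up front (which is what the question asks for), here is a minimal sketch using only the standard library. It assumes the input is 16-bit PCM stereo that the wave module can actually open, which may not hold for the asker's files given the error mentioned in the question. The names stereo_to_mono and convert_folder are made up for this example; the down-mix is a plain average of the left and right samples.

```python
import array
import sys
import wave
from pathlib import Path


def stereo_to_mono(src_path, dst_path):
    """Average both channels of a 16-bit PCM stereo WAV into a mono WAV."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM stereo input")
        frame_rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Samples are interleaved L, R, L, R, ...; keep one averaged value per frame.
    mono = array.array(
        "h",
        ((left + right) // 2 for left, right in zip(samples[0::2], samples[1::2])),
    )
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(frame_rate)
        dst.writeframes(mono.tobytes())


def convert_folder(src_dir, dst_dir):
    """Convert every .wav file in src_dir, writing mono copies into dst_dir."""
    out_dir = Path(dst_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for wav_path in sorted(Path(src_dir).glob("*.wav")):
        stereo_to_mono(str(wav_path), str(out_dir / wav_path.name))
```

After conversion the files have a single channel, so they can be sent with the default channel count. Averaging the channels loses the per-speaker separation, so if each speaker is on their own channel you probably want to keep the stereo file and set the channel options in the config instead.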
