Converting Speech to text using Google Cloud

Question

I am using Google Cloud Platform to convert audio files to text. When I convert a mono (single-channel) WAV file, I get the following error:

400 Must use single channel (mono) audio, but WAV header indicates 1 channels.

I have tried to fix it but have not found a solution. Can anybody help me solve this?

Please take the time to read [How to Ask](https://stackoverflow.com/help/how-to-ask), then edit your post to add a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) and any errors or logs you get, so that we can assist you further. Regarding your issue, please follow this section of the [relevant documentation](https://cloud.google.com/speech-to-text/docs/multi-channel#speech-multi-channel-protocol). You might also find this [other post](https://stackoverflow.com/questions/55106509/google-speech-to-text-api-invalidargument-400-must-use-single-channel-mono) useful. — Daniel Ocando, Dec 18 '19 at 10:20
@DanielOcando Thanks for your response. I had already consulted the documentation you referenced, but it did not resolve the issue. I am working on speech-to-text, and the audio file contains a lot of background noise. I reduced the noise and then tried to convert the cleaned file to text using the Google Speech-to-Text API; that is when I hit the error. Setting 'audioChannelCount': 2 and 'enableSeparateRecognitionPerChannel': true did not help. — Mansi Atoliya, Dec 18 '19 at 12:17
Audio files used with Speech-to-Text must be single-channel (mono). Use [this tool](https://audio.online-convert.com/convert-to-wav) to convert your audio (change the audio channels to mono) and try Speech-to-Text again. — Methkal Khalawi, Dec 18 '19 at 12:30
It seems like an issue with the audio file used. Note that you have to set the [sampling rate in correspondence with the file format and configurations](https://cloud.google.com/speech-to-text/docs/reference/rest/v1/RecognitionConfig#FIELDS.sample_rate_hertz). By default, sample_rate_hertz is equal to 16000. Use the following [tool](https://www.ffmpeg.org/download.html) and post the result of the following command: `ffmpeg -i YOURFILE -hide_banner`. — Daniel Ocando, Dec 18 '19 at 12:30
@DanielOcando and Methkal Khalawi, thanks for the support. The audio file is a mono WAV and sample_rate is set correctly, but I am still facing the same issue. — Mansi Atoliya, Dec 18 '19 at 12:43
Have you tested using another file? Can the WAV file be found online for testing purposes? Are you using the [speech.recognize](https://cloud.google.com/speech-to-text/docs/reference/rest/v1/speech/recognize) API endpoint or any Client Library (e.g. [Python](https://github.com/googleapis/google-cloud-python/tree/master/speech))? Please share the curl, or code used by removing any sensitive information like your API keys so we can take a look. — Daniel Ocando, Dec 18 '19 at 13:56
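
The comments above keep returning to the same check: do the channel count and sample rate in the WAV header agree with what the request config declares? A minimal sketch of that check follows, using only Python's standard `wave` module. The function names (`wav_header_info`, `build_recognition_config`, `make_test_wav`) are hypothetical, the `en-US` language code is a placeholder, and `LINEAR16` assumes 16-bit PCM samples. The dictionary keys follow the REST `RecognitionConfig` field names; nothing here calls the Speech-to-Text API itself.

```python
import io
import wave


def wav_header_info(path_or_file):
    """Read channel count, sample rate and sample width from a WAV header."""
    with wave.open(path_or_file, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hertz": wav.getframerate(),
            "sample_width_bytes": wav.getsampwidth(),
        }


def build_recognition_config(info, language_code="en-US"):
    """Build a REST-style RecognitionConfig dict that agrees with the header.

    Assumption: the file holds 16-bit PCM, which is what LINEAR16 expects.
    """
    if info["sample_width_bytes"] != 2:
        raise ValueError("LINEAR16 expects 16-bit samples")
    channels = info["channels"]
    config = {
        "encoding": "LINEAR16",
        "sampleRateHertz": info["sample_rate_hertz"],
        "languageCode": language_code,  # placeholder value
        "audioChannelCount": channels,
    }
    # Only ask for per-channel recognition when the file really is multi-channel.
    if channels > 1:
        config["enableSeparateRecognitionPerChannel"] = True
    return config


def make_test_wav(channels=1, rate=16000, seconds=0.1):
    """Create a short silent 16-bit WAV in memory, for trying the functions out."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    buf.seek(0)
    return buf
```

For a real file, `build_recognition_config(wav_header_info("YOURFILE.wav"))` would give a config whose `audioChannelCount` matches the header. If the header reports one channel while the request declares `'audioChannelCount': 2`, that mismatch is one plausible thing to rule out, though the thread does not confirm it as the cause.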
