6

I am using the Google Cloud Platform Speech-to-Text API on a trial account, and I cannot get any text back from an audio file. I do not know which encoding and sampleRateHertz I should use for an MP3 file with a bit rate of 128 kbps. I have tried various options, but I never get a transcription.

const speech = require('@google-cloud/speech');

const config = {
  encoding: 'LINEAR16',  //AMR, AMR_WB, LINEAR16(for wav)
  sampleRateHertz: 16000,  //16000 giving blank result.
  languageCode: 'en-US'
};
Grokify
Vikash Patel

4 Answers

8

MP3 is now supported in beta:

MP3: Only available as beta. See RecognitionConfig reference for details.

MP3: MP3 audio. Support all standard MP3 bitrates (which range from 32-320 kbps). When using this encoding, sampleRateHertz can be optionally unset if not known.
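As a rough illustration of the quoted behaviour, here is a minimal sketch of a REST request body that uses the MP3 encoding and leaves sampleRateHertz out when it is not known. The helper name build_mp3_request is made up for this example, and the v1p1beta1 endpoint is an assumption based on MP3 being a beta feature. Authentication and the actual HTTP call are not shown.

```python
import base64
import json

# Assumption: the beta REST surface lives under v1p1beta1.
BETA_RECOGNIZE_URL = "https://speech.googleapis.com/v1p1beta1/speech:recognize"


def build_mp3_request(mp3_bytes, language_code="en-US", sample_rate_hertz=None):
    """Build a JSON-serialisable speech:recognize body for MP3 audio.

    Hypothetical helper: sampleRateHertz is only included when the caller
    knows it, matching the "can be optionally unset" note in the docs quote.
    """
    config = {"encoding": "MP3", "languageCode": language_code}
    if sample_rate_hertz is not None:
        config["sampleRateHertz"] = sample_rate_hertz
    return {
        "config": config,
        # The REST API expects inline audio as a base64 string.
        "audio": {"content": base64.b64encode(mp3_bytes).decode("ascii")},
    }


# Placeholder bytes only; a real call would pass the contents of the MP3 file.
body = build_mp3_request(b"\xff\xfb\x90\x00")
print(json.dumps(body["config"]))
```

The resulting dictionary would be sent as the JSON body of a POST to BETA_RECOGNIZE_URL with your usual credentials.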

You can find out the sample rate using a variety of tools such as iTunes. CD-quality audio uses a sample rate of 44100 Hertz. Read more here:

To use this in a Google SDK, you may need to use one of the beta SDKs that defines this. Here is the constant from the Go Beta SDK:

RecognitionConfig_MP3 RecognitionConfig_AudioEncoding = 8
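If you would rather not rely on an external tool to find the sample rate, it can also be read from the MP3 file itself, since every MPEG audio frame header carries a sample rate index. The following is only a sketch: the function names are made up for this example, it returns the rate of the first header-like byte sequence it finds, and a robust parser would also verify that several consecutive frames agree.

```python
# Sample rate tables indexed by the 2-bit MPEG version field of the frame header.
SAMPLE_RATES = {
    0b11: (44100, 48000, 32000),  # MPEG-1
    0b10: (22050, 24000, 16000),  # MPEG-2
    0b00: (11025, 12000, 8000),   # MPEG-2.5
}


def skip_id3v2(data):
    """Return the offset just past a leading ID3v2 tag, or 0 if there is none."""
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for byte in data[6:10]:  # 4 "synchsafe" bytes, 7 useful bits each
            size = (size << 7) | (byte & 0x7F)
        footer = 10 if data[5] & 0x10 else 0
        return 10 + size + footer
    return 0


def mp3_sample_rate(data):
    """Return the sample rate in Hz of the first plausible frame, or None."""
    i = skip_id3v2(data)
    while i + 2 < len(data):
        b1, b2 = data[i + 1], data[i + 2]
        # Frame sync: 11 set bits (0xFF followed by the top 3 bits of the next byte).
        if data[i] == 0xFF and (b1 & 0xE0) == 0xE0:
            version = (b1 >> 3) & 0b11
            layer = (b1 >> 1) & 0b11
            bitrate_index = (b2 >> 4) & 0b1111
            rate_index = (b2 >> 2) & 0b11
            # Skip reserved values, which indicate a false sync.
            if (version in SAMPLE_RATES and layer != 0
                    and bitrate_index != 15 and rate_index != 3):
                return SAMPLE_RATES[version][rate_index]
        i += 1
    return None
```

To use it, read the file in binary mode and pass the bytes to mp3_sample_rate; the result is the value you would put in sampleRateHertz.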
Grokify
  • I used the beta version with an MP3 file whose sample rate is 44100 Hz (found using sox), but with that value the API transcribes only the first word, whereas with a sample rate of 8000 it transcribes properly. There is no such issue with the Azure Speech to Text API. – Nitin Jun 03 '20 at 05:59
3

According to the official documentation (https://cloud.google.com/speech-to-text/docs/encoding),

Only the following formats are supported:

  • FLAC
  • LINEAR16
  • MULAW
  • AMR
  • AMR_WB
  • OGG_OPUS
  • SPEEX_WITH_HEADER_BYTE

Anything else will be rejected.

Your best bet is to convert the MP3 file to either FLAC or LINEAR16.

Honestly, it is annoying that Google does not support MP3 from the get-go, unlike Amazon, IBM and Microsoft. It forces us to jump through hoops and also increases bandwidth usage, since FLAC and LINEAR16 are lossless and therefore much bigger to transmit.
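To put a number on the bandwidth point, here is a small back-of-the-envelope calculation comparing uncompressed LINEAR16 with a 128 kbps MP3. The function names are made up for this example, and the two sample rate and channel combinations are just illustrative choices. FLAC usually comes in below the LINEAR16 figure because it is compressed, but the ratio depends on the audio, so it is not estimated here.

```python
def pcm_bitrate_kbps(sample_rate_hertz, channels=1, bits_per_sample=16):
    """Bit rate of uncompressed PCM in kbps (LINEAR16 means 16 bits per sample)."""
    return sample_rate_hertz * channels * bits_per_sample / 1000


def size_bytes(bitrate_kbps, seconds):
    """Payload size in bytes for a stream of the given bit rate and duration."""
    return int(bitrate_kbps * 1000 * seconds / 8)


MP3_KBPS = 128  # the bit rate from the question

for label, rate, channels in [("16 kHz mono", 16000, 1), ("44.1 kHz stereo", 44100, 2)]:
    pcm = pcm_bitrate_kbps(rate, channels)
    print(f"{label}: LINEAR16 is {pcm:.1f} kbps, {pcm / MP3_KBPS:.1f}x the MP3")

# One minute of audio in each format, in bytes.
print(size_bytes(MP3_KBPS, 60), size_bytes(pcm_bitrate_kbps(16000), 60))
```

So even after downsampling to 16 kHz mono, one minute of LINEAR16 is about twice the size of the same minute as a 128 kbps MP3, and far more at CD-quality settings.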

Pic Mickael
  • What format should I use to get text from an m4a file? I am using the config below, but it fails and returns an empty result: {@"encoding":@"MULAW", @"sampleRateHertz":@(16000), @"languageCode":@"en-IN", @"maxAlternatives":@30} – CrazyPro007 May 01 '19 at 07:36
  • The URL is NSString *service = @"https://speech.googleapis.com/v1/speech:recognize"; – CrazyPro007 May 01 '19 at 08:41
2

I had the same issue and resolved it by converting it to FLAC.

Try converting your audio to FLAC and use

encoding: 'FLAC',

For conversion, you can use sox (ref: https://www.npmjs.com/package/sox).
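The link above is a Node wrapper, but the conversion itself is a single sox command. Here is a sketch in Python that calls the sox command-line tool directly; the function names are made up for this example, it assumes sox is installed with MP3 support (on some systems that is a separate package), and 16000 Hz mono is just one reasonable target, not a requirement.

```python
import shutil
import subprocess


def sox_flac_command(src, dst, sample_rate_hertz=16000, channels=1):
    """Return the sox argument list that converts src to a FLAC file at dst.

    Options placed before the output file apply to the output, so -r and -c
    here set the output sample rate and channel count. sox infers the FLAC
    format from the .flac extension of dst.
    """
    return ["sox", src, "-r", str(sample_rate_hertz), "-c", str(channels), dst]


def convert_to_flac(src, dst, **kwargs):
    """Run sox to perform the conversion; raises if sox is missing or fails."""
    if shutil.which("sox") is None:
        raise RuntimeError("sox was not found on PATH")
    subprocess.run(sox_flac_command(src, dst, **kwargs), check=True)
```

For example, convert_to_flac("input.mp3", "output.flac") would produce a file you can send with encoding 'FLAC' and sampleRateHertz 16000.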

Rejo Chandran
0

MP3 is currently only available for Speech-to-Text in the speech_v1p1beta1 module, so you must send your request through that module to get the result you want. A Python example using encoding 'MP3' looks like this:

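# MP3 is only defined in the beta client, so import speech_v1p1beta1.
from google.cloud import speech_v1p1beta1 as speech

client = speech.SpeechClient()
speech_file = "your mp3 file path"
# Read the raw MP3 bytes; the client library handles encoding for transport.
with open(speech_file, "rb") as audio_file:
    content = audio_file.read()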

audio = speech.RecognitionAudio(content=content)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.MP3,
    sample_rate_hertz=44100,
    language_code="en-US",
)

response = client.recognize(config=config, audio=audio)

# Each result is for a consecutive portion of the audio. Iterate through
# them to get the transcripts for the entire audio file.
print(response)
for result in response.results:
    # The first alternative is the most likely one for this portion.
    print(u"Transcript: {}".format(result.alternatives[0].transcript))


bob tian