This is my first time using the Google Cloud Speech API, for a project that converts a series of audio files to text. Each file is around 60 minutes long, with one person talking continuously the whole time. I've installed the Google Cloud SDK and I'm using it to make the requests as shown below:
gcloud ml speech recognize-long-running \
"/path/to/file/audio.flac" \
--language-code="pt-PT" --async
Every time I run this on one of my recordings, it gives the following error message:
ERROR: (gcloud.ml.speech.recognize-long-running) INVALID_ARGUMENT:
Request payload size exceeds the limit: 10485760 bytes.
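For reference, the limit in the message is 10485760 bytes, which is 10 MiB, and it refers to the size of the request payload. A minimal sketch for checking a local file against that number is below. The helper names `check_size` and `file_size` are hypothetical, and the request payload may be larger than the file on disk if the audio is encoded before sending, so treat this only as a rough check.

```shell
#!/bin/sh
# Limit copied from the error message (10485760 bytes = 10 MiB).
LIMIT_BYTES=10485760

# Hypothetical helper: prints "over" if the given byte count exceeds
# the limit, "under" otherwise.
check_size() {
  if [ "$1" -gt "$LIMIT_BYTES" ]; then
    echo "over"
  else
    echo "under"
  fi
}

# Hypothetical helper: prints the size of a local file in bytes.
file_size() {
  wc -c < "$1" | tr -d ' '
}

# Usage (path is the placeholder from the command above):
#   check_size "$(file_size "/path/to/file/audio.flac")"
```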
This seems like a very tight restriction: if the API is able to process files up to 180 minutes long, a request limit of 10485760 bytes (10 MB) looks far too small for recordings of that length.
I've tried splitting the audio files into smaller pieces, down to four 15-minute samples, and I still get the same error. Besides, even if it worked, splitting every new recording I make from now on would be tedious and impractical.
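As a rough sanity check on why 15-minute pieces can still be too large, the uncompressed size of a piece can be estimated from its format. This is only a sketch: the sample rate, bit depth, and channel count below are assumptions (the actual format of the recordings is not stated here), and `pcm_bytes` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: uncompressed PCM size in bytes.
# Arguments: sample rate (Hz), bytes per sample, channels, duration (s).
pcm_bytes() {
  echo $(( $1 * $2 * $3 * $4 ))
}

# Assumed format: 16 kHz, 16-bit (2 bytes per sample), mono, 15 minutes (900 s).
pcm_bytes 16000 2 1 900   # prints 28800000
```

Under those assumptions a 15-minute piece is about 28.8 MB before compression. FLAC would need to shrink it to roughly a third of that to fit under 10485760 bytes, and how much it actually compresses depends on the recording.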
I've been searching and so far haven't found a way to increase or circumvent this limit. I'm on a free trial account and would be happy to upgrade to a paid subscription to have the limit raised, but as far as I understand, it will persist even on a paid subscription.
Has anyone found a solution to this problem?