What is the maximum audio file length (duration) that can be sent to the Bing Speech to Text API?

Question

I have referred to this documentation. For the speech to text client libraries, it mentions "the long audio stream (up to 10 minutes)".

Does speech to text accept audio files longer than 10 minutes? What happens if we pass an audio file longer than 10 minutes?

In my use case, I need to pass audio files longer than 30 minutes. What should we do in that situation?

score 0 · Answer 1 · answered Dec 04 '17 at 21:39

You can split your longer audio streams programmatically using ffmpeg, keeping each chunk within the 10-minute limit, and pass those chunks to the client library. This answer shows how to divide a long audio stream into chunks of a specified length: https://superuser.com/questions/525210/splitting-an-audio-file-into-chunks-of-a-specified-length.
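
A minimal sketch of that splitting step in Python, assuming ffmpeg is installed and on the PATH. The function names, the file names, and the 9-minute default chunk length are illustrative choices of mine, not part of the Speech API; the ffmpeg options are the segment muxer ones from the linked answer.

```python
import subprocess


def build_split_command(input_path, output_pattern, chunk_seconds=540):
    """Build an ffmpeg command that cuts input_path into fixed-length chunks.

    chunk_seconds defaults to 540 (9 minutes), an assumed safety margin
    below the 10-minute limit quoted in the question.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    return [
        "ffmpeg",
        "-i", input_path,                      # source audio file
        "-f", "segment",                       # use ffmpeg's segment muxer
        "-segment_time", str(chunk_seconds),   # target length of each chunk
        "-c", "copy",                          # copy the stream, no re-encoding
        output_pattern,                        # e.g. "chunk_%03d.wav"
    ]


def split_audio(input_path, output_pattern, chunk_seconds=540):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    command = build_split_command(input_path, output_pattern, chunk_seconds)
    subprocess.run(command, check=True)
```

For example, `split_audio("meeting.wav", "chunk_%03d.wav")` would write `chunk_000.wav`, `chunk_001.wav`, and so on. Because the stream is copied rather than re-encoded, chunk lengths may differ slightly from the requested value, which is why the default stays below 10 minutes.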

You can then combine the text from these chunks to get the entire transcript back. It is not the cleanest approach, but it will scale.
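
Combining the text could look roughly like this in Python. It assumes you have already collected one transcript per chunk into a dictionary keyed by chunk file name, and that the names are zero-padded (for example `chunk_000.wav`) so that sorting them restores the original order. `combine_transcripts` is a hypothetical helper, not something the client library provides.

```python
def combine_transcripts(transcripts_by_chunk):
    """Join per-chunk transcripts into one string, ordered by chunk file name."""
    # Zero-padded names sort lexicographically in playback order.
    ordered_names = sorted(transcripts_by_chunk)
    parts = (transcripts_by_chunk[name].strip() for name in ordered_names)
    # Skip chunks that produced no text (e.g. silence).
    return " ".join(part for part in parts if part)
```

Note that a word spoken across a chunk boundary may be cut in two, so the joined text can contain small errors at the seams.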
