Google Speech API Single Utterance

Question

How does Google Speech API's SingleUtterance work? According to the docs, it is Google's way of determining when a speaker has finished a single utterance. I understand what it does, but I would like to know how it does it. Does the API simply wait for a certain duration of silent audio? If so, how long must the silence last to trigger the end of an utterance?

Does it have some other sort of AI algorithm that helps determine when someone has stopped speaking?

Thanks

I would suggest removing the C# tag here - it won't matter which language you happen to use to talk to the Speech API, and the tag may be off-putting to other users who expect the question *would* be C#-specific. — Jon Skeet, Sep 12 '18 at 16:12

score 2 · Answer 1 · answered Oct 04 '18 at 01:08

I don't think the details are exposed. As far as I can tell, detecting the end of the audio is a decision the API makes internally. Instead, it offers a way to find out when that decision has been made.

Under normal conditions the stream will continue to listen and process audio until either the stream is closed directly or the stream's length limit has been exceeded. In that situation single_utterance does not need to be set.

When you do need it (for voice commands, for example) and set single_utterance=true, the API decides when to finish recognition, sends your client the END_OF_SINGLE_UTTERANCE event, and ceases recognition.
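As a rough illustration of the client side, here is a minimal Python sketch of reacting to that event. It does not call the real service: `FakeResponse` is a hypothetical stand-in for a streaming response message, and the string constant stands in for the API's END_OF_SINGLE_UTTERANCE event value (a real client would compare against the enum provided by its client library). The sketch keeps reading after the event on the assumption that a final result may still arrive before the stream closes.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Stand-in for the API's event value; an assumption for this sketch only.
END_OF_SINGLE_UTTERANCE = "END_OF_SINGLE_UTTERANCE"


@dataclass
class FakeResponse:
    """Hypothetical stand-in for one streaming response message."""
    speech_event_type: str = "SPEECH_EVENT_UNSPECIFIED"
    transcript: str = ""
    is_final: bool = False


def collect_single_utterance(
    responses: Iterable[FakeResponse],
) -> Tuple[Optional[str], bool]:
    """Return (last final transcript, whether the end event was seen)."""
    saw_end_event = False
    final_transcript = None
    for response in responses:
        if response.speech_event_type == END_OF_SINGLE_UTTERANCE:
            # The API has decided the utterance is over; a real client
            # would stop sending audio at this point.
            saw_end_event = True
            continue
        if response.is_final:
            final_transcript = response.transcript
    return final_transcript, saw_end_event


# Simulated stream: an interim result, the end event, then the final result.
stream = [
    FakeResponse(transcript="turn on"),
    FakeResponse(speech_event_type=END_OF_SINGLE_UTTERANCE),
    FakeResponse(transcript="turn on the lights", is_final=True),
]
result = collect_single_utterance(stream)
print(result)
```

The point of the sketch is that the client never measures silence itself; it only watches for the event and stops sending audio once it arrives.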
