Wildly varying results for foreign language speech recognition across users

Question

I am creating a speech-focused video game to teach Spanish and Mandarin. I am using Google Chrome's implementation of W3C Speech Recognition because it's free and seemed great.

I've released the game to others, and people are having a lot of trouble with the recognition even though the language code is set to es-ES. Person A said:

My favorite so far: "refrescos" turns into "Craigslist jobs"! :D
And then "Netflix codes".

If the language code is set to es-ES, how could "jobs" end up in the transcript? (I could understand a proper noun like "Netflix" getting through, but not "jobs".)

Person B said lo siento was transcribing as lo síento, while person C said it transcribed as lo siento. How could the same API interpret the same spoken word with one having an accent mark and one without?

Each has Chrome 76.

My questions are:

How could there be such drastic speech recognition differences across machines running the same Chrome version?
I was avoiding the Google Cloud Speech API because it costs money, but has anyone had more consistent results across languages with it? Moreover, this SO post says the API is really only better if you need longer audio transcriptions, which I do not.
Are there any other really great foreign-language speech recognition APIs that won't break the bank?

score 0 · Answer 1 · answered Sep 10 '19 at 04:29

The Web Speech API is designed for simple voice commands and short phrases. It is not the highest-quality recognition engine, and its accuracy depends on the speaker's accent. The other SO question you mentioned also says the Web Speech API is designed for commands. It's a poor fit for teaching languages, since a learner will likely have a strong, non-native accent in at least one of the languages.
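
If you stay with the Web Speech API anyway, one thing that may soften the "lo síento" vs. "lo siento" kind of mismatch is comparing transcripts loosely instead of character for character. The following is only a rough sketch under my own assumptions: `normalizeTranscript` and `matchesExpected` are hypothetical helper names (not part of any API), and the list of alternatives is assumed to come from the `transcript` fields of the recognition results, for example with `maxAlternatives` set above 1. Note that stripping accents also turns "ñ" into "n", which may or may not be acceptable for your game.

```typescript
// Hypothetical helpers for loose transcript matching; not part of the Web Speech API.

/** Lowercase, strip accent marks and common punctuation, collapse whitespace. */
function normalizeTranscript(text: string): string {
  return text
    .normalize("NFD")                // split "í" into "i" + combining acute accent
    .replace(/[\u0300-\u036f]/g, "") // drop combining marks (this also maps "ñ" to "n")
    .toLowerCase()
    .replace(/[.,;:!?¡¿"']/g, "")    // drop common Spanish/English punctuation
    .replace(/\s+/g, " ")            // collapse runs of whitespace
    .trim();
}

/** True if any recognition alternative matches the expected phrase after normalization. */
function matchesExpected(expected: string, alternatives: string[]): boolean {
  const target = normalizeTranscript(expected);
  return alternatives.some((alt) => normalizeTranscript(alt) === target);
}
```

With this, `matchesExpected("lo siento", ["Lo síento."])` would accept both spellings from the question, while a transcript like "Craigslist jobs" still fails to match "refrescos". It does nothing about the recognition quality itself; it only makes the comparison less brittle.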
