We show this image to our users:
The picture shows separate numbers, and all of our users read it into their microphones as "11-0-9-5".
We use the Google Speech engine, and it returns this transcription:
"1109 5".
This makes it impossible for us to compare the spoken words with the expected result, and we're stuck at this stage.
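If only the sequence of digits matters, one possible client-side workaround is to compare digit strings after discarding separators. This does not configure Google's recognizer. Below is a minimal Python sketch; the helper names `digits_only` and `matches_expected` are hypothetical and not part of any Google API.

```python
import re


def digits_only(text: str) -> str:
    """Keep only the digit characters, dropping spaces, dashes and other separators."""
    return re.sub(r"\D", "", text)


def matches_expected(transcript: str, expected: str) -> bool:
    """Compare the recognized digit sequence with the expected one, ignoring grouping."""
    return digits_only(transcript) == digits_only(expected)


print(matches_expected("1109 5", "11-0-9-5"))  # True
print(matches_expected("1109 6", "11-0-9-5"))  # False
```

Note that this throws away the grouping, so "1-10-9-5" would also match "11-0-9-5". It only helps when the grouping itself does not need to be verified.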
Is there a way to tell Google's Speech Recognition to interpret spoken numbers literally and separately, and not join them together?