I want to make a simple game that compares the pronunciation of a given word, provided as an audio file, with the same word pronounced by the player into a microphone. By pronunciation I mean that the "sound" of the player's word should be compared to the sound of the given word.
Ideally, the system would return a percentage indicating how closely the player's pronunciation matches the given word.
I've found questions on StackOverflow about audio fingerprinting and speech recognition. They seem to indicate that it's a very hard problem. But since I don't need full speech recognition, maybe there is a simpler approach that I've missed.
So my questions are: Is this even feasible? If it is, how could I approach the problem? Are there libraries that could help me?
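For illustration, one commonly suggested way to get a "percentage of how close" without full speech recognition is to turn both recordings into sequences of per-frame feature vectors and compare those sequences with dynamic time warping (DTW), which tolerates the player speaking faster or slower than the reference. The sketch below is a minimal pure-Python version under stated assumptions: the feature extraction step (for example MFCCs computed by an audio library) is assumed to have happened already, and the exponential mapping from distance to a percentage, along with its `scale` parameter, is a hypothetical choice that would need tuning on real recordings.

```python
import math
from typing import List, Sequence

# Assumption: each frame is a precomputed feature vector (e.g. MFCCs) for a
# short slice of audio; extracting these is not shown here.
Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(ref: List[Frame], test: List[Frame]) -> float:
    """Classic DTW alignment cost, divided by len(ref) + len(test).

    DTW finds the cheapest frame-to-frame alignment, so a word spoken
    more slowly can still line up with the reference recording.
    """
    if not ref or not test:
        raise ValueError("both sequences must contain at least one frame")
    n, m = len(ref), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(ref[i - 1], test[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def similarity_percent(ref: List[Frame], test: List[Frame], scale: float = 1.0) -> float:
    """Hypothetical mapping: 100% for a zero DTW distance, decaying toward 0%."""
    return 100.0 * math.exp(-dtw_distance(ref, test) / scale)


if __name__ == "__main__":
    reference = [[0.0], [1.0], [2.0]]
    stretched = [[0.0], [0.0], [1.0], [2.0]]  # same "word", spoken more slowly
    different = [[5.0], [5.0], [5.0]]
    print(similarity_percent(reference, stretched))
    print(similarity_percent(reference, different))
```

With these toy one-dimensional frames, the stretched sequence aligns perfectly and scores 100, while the unrelated one scores much lower. On real audio, the quality of the score depends heavily on the features used and on recording conditions, so this only shows the shape of the comparison, not a tested solution.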