I'm trying to build Latin speech recognition. For that I need phonetic (vowel-and-consonant) recognition rather than word recognition, since Latin has only 40 sounds but over 40,000 words x 60 endings on average, i.e. about 2.5 million word-forms. The problem is that both the Web Speech API and Google Cloud Speech only return complete, supposedly similar-sounding words (and from an English grammar at that, since there are no 2.5-million-word Latin grammars out there). So there's no way for me to get down to the actual phonetic sounds, in particular just the word-stem (the first half of the word), which distinguishes each word, as opposed to the word-ending, which only tells (uselessly, for my purposes) how the word is functioning in the sentence. Ideally, I'd want a grammar of word-stems such as:
- "am-" (short for amo, amare, amavi, amatus, etc.),
- "vid-" (short for video, videre, vidi, visus, etc.),
- "laet-" (short for laetus, laeta, laetum, etc.),
etc.
But the speech-recognition APIs I've tried can't search against stems like that.
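To be concrete about what I'd do with phonetic output if I could get it, here is a minimal sketch of the stem lookup. Everything in it is an assumption for illustration: the `STEMS` map, the `matchStem` function, and the idea that a recognizer would hand me a plain lowercase string of sounds (no API I've found does that).

```javascript
// Hypothetical stem grammar: stem -> headword it stands for.
// The three stems are the ones listed above; the map itself is an assumption.
const STEMS = new Map([
  ["am", "amo"],
  ["vid", "video"],
  ["laet", "laetus"],
]);

// Return the longest stem that prefixes the heard sound sequence, or null.
// `heard` is assumed to be a lowercase string standing in for phonemes.
function matchStem(heard, stems = STEMS) {
  let best = null;
  for (const [stem, headword] of stems) {
    const longer = best === null || stem.length > best.stem.length;
    if (heard.startsWith(stem) && longer) {
      // Keep the ending too, in case it is wanted later.
      best = { stem, headword, ending: heard.slice(stem.length) };
    }
  }
  return best;
}

console.log(matchStem("amavi"));  // { stem: 'am', headword: 'amo', ending: 'avi' }
console.log(matchStem("laetum")); // { stem: 'laet', headword: 'laetus', ending: 'um' }
console.log(matchStem("rosa"));   // null
```

Note that a form like visus would not match "vid-" with a plain prefix test, so verbs whose stem changes would presumably need more than one entry each.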
So where can I get phonetic speech recognition?
I prefer JavaScript, PHP, or Node, and preferably client-side rather than streaming.
Here's my code so far, for the Web Speech API. The key part is the console.log() calls, which show me trying to dig into the properties of each returned possible word:
speech.onresult = function(event) {
    var interim_transcript = '';
    var final_transcript = '';
    for (var i = event.resultIndex; i < event.results.length; ++i) {
        if (event.results[i].isFinal) {
            final_transcript += event.results[i][0].transcript;
            // This console.log shows all 3 word-guess possibilities.
            console.log(event.results[i]);
            // These console.logs show each individual possibility:
            //console.log('Poss-1:'); console.log(event.results[i][0]);
            //console.log('Poss-2:'); console.log(event.results[i][1]);
            //console.log('Poss-3:'); console.log(event.results[i][2]);
            for (var a in event.results[i]) {
                for (var b in event.results[i][a]) {
                    /* The styled console.log below shows me trying to dig into
                       each returned possibility's PROPERTIES, but alas, the only
                       returned properties are
                       (1) the transcript (i.e. the guessed word),
                       (2) the confidence (i.e. the 0-to-1 likelihood of it being that word),
                       (3) the prototype.
                    */
                    console.log("%c Poss-" + a + " %c " + b + ": " + event.results[i][a][b], 'background-color: black; color: yellow; font-size: 14px;', 'background-color: black; color: red; font-size: 14px;');
                }
            }
        } else {
            // Non-final results feed the interim display below;
            // without this branch interim_transcript stays empty.
            interim_transcript += event.results[i][0].transcript;
        }
    }
    // action, transcription and interim_span are defined elsewhere on the page.
    if (action == "start") {
        transcription.value += final_transcript;
        interim_span.innerHTML = interim_transcript;
    }
};
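To make the dead end easier to see, here is a small sketch that flattens the final results into plain objects instead of logging them one property at a time. The helper name `collectAlternatives` and the `mock` data are mine (hypothetical, not part of the Web Speech API); the function only relies on `isFinal`, `length`, `transcript` and `confidence`, which are the properties the handler above already reads or logs.

```javascript
// Flatten the final results of an onresult event into plain objects.
// Works on anything shaped like event.results (an array-like of
// array-likes), so it can be tried with mock data outside the browser.
function collectAlternatives(results, startIndex = 0) {
  const out = [];
  for (let i = startIndex; i < results.length; i++) {
    if (!results[i].isFinal) continue; // skip interim results
    for (let a = 0; a < results[i].length; a++) {
      const { transcript, confidence } = results[i][a];
      out.push({ result: i, alternative: a, transcript, confidence });
    }
  }
  return out;
}

// Mock data standing in for a real event.results (values are made up).
const mock = [
  Object.assign(
    [
      { transcript: "a mob", confidence: 0.6 },
      { transcript: "ammo", confidence: 0.3 },
    ],
    { isFinal: true }
  ),
];

console.table(collectAlternatives(mock));
```

Inside the handler this would be called as `collectAlternatives(event.results, event.resultIndex)`. Each row still holds nothing but a whole-word transcript and a confidence value, which is exactly my problem: there is nothing phonetic to dig into.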