I am developing a C# application using the Kinect that relies on voice input. I have a list of Arabic words the user can say to select different menu items.
I have been searching for the past few days with little success. Here is what I have found:
CMU Sphinx: http://www.ccse.kfupm.edu.sa/~elshafei/AASR.htm The first problem with this is that it is Java-based. I have looked at KVM and the bridge, but I couldn't get far with either. I couldn't even set it up to work in Java, and there are no steps explaining how to use the already-prepared files.
I have also looked at using an SrgsDocument, as suggested in this link: Specifying a pronunciation of a word in Microsoft Speech API, but this seems too complicated for my purposes, and I don't even know if it is what I need.
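For reference, this is roughly what the SRGS route looks like: a grammar file where a `token` carries an explicit pronunciation via the `sapi:pron` extension, so the engine matches the spoken Arabic word even though the written form doesn't follow English pronunciation rules. This is only a sketch: the word "salaam" and its phone string are illustrative assumptions, and the exact phone labels depend on the phone set your recognizer uses (check the engine's phone-set documentation before relying on them).

```xml
<?xml version="1.0" encoding="utf-8"?>
<grammar version="1.0" xml:lang="en-US" root="menu"
         xmlns="http://www.w3.org/2001/06/grammar"
         xmlns:sapi="http://schemas.microsoft.com/Speech/2002/06/SRGSExtensions">
  <rule id="menu" scope="public">
    <one-of>
      <!-- Hypothetical menu word; sapi:pron overrides the engine's
           default (English) letter-to-sound guess for this token. -->
      <item><token sapi:pron="S AA L AA M">salaam</token></item>
    </one-of>
  </rule>
</grammar>
```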
I have also looked at Microsoft Speech Recognition Custom Training. That person's problem was similar to mine, but I cannot solve mine the same way.
I cannot use a commercial product such as Sakhr because I do not have the budget for it. Simply adding the words to a grammar will not work, because these words don't obey the normal pronunciation rules of the English language.
Basically, what I'm looking for is some sort of tool that can associate a word written in English (Latin) script with a set of pretrained pronunciations captured from a microphone, which the speech engine can then reference at run time. Is this possible?
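In case it helps frame the question: the closest thing I can see to this in the managed speech API is building the grammar in code and setting each token's `Pronunciation` property explicitly. Below is a minimal sketch of that idea, with many assumptions: the word "iftah" and its phone string are made up for illustration, the Kinect normally feeds its own audio stream to the recognizer rather than the default device, and the phone labels must match whatever phone set the installed recognizer expects.

```csharp
using System;
using Microsoft.Speech.Recognition;               // Kinect samples use Microsoft.Speech rather than System.Speech
using Microsoft.Speech.Recognition.SrgsGrammar;

class ArabicMenuSketch
{
    static void Main()
    {
        // Hypothetical menu word; the pronunciation string is an
        // assumption -- verify it against your engine's phone set.
        var token = new SrgsToken("iftah") { Pronunciation = "IH F T AA H" };
        var rule = new SrgsRule("menu", token);
        var doc = new SrgsDocument { Mode = SrgsGrammarMode.Voice };
        doc.Rules.Add(rule);
        doc.Root = rule;

        var engine = new SpeechRecognitionEngine();
        engine.LoadGrammar(new Grammar(doc));
        engine.SetInputToDefaultAudioDevice();     // a Kinect app would attach the sensor's audio source here
        engine.SpeechRecognized += (s, e) =>
            Console.WriteLine("Heard: " + e.Result.Text);
        engine.RecognizeAsync(RecognizeMode.Multiple);
        Console.ReadLine();
    }
}
```

If something along these lines is the intended mechanism, my question reduces to: where do the custom pronunciations come from, and can they be trained from recordings rather than written as phone strings?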
I am open to any options.
Thanks.