6

The problem is that I want to get the phonemes of recorded speech in C#. Say you have an audio file like "x.wav" that says "hello dear Shamim". I want to extract all the phonemes of the speech and their relative timings, something like the picture below:

[Image: Phoneme Editor]

I used the System.Speech library (both the Recognition and Synthesis namespaces) but I didn't find what I wanted. Now don't be mistaken! I don't want the phonemes of the sentence "hello dear Shamim"; I want to extract the phonemes from an unknown audio input that speaks an English sentence. I tried System.Speech.Recognition, but it tries to extract the words out of the audio file, not the phonemes! And as you may have guessed, the words are 30% wrong! ;)

K DawG
Shamim

4 Answers

3

Phoneme recognition requires a more specialized setup than word recognition, and most engines don't support it directly (a dictionary of single-phone "words" doesn't usually result in good accuracy). A big reason is that phoneme recognition is much less accurate than word recognition, since word recognition is more constrained (it filters out all phone combinations which don't map to real words, which is most of them). But HTK does support it. You can use it by executing shell commands (there's nothing evil in doing that from C#) or by calling the libraries through P/Invoke.
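If the shell-command route is taken (for example by starting an HTK recognizer such as HVite through System.Diagnostics.Process), the remaining work on the C# side is reading the label output back in. The sketch below is only an illustration of that step: it assumes the output is in HTK's label/MLF text layout, where each line is `start end label [score]` and times are integer multiples of 100 ns. The class names `PhoneSegment` and `HtkLabels` and the sample lines are made up for this example and are not part of HTK.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical container for one recognized phone and its timing.
public sealed class PhoneSegment
{
    public string Phone { get; }
    public double StartSeconds { get; }
    public double EndSeconds { get; }

    public PhoneSegment(string phone, double startSeconds, double endSeconds)
    {
        Phone = phone;
        StartSeconds = startSeconds;
        EndSeconds = endSeconds;
    }
}

// Hypothetical helper that reads HTK-style label lines.
public static class HtkLabels
{
    // HTK writes times as integer multiples of 100 ns, so 1 s = 10,000,000 units.
    private const double HtkUnitsPerSecond = 1e7;

    public static List<PhoneSegment> Parse(IEnumerable<string> lines)
    {
        var segments = new List<PhoneSegment>();
        foreach (string raw in lines)
        {
            string line = raw.Trim();

            // Skip blanks, the MLF header, quoted file-name lines and "." terminators.
            if (line.Length == 0 || line == "." ||
                line.StartsWith("#!MLF!#", StringComparison.Ordinal) ||
                line.StartsWith("\"", StringComparison.Ordinal))
            {
                continue;
            }

            string[] fields = line.Split(new[] { ' ', '\t' },
                                         StringSplitOptions.RemoveEmptyEntries);

            // Lines without both a start and an end time carry no timing; ignore them.
            if (fields.Length < 3) continue;

            long start, end;
            if (!long.TryParse(fields[0], NumberStyles.None, CultureInfo.InvariantCulture, out start) ||
                !long.TryParse(fields[1], NumberStyles.None, CultureInfo.InvariantCulture, out end))
            {
                continue;
            }

            segments.Add(new PhoneSegment(fields[2],
                                          start / HtkUnitsPerSecond,
                                          end / HtkUnitsPerSecond));
        }
        return segments;
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample output; real timings and scores depend on the models used.
        string[] sample =
        {
            "#!MLF!#",
            "\"*/x.rec\"",
            "0 1200000 hh -512.3",
            "1200000 2500000 ah -498.1",
            "2500000 3300000 l -401.7",
            "3300000 5000000 ow -620.9",
            "."
        };

        foreach (PhoneSegment s in HtkLabels.Parse(sample))
        {
            Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
                "{0}\t{1:F2}\t{2:F2}", s.Phone, s.StartSeconds, s.EndSeconds));
        }
    }
}
```

In a real run the lines would come from `File.ReadLines` on the label file the recognizer wrote, rather than from an in-memory array.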

Aleksandr Dubinsky
  • Thanks. :) The previous answer mentioned HTK too. I'm reading the documentation. :P – Shamim Dec 26 '13 at 09:24
  • MS actually bought HTK. HTK is used a lot in research, so it can do many low-level things. Can I ask what the purpose of this is? – Aleksandr Dubinsky Dec 26 '13 at 11:05
  • Yes, the next phase is to show a continuous sequence of lip pictures matching the phonemes at their specific timings. That's all! A simple school project! – Shamim Dec 26 '13 at 12:06
2

Try using the System.Speech.Recognition.DictationGrammar constructor that takes a string argument, and pass "grammar:dictation#pronunciation" as the argument. Alternatively, raw SAPI (using the SpeechLib interop assembly) can select the pronunciation grammar by calling ISpRecoGrammar::LoadDictation with "Pronunciation" as the dictation topic.
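A minimal sketch of the first suggestion might look like the following. It assumes Windows, a reference to the System.Speech assembly, an installed recognizer that supports this topic, and a wave file path passed on the command line (falling back to the "x.wav" name from the question). The class name `PronunciationDemo` is made up for the example, and what the engine actually returns for each unit will depend on the recognizer and language.

```csharp
using System;
using System.Speech.Recognition;

public static class PronunciationDemo
{
    // Topic string suggested in the answer above; it is not formally documented.
    public const string PronunciationTopic = "grammar:dictation#pronunciation";

    public static void Main(string[] args)
    {
        // Assumed input file; replace with a real path.
        string wavePath = args.Length > 0 ? args[0] : "x.wav";

        using (var engine = new SpeechRecognitionEngine())
        {
            // Load the dictation grammar with the pronunciation topic.
            engine.LoadGrammar(new DictationGrammar(PronunciationTopic));
            engine.SetInputToWaveFile(wavePath);

            // Recognize() returns null once no further phrase can be recognized.
            RecognitionResult result;
            while ((result = engine.Recognize()) != null)
            {
                foreach (RecognizedWordUnit unit in result.Words)
                {
                    // Print the recognized unit next to its phonetic spelling.
                    Console.WriteLine("{0}\t{1}", unit.Text, unit.Pronunciation);
                }
            }
        }
    }
}
```

This only prints the phonetic spelling of each recognized unit; it does not by itself give per-phoneme timings.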

Eric Brown
  • Thanks for your answer. :) What do you mean by the "grammar:dictation#pronunciation" constructor? I'll try SpeechLib and give some feedback later. Thanks! – Shamim Dec 26 '13 at 09:36
  • I mean the DictationGrammar constructor that takes a string argument. Edited answer for better clarity. – Eric Brown Dec 26 '13 at 09:49
  • Interesting! Do you have documentation for this option? – Aleksandr Dubinsky Dec 26 '13 at 11:03
  • More from internal lore, sorry. I suspect that it's because this topic isn't available in all languages or in 3rd party engines (yes, they exist). – Eric Brown Dec 26 '13 at 20:06
1

You can bind the Hidden Markov Model Toolkit (HTK) to your C# code with P/Invoke, or try the Accord.NET framework, which is managed and has HMM classes but no concrete methods for extracting phonemes.

Redwan
1

Is this for vanilla .NET, or can you use SAPI (you know, the Speech API)? The Speech API is nice, and it seems to have what you are looking for. Most of all, in a Windows environment it is more easily obtained than external libraries (not to mention that there is not much of a licensing issue regardless of application).

Did you notice System.Speech.Recognition.RecognizedWordUnit? That seems to be roughly what you are looking for.

violet_white
  • I searched for SAPI and came across the free `Microsoft Speech SDK 5.1`! I downloaded and installed it, but I can't figure it out yet! The link you provided is `System.Speech.Recognition`, which I already tried to work with, but as I said, it first recognizes the words (or actually guesses the words) and then gives me the phonemes of those words. Also, there is no timing! I tried to use the `RecognizedWordUnit` class, but it seems that it works on a recognized word! I don't want the words to be recognized, I don't care about them! I want the phonemes of the speech! Thanks for your answer! :) – Shamim Dec 26 '13 at 09:18
  • Is the .Pronunciation property not enough? It returns a phonetic spelling. – violet_white Apr 30 '15 at 16:19
  • What is .Pronunciation? Anyhow! I used SAPI and an .exe file to get the phonemes, as Aleksandr Dubinsky and Redwan mentioned above. Your answer may not be exactly what I did, but it's also close. – Shamim May 03 '15 at 07:10