I use this formula to get the frequency of a signal, but I don't understand how to implement it in code with complex numbers. There is an "i" in the formula, which corresponds to Math.Sqrt(-1). How can I apply this formula to a signal in C# with the NAudio library?
-
What language would you like to write this in? – christopher Mar 05 '13 at 13:17
-
Do you mean programming language? – Cengaver Mar 05 '13 at 13:21
-
Yes, that's what I mean :) – christopher Mar 05 '13 at 13:21
-
C# programming language :) – Cengaver Mar 05 '13 at 13:23
-
Well then my answer is the one for you :) – christopher Mar 05 '13 at 13:28
3 Answers
A lot of languages actually provide built-in libraries for this. One example, in C#/.NET, is at this link. It gives you a step-by-step guide to setting up a speech recognition program. It also abstracts you away from the low-level detail of parsing audio for particular phonemes etc. (which, frankly, is pointless given the number of libraries available, unless you wish to write a highly optimized version).

-
Well, if you'd mentioned that in the question, I would have made a different suggestion. Is there not a way to type in the phonetic pronunciation of the Turkish words? – christopher Mar 05 '13 at 13:57
-
So erm.. The phonetic pronunciation of "Hello" is.. HEH-LOW. See what I mean? – christopher Mar 05 '13 at 14:08
-
OK, I understand. But do I have to use phonetic pronunciation for specific words? My words are specific; I won't recognize the rest of them. – Cengaver Mar 05 '13 at 14:16
-
Perhaps. Turkish has sounds that the English language doesn't have. I speak Arabic so I understand your trouble :). I would recommend it, because it will make your software more likely to recognize the word. – christopher Mar 05 '13 at 14:20
-
So follow the link I posted. It gives you a really, really good description of what to do. Also, given my answer is the one that has been the most helpful, can you mark it as correct please, for future readers. Thanks. – christopher Mar 05 '13 at 14:37
-
Please mark as the correct answer for future readers; otherwise people will keep adding to this question pointlessly :) – christopher Mar 05 '13 at 15:28
If you want to go back to a basic level then:
You'll want to use some form of probabilistic model, something like a hidden Markov model (HMM). This will allow you to test what the user says against a collection of models, one for each word they are allowed to say.
Additionally, you want to transform the audio waveform into something that your program can more easily interpret, using something like a fast Fourier transform (FFT) or a continuous wavelet transform (CWT).
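On the "i" in the formula: you don't compute Math.Sqrt(-1) yourself, you use a complex number type. Here is a rough sketch, assuming the formula in the question is the standard discrete Fourier transform X[k] = Σ x[n]·e^(-2πi·k·n/N) (the formula itself isn't shown, so that is a guess). It uses Python's built-in complex type; in C# the `System.Numerics.Complex` struct plays the same role. The names `dft` and `dominant_frequency` and the sample values are made up for illustration, and a real FFT would be much faster than this O(N²) loop.

```python
import cmath
import math

def dft(samples):
    """Naive O(N^2) DFT: X[k] = sum over n of x[n] * e^(-2*pi*i*k*n/N)."""
    n_samples = len(samples)
    spectrum = []
    for k in range(n_samples):
        total = 0j
        for n, x in enumerate(samples):
            # 1j is Python's imaginary unit, so no sqrt(-1) is ever evaluated
            total += x * cmath.exp(-2j * math.pi * k * n / n_samples)
        spectrum.append(total)
    return spectrum

def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest bin below the Nyquist limit."""
    spectrum = dft(samples)
    half = len(samples) // 2
    # Skip bin 0 (the DC offset); abs() of a complex number is its magnitude
    peak_bin = max(range(1, half), key=lambda k: abs(spectrum[k]))
    return peak_bin * sample_rate / len(samples)

# Illustrative input: a 50 Hz sine sampled at 800 Hz, 64 samples
sample_rate = 800
signal = [math.sin(2 * math.pi * 50 * t / sample_rate) for t in range(64)]
print(dominant_frequency(signal, sample_rate))
```

The frequency resolution is sample_rate / N, so with these numbers each bin is 12.5 Hz wide and the 50 Hz tone lands exactly on bin 4.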
The steps would be:
- Get audio
- Remove background noise
- Transform via FFT or CWT
- Detect peaks and other features of the audio
- Compare these features with your HMMs
- Pick the HMM with the best result above a threshold.
Of course this requires you to previously train the HMMs with the correct words.
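To give an idea of the last two steps, here is a minimal sketch of scoring an observation sequence against one HMM per word with the forward algorithm and keeping the best one above a threshold. Everything specific here is an assumption for illustration: the two toy models are hand-written rather than trained, the observations are already quantised into three symbols (0 = low, 1 = mid, 2 = high), and the names `forward_log_likelihood`, `recognise` and `MODELS` are hypothetical.

```python
import math

def forward_log_likelihood(observations, start, trans, emit):
    """Log P(observations | model) for a discrete HMM (forward algorithm)."""
    n_states = len(start)
    # alpha[s] = P(observations so far, current state is s)
    alpha = [start[s] * emit[s][observations[0]] for s in range(n_states)]
    log_prob = 0.0
    for obs in observations[1:]:
        # Rescale at every step so long sequences do not underflow
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs]
            for s in range(n_states)
        ]
    final = sum(alpha)
    return log_prob + math.log(final) if final > 0.0 else float("-inf")

def recognise(observations, models, threshold):
    """Best-scoring word, or None if its per-frame log-likelihood is too low."""
    scores = {
        word: forward_log_likelihood(observations, *params)
        for word, params in models.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] / len(observations) < threshold:
        return None
    return best

# Toy two-state left-to-right models: (start, transition, emission)
TRANS = [[0.7, 0.3], [0.0, 1.0]]
MODELS = {
    "left": ([1.0, 0.0], TRANS, [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]),   # low then high
    "right": ([1.0, 0.0], TRANS, [[0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]),  # high then low
}

print(recognise([0, 0, 2, 2], MODELS, threshold=-1.0))
print(recognise([1, 1, 1, 1], MODELS, threshold=-1.0))
```

Dividing the score by the sequence length before comparing with the threshold is just one way of making the threshold independent of how long the word is; a real system would tune that on recorded data.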

-
I code pretty much exclusively in C++, I'm afraid. Step 2 can be pretty simple: you can just threshold the values, or do a rolling average and then use that to threshold. For step 3 you can find code here: http://stackoverflow.com/questions/170394/fast-fourier-transform-in-c-sharp For step 4 you can once again do thresholding but with a higher value, or alternatively just look for the highest value. Other features could be the length of the word and the frequency of the word. – gpdaniels Mar 05 '13 at 13:56
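A rough sketch of one reading of those suggestions for steps 2 and 4: smooth with a rolling average, zero out anything below a threshold, and take the highest remaining value as the peak. The function names and the choice of a trailing window are assumptions, not something from the comment.

```python
def rolling_average(values, window):
    """Trailing moving average; early entries average over what is available."""
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just left the window
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed

def gate(values, threshold):
    """Zero out every value whose magnitude is below the threshold."""
    return [v if abs(v) >= threshold else 0.0 for v in values]

def highest_peak(magnitudes):
    """Index of the largest value, e.g. the strongest FFT bin."""
    return max(range(len(magnitudes)), key=lambda i: magnitudes[i])

# Illustrative values only
magnitudes = [0.1, 0.2, 3.0, 0.2, 0.1]
print(rolling_average([1, 3, 5, 7], window=2))
print(gate(magnitudes, threshold=1.0))
print(highest_peak(magnitudes))
```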
-
I did it by first attempting to segment the audio into when the user is speaking and when they are not. Then take the mean and standard deviation of the audio from when they aren't speaking. Finally I set the threshold at the mean plus one or two times the standard deviation. As an extra step you can then remove small sections of the audio that are above the threshold, basically if it is above the threshold but not long enough to be a word then discard it. – gpdaniels Mar 06 '13 at 11:07
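Roughly, that approach could look like the sketch below: the threshold is the mean plus k standard deviations of the amplitude in a stretch known to be silence, and runs above the threshold that are too short to be a word get dropped. The names, the default k = 2 and the sample values are illustrative assumptions.

```python
import statistics

def noise_threshold(silence, k=2.0):
    """Mean plus k standard deviations of the amplitude during silence."""
    levels = [abs(s) for s in silence]
    return statistics.mean(levels) + k * statistics.pstdev(levels)

def speech_segments(samples, threshold, min_length):
    """(start, end) index pairs, end exclusive, of loud runs long enough to keep."""
    segments = []
    start = None
    for i, s in enumerate(samples):
        loud = abs(s) > threshold
        if loud and start is None:
            start = i  # a loud run begins
        elif not loud and start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None  # run ended; short ones are discarded
    # Handle a run that is still open at the end of the audio
    if start is not None and len(samples) - start >= min_length:
        segments.append((start, len(samples)))
    return segments

# Illustrative values only
silence = [0.0, 0.2, 0.0, 0.2]
audio = [0.1, 0.9, 0.1, 0.8, 0.9, 0.7, 0.1, 0.0, 0.6, 0.7]
threshold = noise_threshold(silence)
print(speech_segments(audio, threshold, min_length=2))
```

In practice you would probably run this on a smoothed envelope rather than raw samples, since a raw waveform crosses zero constantly even while someone is speaking.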
It is a difficult problem nonetheless, and you will have to use an ASR framework to do it. I have done something slightly more complex (~100 words) using Sphinx4. You can also use HTK.
In general what you have to do is:
- write down all the words that you want to recognize
- determine the syntax of your commands, e.g. (direction) (amount)
Then choose a framework, get an acoustic model, generate a dictionary and a language model compatible with that framework. Then integrate the framework into your application.
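Before touching any framework, it can help to write the word list and the command syntax down in code. A small sketch, assuming a (direction) (amount) grammar; the word lists and function names are made up for illustration and have nothing to do with any particular framework's grammar format.

```python
from itertools import product

# Hypothetical vocabulary; replace with the words you actually need
DIRECTIONS = ["left", "right", "forward", "back"]
AMOUNTS = ["one", "two", "three"]

def vocabulary():
    """Every word the recogniser has to know, e.g. for building a dictionary."""
    return sorted(set(DIRECTIONS) | set(AMOUNTS))

def all_commands():
    """Every sentence the (direction) (amount) grammar allows."""
    return [f"{d} {a}" for d, a in product(DIRECTIONS, AMOUNTS)]

def parse_command(text):
    """Return (direction, amount) if the text fits the grammar, else None."""
    words = text.lower().split()
    if len(words) == 2 and words[0] in DIRECTIONS and words[1] in AMOUNTS:
        return words[0], words[1]
    return None

print(len(all_commands()))
print(parse_command("Left two"))
print(parse_command("two left"))
```

The full list of sentences is also a handy input when you generate the language model, since it is exactly the set of utterances you want accepted.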
I hope I have mentioned all the important things you need to do. You can google them separately or go to your chosen framework's tutorial.
Your task is relatively simple in terms of speech recognition and you should get good results if you complete it.
