Initially I was thinking of writing a speech recognition engine from scratch (with support for 50-100 words) for my native language.

However, after some research it has become clear that it is not possible to build a speech recognition engine, even one with very limited support, in 1.5 years.

Now I am thinking of extending the Sphinx engine to support my language. Is that possible in 1.5 years, or is even this too much work for a final-year project?

I am anxiously waiting to hear your experiences in this matter.

Win Coder
  • Please read http://stackoverflow.com/a/8215967/432021 – Nikolay Shmyrev Dec 12 '12 at 16:45
  • @NikolayShmyrev Thanks for the link. However, the question still stands: will a novice programmer be able to accomplish the task in, say, a year? – Win Coder Dec 13 '12 at 14:18
  • You will be able to accomplish it in a month – Nikolay Shmyrev Dec 13 '12 at 14:18
  • Just read the tutorial http://cmusphinx.sourceforge.net/wiki/tutorial and go ahead – Nikolay Shmyrev Dec 13 '12 at 14:20
  • OK, so it seems that it is possible. However, will the amount of work involved be enough to justify a final-year project, or will it be too little for that? – Win Coder Dec 13 '12 at 14:23
  • A month of work is obviously not enough to justify a final-year project; you are supposed to work a whole year. However, given the value you create from this, it's certainly enough. If your language is not supported yet, that would be a great thing to implement. – Nikolay Shmyrev Dec 13 '12 at 14:27
  • @NikolayShmyrev Yes, my language is not supported. All the material I could find was a conference paper on IEEE Xplore proposing the use of this engine to support my language. One last question: I am fond of algorithms, so will there be heavy use of algorithms in this project? – Win Coder Dec 13 '12 at 14:43
  • Absolutely, there are dozens of very interesting algorithms in decoding, training and data collection. – Nikolay Shmyrev Dec 13 '12 at 15:42
  • @NikolayShmyrev Thank you very much. The kind of help you provided is not usually the norm on SO, and many thanks for answering my beginner questions. – Win Coder Dec 13 '12 at 16:04

2 Answers

The time needed to build a speech recognizer will depend on your application.

You will need to:

  1. Define the words that you want to recognize;
  2. Write a phonetic dictionary for these words;
  3. Record the words with several native speakers;
  4. Validate the recorded data;
  5. Prepare the data to train acoustic models;
  6. Produce a grammar or language model (in this case it is necessary to record phonetically balanced words / sentences);
  7. Train the acoustic models;
  8. Test your system;
  9. Adjust and tune the grammar and acoustic models (speaker adaptation);
  10. Learn how to do all 9 steps above. :)

Item 10 is the most time-consuming task!
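As an illustration of step 4 (validating recorded data), here is a minimal Python sketch that checks each recording's format before training. It assumes the recordings are WAV files and that the target format is 16 kHz, 16-bit, mono, which is what the CMUSphinx tutorial asks for with desktop models; the function name and the duration limits are hypothetical choices, so adjust them to your own setup.

```python
import wave

# Assumed target format (not stated in this answer): 16 kHz, 16-bit, mono.
EXPECTED_RATE = 16000
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit audio
EXPECTED_CHANNELS = 1


def check_recording(path, min_seconds=0.3, max_seconds=30.0):
    """Return a list of problems found in one WAV file (empty list means OK)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate

    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"sample width is {width} bytes, expected {EXPECTED_SAMPLE_WIDTH}")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected {EXPECTED_CHANNELS}")
    # The duration limits are arbitrary illustrative values.
    if not (min_seconds <= duration <= max_seconds):
        problems.append(f"duration {duration:.2f} s is outside the expected range")
    return problems
```

Running this over every file right after a recording session catches format mistakes early, before they surface as confusing training errors.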

Answer: Yes, it is possible to do this in 3 months for a commercial application.

Sphinx is one possibility. HTK is an excellent open source speech recognition toolkit for training and testing a complete system. Julius is an open source speech recognizer (engine) that uses acoustic and language models built with HTK.
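Whichever toolkit you pick, testing the system (step 8) usually comes down to comparing recognizer output against reference transcripts. A common metric is word error rate (WER): the word-level edit distance between the two, divided by the number of reference words. The sketch below is a plain Python illustration of that metric, not code from Sphinx, HTK or Julius, and the function name is my own.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two strings, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("open the door", "open door")` counts one deleted word out of three reference words. Tracking this number on a held-out set of recordings shows whether the tuning in step 9 is actually helping.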

Luis Uebel

ASR Labs - www.asrlabs.com.br

Luis Uebel
  • Thanks for the answer. You said the Sphinx engine can be extended for a local language in 3 months. Here's another question: will the amount of work involved be enough to justify a final-year project? – Win Coder Dec 13 '12 at 14:21

Yes, it is certainly possible. I made a similar recognizer for Chatino for my senior thesis. (Chatino is an indigenous language from Oaxaca in southern Mexico). The recognizer includes both an isolated word recognizer and a continuous speech recognizer built using Sphinx4.

See http://www.jaimalayalam.com/papers/chatinoVoiceRecognition09.pdf for details.

vjaivox
  • Thanks for the link. Any pitfalls or recommendations? Will it be technically feasible for an undergrad final-year project? – Win Coder Jan 18 '13 at 08:51
  • It is certainly feasible, since it was my senior year project. It would be necessary to have a good transcript, careful recording in short segments (one recording per line of transcript), and a phonetic dictionary or decomposer for your language. Perhaps you can ask more specific questions once you get into the project. – vjaivox Jan 26 '13 at 02:30
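
To make the "one recording per line of transcript" advice concrete, here is a rough Python sketch that turns a transcript into the two lists the CMUSphinx training tutorial describes (a file-ID list and a transcription list with `<s> ... </s> (utterance_id)` lines), and reports words that have no entry in the phonetic dictionary yet. The function names, the `utt_0001` naming scheme, the `speaker_1` directory and the sample words are all assumptions for illustration.

```python
def build_training_lists(transcript_lines, speaker="speaker_1"):
    """Map each non-empty transcript line to one utterance ID and one recording path."""
    fileids = []
    transcriptions = []
    for number, line in enumerate(transcript_lines, start=1):
        text = line.strip().lower()
        if not text:
            continue  # blank lines get no recording; IDs follow transcript line numbers
        utt_id = f"utt_{number:04d}"  # hypothetical naming scheme
        fileids.append(f"{speaker}/{utt_id}")
        transcriptions.append(f"<s> {text} </s> ({utt_id})")
    return fileids, transcriptions


def missing_from_dictionary(transcript_lines, dictionary):
    """Return the transcript words that have no pronunciation in the dictionary."""
    words = {word for line in transcript_lines for word in line.lower().split()}
    return sorted(words - set(dictionary))


# Illustrative data only: English placeholder words with made-up phone strings.
transcript = ["hello world", "world"]
phonetic_dictionary = {"hello": "HH AH L OW"}

fileids, transcriptions = build_training_lists(transcript)
print("\n".join(fileids))
print("\n".join(transcriptions))
print("missing:", missing_from_dictionary(transcript, phonetic_dictionary))
```

Generating both lists from a single transcript keeps the recordings and their text in step, and the missing-word report tells you which dictionary entries still have to be written before training.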