5

I am developing a desktop application in Java. It is meant to teach English to school kids: the user uploads some English audio, which can be in any format, and the application converts it into a text file that they can then read.

I've found some APIs, but I am not sure about them.

http://cmusphinx.sourceforge.net/wiki/

I've seen many questions on Stack Overflow about this, but none were helpful. If someone can help with this, I would be very grateful.

Thank you.

Marcelo
Yashprit
  • 4
    Just to let you know, what you are trying to achieve is not trivial - and there's probably no solution out there that'll grant you a 100% conversion... the other way around (text2speech) is much easier. You might want to look around for 3rd-party apps/libs that do it, not necessarily in Java, and then just integrate with them. – Marcelo Mar 05 '12 at 15:33

2 Answers

3

What you seek is currently bleeding-edge technology. Tools like cmusphinx can detect words from a dedicated, limited dictionary (so you can teach it to understand, say, 15 words and that's it - you can't teach it to understand English).

Basically, those tools try to find patterns in the sound waves that you feed them. They don't understand anything; they just apply the same algorithm to any input and then try to find the closest match. This works well for small sets of words, but as the number of words increases, the difference between them shrinks and the job gets ever harder (without even starting with words like whether and weather or C and see).

What you might consider is "repeat after me" software. Here, you need to record all words for the test as templates. Then you record the pupils saying the same words and compute the difference between each recording and its template. If the difference is not too large, the word is correct. But again: This is simple repetition to improve pronunciation - not English.
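To make the "compute the difference" step concrete, here is a minimal sketch of one common way to compare a recording against a template: dynamic time warping (DTW), which tolerates a pupil speaking faster or slower than the template. This is my own choice of technique for illustration, not something the tools above prescribe. The class name `TemplateMatcher`, the `matches` helper and the threshold value are hypothetical, and the sketch assumes each recording has already been reduced to a sequence of one number per audio frame (for example, frame energy); a real system would use richer features.

```java
import java.util.Arrays;

public class TemplateMatcher {

    /**
     * Dynamic time warping distance between two feature sequences.
     * Assumption: each array holds one feature value per audio frame.
     */
    public static double dtwDistance(double[] a, double[] b) {
        int n = a.length;
        int m = b.length;
        // cost[i][j] = cheapest alignment of the first i frames of a
        // with the first j frames of b
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0.0;

        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double d = Math.abs(a[i - 1] - b[j - 1]);
                // extend the cheapest of: diagonal step, or repeating
                // a frame of either sequence
                double best = Math.min(cost[i - 1][j - 1],
                        Math.min(cost[i - 1][j], cost[i][j - 1]));
                cost[i][j] = d + best;
            }
        }
        return cost[n][m];
    }

    /** Hypothetical acceptance rule: the word counts as correct if the distance is small enough. */
    public static boolean matches(double[] template, double[] attempt, double threshold) {
        return dtwDistance(template, attempt) <= threshold;
    }

    public static void main(String[] args) {
        double[] template  = {0.1, 0.5, 0.9, 0.5, 0.1};
        // same shape as the template, spoken more slowly
        double[] slower    = {0.1, 0.1, 0.5, 0.5, 0.9, 0.9, 0.5, 0.1};
        // a clearly different shape
        double[] different = {0.9, 0.1, 0.9, 0.1, 0.9};

        double threshold = 0.5; // made-up value; would need tuning on real recordings
        System.out.println("slower matches:    " + matches(template, slower, threshold));
        System.out.println("different matches: " + matches(template, different, threshold));
    }
}
```

The threshold is the hard part in practice: it has to be tuned on real recordings, and it only says how close the sound is to the template, not whether the pupil understood the word.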

There is desktop software which can understand a lot of English (for example the products from Nuance, Dragon NaturallySpeaking being one of the most prominent). They do offer server solutions, but that software is neither free nor cheap, which matters if you're on a tight budget.

Aaron Digulla
3

There are many technologies and services available to perform speech recognition. For an introduction to some of the choices, see https://stackoverflow.com/a/6351055/90236.

I'm not sure that the results will be acceptable for teaching children English as a second language, but it is worth trying.

Community
Michael Levy