11

Does anyone have experience with an open source or relatively cheap voice recognition API for Java? I'm looking for something that will turn spoken words into text.

Judging from the Java speech recognition page on Sun's site, that API seems rather dead. My requirement is something that at least runs on Linux.

Can anyone recommend something? Pure Java would be a bonus; otherwise a Linux-based solution could be considered. And since this is a home project... the cheaper the better.

  • Edit

CMU Sphinx
As Amit pointed out, there is CMU Sphinx: http://cmusphinx.sourceforge.net/html/cmusphinx.php My problem with it is a massive word error rate. Training seems like a project all in itself; I'm hoping to gather some strength to try it this weekend.

IBM ViaVoice
There are news announcements floating around from 2004 about ViaVoice being made open source. It seems the news release was premature and that it never happened. ViaVoice was released for Linux at some point, but it seems they stopped. All that seems to be left on IBM's website is ViaVoice embedded.

IBM Websphere Voice
I imagine this is why ViaVoice (desktop) seems discontinued. IBM created this commercial solution, which will cost a lot more than an arm and a leg. And just using it will take the ones you have left, at least going by my experience with WebSphere and their IDE.

Nuance
It seems they might still create products for Linux. But I think they got lost and followed IBM into the server market. I'm not that sure about this one; their website does not make it easy to find useful information.

Open Mind / Free Speech
These guys keep changing their project name. Probably some money-hungry company keeps threatening them, but I don't know. The project looks a bit dead.

I might try training Sphinx this weekend to see if it wants to be friends. Otherwise, worst case, I'll be looking at using Microsoft's speech solution. It has worked well for me in the past, but it's not a great Linux solution. I could probably use it through Wine, but then I'll have two separate servers... messy messy.

Oh, and SpeechTechMag seems a good place to visit for voice/speech. They have an 'Annual Reference' that lists companies that somehow relate themselves to voice/speech.

guyumu

5 Answers

9

Mostly Java: http://cmusphinx.sourceforge.net/html/cmusphinx.php

  • After working with it, it's actually quite horrible. It barely recognizes anything, and it's not like I have a horrid accent or anything. Training it seems even more of a problem, and unless you're willing to shell out for some third-party database you're sitting at the bottom of the heap. – guyumu Mar 04 '09 at 15:19
  • I haven't had any practical experience with it. –  Mar 04 '09 at 16:32
  • 2
    This question is pretty old, but I would like to report on the current performance of Sphinx. I used Sphinx 4 with an adapted WSJ model and it gave me 86% accuracy. – Shekhar Mar 26 '13 at 09:27
3

sphinx is by far the best option available if you are on a budget. however, it also makes a huge difference which models you use, how you tune them and how you tune your audio source. absolutely everything has to match, otherwise it just won't work. given the problem you described, i'd be willing to bet a substantial sum that you've got your models mixed up and your mic is not correctly calibrated. also, if you have an accent it probably will not work - this is not an issue with the decoder but with the acoustic models - if no one with a voice/accent similar to yours was included in the training data, you'll get poor results.
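One way to check the "everything has to match" point before blaming the decoder is to inspect the format of a recording. The following is only a minimal sketch: it assumes the acoustic model expects 16 kHz, 16-bit, mono, signed PCM audio (the 16 kHz figure comes from the WSJ models; the bit depth and channel count are my assumption, so check the documentation of the model you actually use). `ModelAudioCheck` and `matchesModel` are names made up for illustration; only the `javax.sound.sampled` calls are standard Java.

```java
import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;

class ModelAudioCheck {

    /**
     * Returns true if the audio format is signed 16-bit mono PCM at the
     * expected sample rate. The 16-bit/mono requirement is an assumption
     * about the acoustic model, not something taken from Sphinx itself.
     */
    static boolean matchesModel(AudioFormat format, float expectedRateHz) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                && format.getSampleRate() == expectedRateHz
                && format.getSampleSizeInBits() == 16
                && format.getChannels() == 1;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java ModelAudioCheck recording.wav");
            return;
        }
        // Read only the header of the file to find out how it was recorded.
        AudioFormat format = AudioSystem.getAudioFileFormat(new File(args[0])).getFormat();
        boolean ok = matchesModel(format, 16000f);
        System.out.println(format + (ok ? " looks compatible" : " does NOT match a 16 kHz model"));
    }
}
```

If the check fails (for example a 44.1 kHz stereo recording), resampling the audio or picking a model trained at the matching rate is probably the first thing to try.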

that said, have you looked at their open source models page?

http://www.speech.cs.cmu.edu/sphinx/models/

depending on what you are trying to do you should be able to obtain about 90% accuracy on free speech with the 16kHz WSJ models and the gigaword LMs NVP. i caution however that ASR is a massive undertaking and hasn't yet reached commodity status.
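To put a number on accuracy for your own recordings, the usual measure is word error rate (WER): the word-level edit distance between what was said and what was recognized, divided by the number of words actually said. Below is a minimal sketch in plain Java. The class and method names are made up for illustration, the tokenization (lowercase, split on whitespace) is a simplifying assumption, and treating accuracy as 1 - WER is a common convention rather than anything specific to Sphinx.

```java
import java.util.Locale;

class WordErrorRate {

    /** Word-level Levenshtein distance (substitutions, deletions, insertions). */
    static int editDistance(String[] ref, String[] hyp) {
        int[] prev = new int[hyp.length + 1];
        int[] curr = new int[hyp.length + 1];
        for (int j = 0; j <= hyp.length; j++) {
            prev[j] = j; // turning an empty reference into j words takes j insertions
        }
        for (int i = 1; i <= ref.length; i++) {
            curr[0] = i; // dropping i reference words takes i deletions
            for (int j = 1; j <= hyp.length; j++) {
                int substitution = prev[j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                curr[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[hyp.length];
    }

    /** Simplified tokenizer: lowercase and split on whitespace. */
    static String[] tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** WER = edit distance / number of reference words. Can exceed 1.0. */
    static double wer(String reference, String hypothesis) {
        String[] ref = tokenize(reference);
        if (ref.length == 0) {
            throw new IllegalArgumentException("reference must contain at least one word");
        }
        return (double) editDistance(ref, tokenize(hypothesis)) / ref.length;
    }

    public static void main(String[] args) {
        // One substitution (kitchen -> chicken) and one deletion (on): 2 errors in 5 words.
        double rate = wer("turn the kitchen light on", "turn the chicken light");
        System.out.println("WER = " + rate + ", word accuracy = " + (1.0 - rate));
    }
}
```

In the example in `main`, two of the five reference words are wrong, giving a WER of 0.4. Running a handful of your own test sentences through something like this makes it easier to tell whether a change of model or microphone setup actually helped.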

si28719e
  • I think I came to that realization; it still has a long road to go. Whether I have an accent or not is subjective :D but likely. I've recently stopped using Ubuntu and jumped onto the Windows bandwagon. When I continue with this, I think I will be able to use Microsoft's engine, which has worked reasonably in the past. But in the end... I think the technology has far to go, and I think I'll be dropping that part completely for 10 years :) – guyumu Aug 31 '09 at 11:37
  • microsoft's engine also used to be based on sphinx. now i think they perhaps rely more heavily on HTK, another open source speech recognition system. your accent is not a subjective issue from the point of view of an ASR system. the results will be heavily dependent on how well the characteristics of your voice match those of the voices in the training data. differences which may seem trivial to you, for example a canadian versus an american accent, may have a very significant impact on the ASR quality. these days most systems rely on the same algorithms, the difference is the data. – si28719e Nov 24 '09 at 23:28
2

you can download vPass (voice password) from http://www.basic-signalprocessing.com.

The components are designed for Java and .NET. The recognition period is 5 seconds. vPass is well tested; vText is not, as it is still new, which is why it is not packaged yet.

Matthieu
Andreas
1

My group finished a mini program in Java to recognize spoken digits using Sphinx.

Kristian Glass
Kiet Tran
1

I have been looking for the same thing for a few days now. So far I have found Sphinx4 and FreeTTS. Both are Java implementations, and Sphinx seems to be updated rather frequently, unlike FreeTTS. The only problem is that Sphinx has trouble understanding me in an office environment, and I need a solution for a warehouse environment.

user74339