CMU Sphinx with VoxForge has total failure to recognize words - why?

Question

I am trying to set up the VoxForge 0.4 English acoustic model, as described in https://stackoverflow.com/a/8699337/519995 (but adapted to the Raw configuration rather than XML). When I switched to VoxForge, my error rate went up to 100%!

I get results that do not resemble the input sounds at all.

I guess I configured something wrong, but I can't figure out what.

Below are the modifications I made (starting from the RawHelloNGram.java demo).

When VOX_FORGE is false, everything works decently; when it is true, nothing is recognized correctly.

this.modelLoader = new Sphinx3Loader(
        VOX_FORGE
                ? "file:" + PROJECT_DIR + "/voxforge-en-0.4/model_parameters/voxforge_en_sphinx.cd_cont_5000"
                : "resource:/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz",
        "mdef",
        "",
        logMath,
        unitManager,
        0.0f,
        1e-7f,
        0.0001f,
        true);

this.model = new TiedStateAcousticModel(modelLoader, unitManager, true);

// Changed parameters of the mel filter bank
this.melFilterBank = new MelFrequencyFilterBank(
        VOX_FORGE ? 200.0  : 130.0,     // minFreq
        VOX_FORGE ? 3500.0 : 6800.0,    // maxFreq
        VOX_FORGE ? 31     : 40         // numberFilters
);

if (VOX_FORGE) {
    this.featureTransform = new FeatureTransform(modelLoader);
}

...
// ... later, at the end of the pipeline setup
if (VOX_FORGE) {
    pipeline.add(featureTransform);
}

For completeness, this is the entire configuration I'm using: https://gist.github.com/Iftahh/7336283

score 3 · Accepted Answer · answered Nov 06 '13 at 21:06

VoxForge uses the standard mel filter bank parameters (see the model's feat.params file):

-nfilt 40
-lowerf 133.333334
-upperf 6855.4976

There is no need to set the mel filter bank to 200/3500/31.
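
One way to avoid hard-coding these numbers is to read them from feat.params. The following is only a minimal sketch: the class name FeatParams and the parse helper are hypothetical and not part of Sphinx, and it assumes the file consists of whitespace-separated "-key value" lines like the three quoted above. It uses the quoted lines as inline input instead of opening a real file.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Sphinx.
public class FeatParams {

    /** Parses "-key value" lines; anything else is ignored. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2 && parts[0].startsWith("-")) {
                params.put(parts[0], parts[1]);
            }
        }
        return params;
    }

    public static void main(String[] args) {
        // Only the three lines quoted in the answer; a real file has more entries.
        List<String> lines = Arrays.asList(
                "-nfilt 40", "-lowerf 133.333334", "-upperf 6855.4976");
        Map<String, String> p = parse(lines);

        double minFreq = Double.parseDouble(p.get("-lowerf"));
        double maxFreq = Double.parseDouble(p.get("-upperf"));
        int numberFilters = Integer.parseInt(p.get("-nfilt"));

        // These correspond to the minFreq, maxFreq and numberFilters
        // arguments of the MelFrequencyFilterBank call in the question.
        System.out.println(minFreq + " " + maxFreq + " " + numberFilters);
    }
}
```

The three parsed values would then replace the 200.0 / 3500.0 / 31 arguments in the question's MelFrequencyFilterBank constructor call.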

Nikolay Shmyrev

Thank you, Nikolay, this certainly improved matters. I edited the mel parameters as recommended according to the test application sphinx4/tests/performance/voxforge_en/voxforge.config.xml, which I guess corresponds to another version of VoxForge. – Iftah Nov 07 '13 at 02:58
While the transcribed text now does match the input sounds, the error rate is significantly *higher* than without VoxForge. Is there some other configuration value I should fix? – Iftah Nov 07 '13 at 03:00
To get help with accuracy, you need to provide the test data you are using to calculate WER; see http://cmusphinx.sourceforge.net/wiki/faq for details. Generally, the VoxForge model is not very good; the hub4 model is better, and the modern generic en-us model is better too, but it's not easy to plug in. – Nikolay Shmyrev Nov 07 '13 at 07:26
I read a blog post that tested and found VoxForge to be better than hub4: http://grasch.net/node/19, but I guess it depends on the test inputs. Thanks for your help! – Iftah Nov 07 '13 at 11:33
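
For anyone comparing models as discussed in the comments above, WER (word error rate) is commonly computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. Below is a minimal sketch of that calculation. The class name Wer is hypothetical, and the code assumes whitespace-separated words, a case-sensitive comparison, and a non-empty reference; it is not the scoring tool used by the Sphinx tests.

```java
// Hypothetical helper for illustration, not part of Sphinx.
public class Wer {

    /** Splits on whitespace; a blank string yields zero words. */
    private static String[] words(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Word error rate; assumes the reference has at least one word. */
    static double wer(String reference, String hypothesis) {
        String[] ref = words(reference);
        String[] hyp = words(hypothesis);

        // d[i][j] = edit distance between the first i reference words
        // and the first j hypothesis words.
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i; // deletions
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j; // insertions

        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int substitution = ref[i - 1].equals(hyp[j - 1]) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + substitution);
            }
        }
        return (double) d[ref.length][hyp.length] / ref.length;
    }

    public static void main(String[] args) {
        // One substituted word out of three reference words.
        System.out.println(wer("the cat sat", "the hat sat"));
    }
}
```

Running the same test recordings through both acoustic models and comparing the resulting WER values gives a like-for-like accuracy comparison.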
