Noise reduction before pocketsphinx reduces recognition accuracy

Question

I am trying to improve the recognition accuracy of pocketsphinx in noisy environments. However, users may run the app in varying environments, so training with noise is not something that I want to do.

My question is: would noise reduction before feeding the speech signal to pocketsphinx necessarily reduce recognition accuracy?

If yes, what features of speech need to be retained after noise reduction? Currently I observe that the WER goes up from ~40% (free form language) to ~60% if I use noise reduction.

Just to add, the speech does sound better perceptually after noise reduction.

Pocketsphinx argfile:

-lm   lm_giga_64k_vp_3gram.DMP
-dict lm_giga_64k_vp.sphinx.dic 
-hmm  voxforge_en_sphinx.cd_cont_5000

The idea here is to demonstrate an increase in speech recognition accuracy with noise reduction enabled. Intuitively, this should happen unless the noise reduction algorithm is completely messing up the spectral content of the signal.

Any help would be appreciated.

score 5 · Accepted Answer · answered Sep 03 '14 at 11:34

Currently I observe that the WER goes up from ~40% (free form language) to ~60% if I use noise reduction.

Those are very bad rates because:

1) You are using outdated models

2) You are using an outdated pocketsphinx without noise reduction.

External noise reduction usually degrades speech recognition accuracy. Luckily, the latest pocketsphinx has its own noise reduction module, which makes it quite robust to noise. You just need to update. To get the best results you need to:

1) Download and use the latest sphinxbase and pocketsphinx from http://github.com/cmusphinx

2) Download the latest acoustic and language models:

http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/en-us.tar.gz/download

http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Language%20Model/cmusphinx-5.0-en-us.lm.dmp/download

That would allow you to set a proper baseline. To experiment with noise reduction on and off you can use the command-line config option:

-remove_noise yes/no
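
As a rough sketch of the on/off comparison, the snippet below only builds the two command lines and does not run them. The `-infile`, `-hmm`, `-lm`, `-dict` and `-remove_noise` options are pocketsphinx_continuous options; the helper name `build_command` and the file paths (`test.wav`, an unpacked `en-us` directory, `cmu07a.dic`) are assumptions for illustration and need to be adapted to your setup.

```python
def build_command(wav_path, hmm, lm, dic, remove_noise):
    """Return the argv list for one pocketsphinx_continuous run.

    Only the -remove_noise value differs between the two runs, so any
    change in WER can be attributed to the built-in noise reduction.
    """
    flag = "yes" if remove_noise else "no"
    return [
        "pocketsphinx_continuous",
        "-infile", wav_path,      # hypothetical test recording
        "-hmm", hmm,              # acoustic model directory
        "-lm", lm,                # language model file
        "-dict", dic,             # pronunciation dictionary
        "-remove_noise", flag,
    ]


# Hypothetical paths; replace them with your own files.
models = ("en-us", "cmusphinx-5.0-en-us.lm.dmp", "cmu07a.dic")
with_nr = build_command("test.wav", *models, remove_noise=True)
without_nr = build_command("test.wav", *models, remove_noise=False)

print(" ".join(with_nr))
print(" ".join(without_nr))
```

Each list can be passed to `subprocess.run` once the binary and models are installed; keeping every other option identical is what makes the two WER figures comparable.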

For further advice on how to improve the accuracy, including the noise robustness, you should provide a test sample of the audio you want to recognize. For details, see:

http://cmusphinx.sourceforge.net/wiki/faq#qwhy_my_accuracy_is_poor

Nikolay Shmyrev


Thanks for the quick response Nikolay. I will download the latest source and models and get back with the results. – neeraj baji Sep 04 '14 at 05:56
Also, it would help if you could post a link to a webpage where all the latest info about cmusphinx is maintained. Currently I see several pages on sourceforge but some of them might refer to outdated versions/features. Thanks again. – neeraj baji Sep 04 '14 at 06:09
Nikolay, I ran my test with the latest versions of pocketsphinx and sphinxbase as well as with the latest models. I am still getting a WER of around 39%. To be precise, word_align.pl gives this result: TOTAL Words: 8674, Correct: 5711, Errors: 3457; TOTAL Percent correct = 65.84%, Error = 39.85%, Accuracy = 60.15%; TOTAL Insertions: 494, Deletions: 472, Substitutions: 2491. What am I missing out on? I used the cmu07a.dic provided with pocketsphinx with the new language model. – neeraj baji Sep 05 '14 at 05:35
Please read again: **For further advice on how to improve the accuracy, including the noise robustness, you should provide a test sample of the audio you want to recognize.** – Nikolay Shmyrev Sep 05 '14 at 05:40
OK. I am testing this on the Vox_forge speech corpus from here (http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/16kHz_16bit), from 1snoke-20120412-hge to BlindPilot-20100610-ama. I am trying to establish a baseline again now that you mentioned that the WER is too high. Once the baseline is established I can quantify the effect of noise and the associated noise reduction on the WER. Needless to add, I am open to testing this with another speech corpus if that aligns better with the latest language model. So it would help if you can specify that as well. – neeraj baji Sep 05 '14 at 07:37
You need to provide all data files and models in a single archive. You need to provide the exact command line you are using. Without being able to reproduce your problems it's hard to help you. – Nikolay Shmyrev Sep 05 '14 at 07:49
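
For reference, the word_align.pl figures quoted in the comments above are consistent with the usual WER arithmetic, which can be checked with a short sketch. The function name `wer_summary` is made up for illustration; the input counts are the ones reported in the comment.

```python
def wer_summary(total_words, insertions, deletions, substitutions):
    """Derive the summary percentages from raw alignment counts."""
    errors = insertions + deletions + substitutions
    # Insertions add errors but do not remove correctly recognized
    # reference words, so only deletions and substitutions are subtracted.
    correct = total_words - deletions - substitutions
    error_rate = 100.0 * errors / total_words
    return {
        "errors": errors,
        "correct": correct,
        "percent_correct": 100.0 * correct / total_words,
        "error_rate": error_rate,          # this is the WER
        "accuracy": 100.0 - error_rate,
    }


# Counts taken from the comment: 8674 words, 494 ins, 472 del, 2491 sub.
summary = wer_summary(8674, insertions=494, deletions=472, substitutions=2491)
for name, value in summary.items():
    print(name, round(value, 2))
```

Because insertions count as errors, "Accuracy" (100% minus WER) is lower than "Percent correct", which is why the comment reports both 65.84% and 60.15%.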
