
I can use the en-us models that ship with Sphinx4 without any problem:

cfg.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us")
cfg.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict")
cfg.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin")

With this configuration I can transcribe an English audio recording.

Now I want to do the same with German recordings. On the website I found a link to Acoustic and Language Models, which contains an archive 'German Voxforge'. In it I found the files needed for the acoustic model path, but as far as I can see it does not contain a dictionary or a language model.

How do I get the dictionary and language model path for German in Sphinx4?

0__

2 Answers


You create them yourself. You can build a language model from subtitles or Wikipedia dumps. The documentation is here.

The latest German models are actually not on the CMUSphinx page; they are at github/gooofy. In this gooofy project you can find the dictionary, documentation, models and related materials.

Nikolay Shmyrev
  • Sorry. To clarify. From the downloads from gooofy, I am supposed to use `voxforge.dic` (26K words while the en-us has 134K) and `voxforge.lm.DMP`, right? – 0__ Feb 19 '16 at 22:45
  • Yes, they work. The file ending must be changed to lower-case `.dmp` because Sphinx4 only recognises lower-case extensions. – 0__ Feb 19 '16 at 23:10
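
The renaming mentioned in the comment above can also be scripted. This is only a minimal sketch using the Java standard library; the class name `ModelFileNames` and its methods are hypothetical helpers, not part of Sphinx4, and the file name `voxforge.lm.DMP` is taken from the comment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper (not part of Sphinx4): lower-cases the extension of a
// model file such as voxforge.lm.DMP.
class ModelFileNames {

    // Returns the file name with everything from the last dot onwards in lower case.
    static String lowerCaseExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return fileName; // no extension, nothing to change
        }
        return fileName.substring(0, dot)
                + fileName.substring(dot).toLowerCase(Locale.ROOT);
    }

    // Renames the file on disk if its extension is not already lower case
    // and returns the resulting path.
    static Path renameToLowerCaseExtension(Path model) throws IOException {
        String oldName = model.getFileName().toString();
        String newName = lowerCaseExtension(oldName);
        if (newName.equals(oldName)) {
            return model;
        }
        return Files.move(model, model.resolveSibling(newName));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the path of the downloaded language model is passed as args[0].
        if (args.length > 0) {
            System.out.println(renameToLowerCaseExtension(Paths.get(args[0])));
        }
    }
}
```

The returned path can then be passed to `cfg.setLanguageModelPath(...)` in place of the upper-case original.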

I tried the German model with pocketsphinx and got some errors because the *.lm.bin language model files I used were reported as "invalid". I switched to the *.lm.gz file and it works fine.

The proper configuration list is:

  • fst = voxforge-de.fst
  • hmm folder = model_parameters/voxforge.cd_cont_6000
  • dictionary = cmusphinx-voxforge-de.dic
  • language model = cmusphinx-voxforge-de.lm.gz

To get the "hmm" path, unpack the archive cmusphinx-de-voxforge-5.2.tar.gz.

I think it should be the same for Sphinx4, so please give it a try.

Ievgen