
I want to convert speech to text using Mozilla DeepSpeech, but the output is really bad.

I downloaded Mozilla's pre-trained model, and this is what I have done:



from deepspeech import Model
import scipy.io.wavfile as wav

BEAM_WIDTH = 500
LM_WEIGHT = 1.50
VALID_WORD_COUNT_WEIGHT = 2.10
N_FEATURES = 26
N_CONTEXT = 9

ds = Model(model, N_FEATURES, N_CONTEXT, alphabet, BEAM_WIDTH)

fs, audio = wav.read(path)
data = audio[:, 0]  # convert to mono by keeping only the first channel
prediction = ds.stt(data, fs)
print(test)
print(prediction)

The output is nowhere near my audio sample. What do I have to do to increase its accuracy?

Amit Joshi

1 Answer


I assume it's because you are not including a language model.

The pre-trained model is basically just the acoustic model, which only transcribes the audio into similar-sounding text that may not make sense.

If you combine the acoustic model with a language model (LM), you will likely get better results.

In your code example I can see the parameter LM_WEIGHT, but no reference to the LM itself.

I'm not sure which language you want to integrate DeepSpeech in, but here is the example for Node.js. This is the part where the LM is integrated:

const LM_ALPHA = 0.75;
const LM_BETA = 1.85;
let lmPath = './models/lm.binary';
let triePath = './models/trie';
model.enableDecoderWithLM(lmPath, triePath, LM_ALPHA, LM_BETA);
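
Since the code in the question is Python, a rough Python equivalent might look like the sketch below. It assumes the 0.5.x Python bindings, where `enableDecoderWithLM` also takes the alphabet path as its first argument. Other releases use a different signature, so check the version you have installed. The helper name `enable_language_model` is hypothetical.

```python
LM_ALPHA = 0.75  # language model weight, same value as in the Node.js example
LM_BETA = 1.85   # word insertion bonus, same value as in the Node.js example


def enable_language_model(ds, alphabet_path, lm_path, trie_path,
                          alpha=LM_ALPHA, beta=LM_BETA):
    """Attach the LM and trie to an existing DeepSpeech model object.

    Assumes the 0.5.x Python signature:
    enableDecoderWithLM(alphabet, lm, trie, alpha, beta).
    """
    ds.enableDecoderWithLM(alphabet_path, lm_path, trie_path, alpha, beta)
    return ds
```

It would be called once after constructing the model and before `ds.stt(...)`, for example `enable_language_model(ds, alphabet, './models/lm.binary', './models/trie')`.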

If I'm not mistaken, the LM and trie files are included in the pre-trained model download archive:

wget https://github.com/mozilla/DeepSpeech/releases/download/v0.5.1/deepspeech-0.5.1-models.tar.gz
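
To confirm that the LM and trie files really are in the download, the archive can be inspected before extracting it. This is a minimal sketch using Python's standard `tarfile` module. The expected file names (`lm.binary`, `trie`) are taken from the paths in the Node.js example and may differ in other releases, and the helper name `missing_model_files` is hypothetical.

```python
import os
import tarfile


def missing_model_files(archive_path, expected=("lm.binary", "trie")):
    """Return the expected file names that are not present in the archive."""
    with tarfile.open(archive_path, "r:*") as archive:
        # Compare base names only, since the files may sit in a subdirectory.
        found = {os.path.basename(name) for name in archive.getnames()}
    return sorted(set(expected) - found)
```

An empty list from `missing_model_files('deepspeech-0.5.1-models.tar.gz')` would mean both files are present.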

Otherwise, you can also create your own language model, which would make sense if you only need the model to recognize specific words.

thimos