I’m training DeepSpeech from scratch (without a checkpoint) with a language model generated using KenLM, as described in its documentation. The dataset is the Persian (Farsi) Common Voice dataset.
My configurations are as follows:
- Batch size = 2 (due to CUDA OOM)
- Learning rate = 0.0001
- Num. neurons = 2048
- Num. epochs = 50
- Train set size = 7500
- Test and Dev sets size = 5000
- Dropout for layers 1 to 5 = 0.2 (0.4 was also tried, with the same results)
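For reference, the settings above correspond to a training invocation roughly like the following. This is a sketch assuming Mozilla DeepSpeech 0.7+ flag names; the CSV paths and scorer filename are placeholders, not my actual ones:

```shell
# Sketch of the training command matching the settings listed above
# (paths and the scorer filename are hypothetical placeholders)
python3 DeepSpeech.py \
  --train_files data/cv-fa/train.csv \
  --dev_files data/cv-fa/dev.csv \
  --test_files data/cv-fa/test.csv \
  --train_batch_size 2 \
  --dev_batch_size 2 \
  --test_batch_size 2 \
  --n_hidden 2048 \
  --learning_rate 0.0001 \
  --dropout_rate 0.2 \
  --epochs 50 \
  --scorer_path kenlm-fa.scorer
```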
Both the train and validation losses decrease early in training, but after a few epochs the validation loss stops improving. Train loss plateaus at about 18 and validation loss at about 40.
At the end of training, the predictions are all empty strings. Any ideas on how to improve the model?