I’m training DeepSpeech from scratch (without a checkpoint) with a language model generated using KenLM, as described in its documentation. The dataset is the Persian (Farsi) Common Voice dataset.
My configurations are as follows:
- Batch size = 2 (due to CUDA OOM)
- Learning rate = 0.0001
- Num. neurons = 2048
- Num. epochs = 50
- Train set size = 7500
- Test and Dev sets size = 5000
- Dropout for layers 1 to 5 = 0.2 (0.4 was also tried, with the same results)
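For reference, the settings above correspond to a training invocation roughly like the following. This is a sketch assuming Mozilla DeepSpeech 0.7+ flag names; the CSV paths and scorer filename are placeholders, not my actual ones:

```shell
# Sketch of the training command matching the settings listed above
# (paths and the scorer filename are hypothetical placeholders)
python3 DeepSpeech.py \
  --train_files data/cv-fa/train.csv \
  --dev_files data/cv-fa/dev.csv \
  --test_files data/cv-fa/test.csv \
  --train_batch_size 2 \
  --dev_batch_size 2 \
  --test_batch_size 2 \
  --n_hidden 2048 \
  --learning_rate 0.0001 \
  --dropout_rate 0.2 \
  --epochs 50 \
  --scorer_path kenlm-fa.scorer
```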
Both the train and validation losses decrease early in training, but after a few epochs the validation loss stops improving. Train loss plateaus at about 18 and validation loss at about 40.
At the end of training, the predictions are all empty strings. Any ideas on how to improve the model?