Tesseract OCR Can't create .traineddata

Question

The Problem:

I followed the step-by-step tutorial provided here to train Tesseract OCR on a new font. But in steps 5 and 6, not all of the required files are created.

What I did:

My image file is: en.va.exp0.tif

Step 1: Creating the .box file + correcting wrongly identified characters

tesseract en.va.exp0.jpg en.va.exp0 batch.nochop makebox

Step 2: Creating .tr file

tesseract en.va.exp0.tif en.va.exp0 box.train

Step 3: Extracting the charset from the box files

unicharset_extractor  en.va.exp0.box

Step 4: Create font_properties file

echo "va 0 0 1 0 0" > font_properties

Step 5: Training the data

mftraining -F font_properties -U unicharset -O en.unicharset en.va.exp0.tr

Step 6: Training the data

cntraining en.va.exp0.tr

As far as I know, step 5 should create four files: shapetable, inttemp, pffmtable, and normproto. But only the shapetable file is created. Because of that, step 6 doesn't work either (it simply does nothing, I think).
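
To see exactly which of the expected files are missing after steps 5 and 6, a small check like the one below can help. It is only a minimal sketch: it assumes a POSIX-style shell (for example Git Bash or WSL on Windows) and that the training commands were run in the current directory. The function name `missing_outputs` is made up for illustration, and the file list is simply the one named above.

```shell
# Hypothetical helper: print every expected training output that is
# absent from the given directory (defaults to the current directory).
missing_outputs() {
    dir="${1:-.}"
    # File names taken from the list in the question.
    for f in shapetable inttemp pffmtable normproto; do
        [ -e "$dir/$f" ] || echo "missing: $f"
    done
}

# Check the directory where mftraining and cntraining were run.
missing_outputs .
```

Running it once after step 5 and again after step 6 shows which command failed to write its output, which is more precise than comparing Explorer screenshots.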

Materials:

explorer-screenshot-before.jpg

explorer-screenshot-after.jpg

cmd-screenshot.jpg

en.va.exp0.tif

If more explanation or material is needed, I'll add it. Thanks in advance.

I'm facing almost the same issue - `mftraining` runs indefinitely and returns nothing. Did you manage to solve it somehow? — XxX, Aug 04 '22 at 07:40
Sadly no, I didn't manage to fix this :( If you manage to solve it pls let me know though — Der_Floh, Aug 05 '22 at 11:08

score 0 · Answer 1 · answered Nov 19 '22 at 20:10

Try running Tesseract 4 instead of Tesseract 5.

CrawL
