The Problem:

I followed the step-by-step tutorial provided here to train Tesseract OCR on a new font. But in steps 5 and 6, not all of the needed files are created.

What I did:

My image file is: en.va.exp0.tif

Step 1: Creating the .box file + correcting wrongly identified characters

tesseract en.va.exp0.jpg en.va.exp0 batch.nochop makebox

Step 2: Creating .tr file

tesseract en.va.exp0.tif en.va.exp0 box.train

Step 3: Extracting the charset from the box files

unicharset_extractor  en.va.exp0.box

Step 4: Create font_properties file

echo "va 0 0 1 0 0" > font_properties

Step 5: Training the data

mftraining -F font_properties -U unicharset -O en.unicharset en.va.exp0.tr

Step 6: Training the data

cntraining en.va.exp0.tr

As far as I know, step 5 should create four files: shapetable, inttemp, pffmtable, and normproto. But only the shapetable file is created. Because of that, step 6 doesn't work either (I think it simply does nothing).
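To make the symptom easy to reproduce, the check for the expected outputs can be sketched as a small shell function. This is only an illustration: the name `missing_training_files` is hypothetical, and the file list is simply the four names mentioned above, not an authoritative list of what each tool writes.

```shell
#!/bin/sh
# Hypothetical helper: print each expected training output that is absent
# (or empty) in the given directory. The list of names comes from this
# question and is an assumption, not taken from the Tesseract documentation.
missing_training_files() {
    dir="${1:-.}"
    for name in shapetable inttemp pffmtable normproto; do
        # -s is true only if the file exists and has a size greater than zero
        [ -s "$dir/$name" ] || printf '%s\n' "$name"
    done
}
```

Calling `missing_training_files .` in the training folder after steps 5 and 6 prints one missing file name per line, and prints nothing if all four files are present.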

Materials:

explorer-screenshot-before.jpg

explorer-screenshot-after.jpg

cmd-screenshot.jpg

en.va.exp0.tif

If more explanation or material is needed, I'll add it. Thanks in advance.

Der_Floh
  • I'm facing almost the same issue - `mftraining` runs indefinitely and returns nothing. Did you manage to solve it somehow? – XxX Aug 04 '22 at 07:40
  • Sadly no, I didn't manage to fix this :( If you manage to solve it pls let me know though – Der_Floh Aug 05 '22 at 11:08

1 Answer

Try running Tesseract 4 instead of Tesseract 5.
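To see which major version is currently installed, a minimal sketch like the following may help. The function name `major_version` is hypothetical, and the parsing assumes the first line of `tesseract --version` looks like `tesseract 5.3.0` or `tesseract v5.0.0-alpha...`; other builds may format it differently.

```shell
#!/bin/sh
# Hypothetical helper: extract the first run of digits from a version line,
# e.g. "tesseract 5.3.0" -> 5. Prints nothing if the line contains no digits.
major_version() {
    printf '%s\n' "$1" | sed -n 's/^[^0-9]*\([0-9][0-9]*\).*/\1/p'
}

# Only query the binary if it is on PATH. Older releases print the version
# to stderr, so both streams are merged before taking the first line.
if command -v tesseract >/dev/null 2>&1; then
    major_version "$(tesseract --version 2>&1 | head -n 1)"
fi
```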

CrawL