When mftraining is executed on my training files, I get the following error message:
PS > mftraining -F font_properties -U unicharset -O lang.unicharset .\eng.ds-digital.exp0.box.tr .\eng.ds-digitalb.exp0.box.tr .\eng.ds-digitali.exp0.box.tr
Warning: No shape table file present: shapetable
Reading .\eng.ds-digital.exp0.box.tr ...
Reading .\eng.ds-digitalb.exp0.box.tr ...
Reading .\eng.ds-digitali.exp0.box.tr ...
Font id = -1/0, class id = 1/12 on sample 0
font_id >= 0 && font_id < font_id_map_.SparseSize():Error:Assert failed:in file ..\..\classify\trainingsampleset.cpp, line 622
A Windows dialog also appears stating "feature training for Tesseract has stopped working". Several posts around the net address this issue, but none of the solutions I have tried so far gets my data set through.
The folder in which I run the mftraining command contains the following files:
eng.ds-digital.exp0.box
eng.ds-digital.exp0.box.tr
eng.ds-digital.exp0.box.txt
eng.ds-digital.exp0.tif
eng.ds-digitalb.exp0.box
eng.ds-digitalb.exp0.box.tr
eng.ds-digitalb.exp0.box.txt
eng.ds-digitalb.exp0.tif
eng.ds-digitali.exp0.box
eng.ds-digitali.exp0.box.tr
eng.ds-digitali.exp0.box.txt
eng.ds-digitali.exp0.tif
font_properties
unicharset
The font_properties file has the following content (it also ends with a newline, as the documentation states):
ds-digital 0 0 0 0 0
ds-digitalb 0 1 0 0 0
ds-digitali 1 0 0 0 0
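For reference, each font_properties line is expected to hold a font name followed by five 0/1 flags in the order italic, bold, fixed, serif, fraktur (per the Tesseract 3.0x training documentation), so the entries above declare ds-digitalb as bold and ds-digitali as italic. The Python sketch below checks a font_properties file against that format. The function name `parse_font_properties` is hypothetical and not part of Tesseract. It validates structure only and cannot tell whether mftraining will accept the names.

```python
from typing import Dict, Tuple

# Field order assumed from the Tesseract 3.0x training docs:
# <fontname> <italic> <bold> <fixed> <serif> <fraktur>
FLAG_NAMES = ("italic", "bold", "fixed", "serif", "fraktur")


def parse_font_properties(text: str) -> Dict[str, Tuple[int, ...]]:
    """Parse font_properties content, raising ValueError on malformed lines."""
    if text and not text.endswith("\n"):
        raise ValueError("font_properties should end with a newline")
    fonts: Dict[str, Tuple[int, ...]] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # this sketch simply skips blank lines
        fields = line.split()
        if len(fields) != 1 + len(FLAG_NAMES):
            raise ValueError(f"line {lineno}: expected 6 fields, got {len(fields)}")
        name, flags = fields[0], fields[1:]
        if any(f not in ("0", "1") for f in flags):
            raise ValueError(f"line {lineno}: flags must be 0 or 1")
        if name in fonts:
            raise ValueError(f"line {lineno}: duplicate font {name!r}")
        fonts[name] = tuple(int(f) for f in flags)
    return fonts


if __name__ == "__main__":
    sample = (
        "ds-digital 0 0 0 0 0\n"
        "ds-digitalb 0 1 0 0 0\n"
        "ds-digitali 1 0 0 0 0\n"
    )
    for name, flags in parse_font_properties(sample).items():
        # List the style flags that are set, or "regular" if none are.
        set_flags = [n for n, f in zip(FLAG_NAMES, flags) if f]
        print(name, set_flags or ["regular"])
```

Run on the three lines above, it reports ds-digital as regular, ds-digitalb as bold and ds-digitali as italic, so the file's structure itself looks well-formed.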
I've also tried different naming conventions for the font names in font_properties (although the documentation is quite clear that the entry is the font name of the file and not the file name itself, some people around the net seem to claim otherwise), and I've renamed the files so that the .tr files follow the pattern eng.ds-digital*.exp0.tr, all to no avail.
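The name matching can be checked mechanically. Assuming the [lang].[fontname].exp[num] file-name convention described in the Tesseract 3.0x training docs, the hypothetical helpers below (`font_name_from_path` and `missing_fonts`, not Tesseract functions) extract the font-name part of each training file name and report any file whose font name has no entry in font_properties. This is only a consistency check on names; it does not reproduce how mftraining itself resolves font ids.

```python
import ntpath
from typing import Iterable, List, Optional


def font_name_from_path(path: str) -> Optional[str]:
    """Return the [fontname] part of a [lang].[fontname].exp[num]... file name.

    Assumes the naming convention from the Tesseract 3.0x training docs.
    ntpath.basename strips both '\\' and '/' separators, so a Windows-style
    prefix such as '.\\' does not end up in the result.
    """
    parts = ntpath.basename(path).split(".")
    # Need at least the lang, fontname and expN components.
    if len(parts) < 3 or not parts[2].startswith("exp"):
        return None
    return parts[1]


def missing_fonts(paths: Iterable[str], known_fonts: Iterable[str]) -> List[str]:
    """Return the paths whose font name is absent from known_fonts."""
    known = set(known_fonts)
    missing = []
    for p in paths:
        name = font_name_from_path(p)
        if name is None or name not in known:
            missing.append(p)
    return missing


if __name__ == "__main__":
    tr_files = [
        r".\eng.ds-digital.exp0.box.tr",
        r".\eng.ds-digitalb.exp0.box.tr",
        r".\eng.ds-digitali.exp0.box.tr",
    ]
    fonts = ["ds-digital", "ds-digitalb", "ds-digitali"]
    for p in tr_files:
        print(p, "->", font_name_from_path(p))
    print("unmatched:", missing_fonts(tr_files, fonts))
```

With the three .tr files and the font_properties entries from this question, every file maps to a listed font, so the list of unmatched files is empty under this convention.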
Edit: I am running Tesseract 3.02.