I'm trying to add new fonts to Tesseract OCR. I'm following this tutorial, but I've run into a problem.

Here's what I've done so far:

  1. Create training document

    convert eng.myfont.exp0.pdf eng.myfont.exp0.tif

  2. Train Tesseract

    tesseract eng.myfont.exp0.tif eng.myfont.exp0 batch.nochop makebox

    This created my eng.myfont.exp0.box file.

    I opened the file with moshpytt and made sure the characters were detected correctly.

  3. Feed the box file back into tesseract

    tesseract eng.myfont.exp0.tif eng.myfont.exp0.box nobatch box.train.stderr

    I have this result:

    Tesseract Open Source OCR Engine v3.03 with Leptonica
    APPLY_BOXES:
    Boxes read from boxfile: 146
    Found 146 good blobs.
    TRAINING ... Font name = myfont.exp0
    Generated training data for 6 words

    This generated the eng.myfont.exp0.box.tr and eng.myfont.exp0.box.txt files.

  4. Try to detect the character set used in the box file (this is where I get stuck)

    unicharset_extractor *.box

Result:

unicharset_extractor: command not found

I also tried `unicharset_extractor eng.myfont.exp0.box`, with the same result.

I'm using:

  • tesseract 3.03
  • leptonica-1.70
  • libgif 4.1.6(?) : libjpeg 8d : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8 : webp 0.4.0
  • Ubuntu 14.04.1 LTS
jam
  • That's pretty peculiar. It just means the command cannot be found. On my system I'm able to find that command without any issue in `/usr/local/bin/unicharset_extractor`. – mlissner Oct 06 '14 at 07:24

2 Answers

The training tools for Tesseract 3.03 RC were omitted from Ubuntu 14.04. So either fall back to Tesseract 3.02 or upgrade to Ubuntu 14.10, which should include them.
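
To confirm whether the training tools are actually installed, something like the following could be used. This is only a sketch: `check_tools` is a hypothetical helper name, and the tool list is the usual Tesseract 3.x training set, which may differ on your system.

```shell
# check_tools is a hypothetical helper: it reports, for each name given,
# whether a command with that name can be found on the PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Return non-zero if at least one tool was not found.
    return "$missing"
}

# Assumed list of training tools; adjust to the ones your tutorial uses.
check_tools unicharset_extractor mftraining cntraining combine_tessdata \
    || echo "Some training tools are not on the PATH."
```

If `unicharset_extractor` is reported as missing, the `command not found` error comes from the installation itself, not from the box file or the arguments.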

nguyenq
  • I have the same issue in the OS X El Capitan 10.11.1 terminal. I'm using these versions: tesseract 3.04.01, leptonica-1.72, libjpeg 8d : libpng 1.6.21 : libtiff 4.0.6 : zlib 1.2.5 – prabakaran iOS Mar 30 '16 at 09:10

Ok, I googled this for you. Here's the answer:

You need to run all commands in the same folder where your input files are located.
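
As a rough sketch, assuming the training files sit together in one directory, the helper below changes into that directory and lists the box files, so you can see what `*.box` will expand to. The name `list_box_files` is made up for illustration.

```shell
# list_box_files is a hypothetical helper: it changes into the given
# directory (default: the current one) and lists the .box files there.
# It returns non-zero if the directory is missing or has no .box files.
list_box_files() {
    dir="${1:-.}"
    cd "$dir" || return 1
    ls -- *.box 2>/dev/null
}
```

For example, `list_box_files /path/to/training && unicharset_extractor *.box`, where `/path/to/training` is a placeholder for your own directory. This only helps if `unicharset_extractor` is installed in the first place.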

From:

mlissner