2

My Problem

Running unicharset_extractor fails with:

unicharset_extractor: command not found

My Environment

OSX El Capitan Version 10.11.4

Terminal Version 2.6.1 (361.1)

tesseract 3.04.00
 leptonica-1.73
  libgif 4.2.3 : libjpeg 9a : libpng 1.6.21 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.5.0 : libopenjp2 2.1.0

Similar Questions

This question has been asked several times, but none of the existing answers solved my problem. Some of the questions I tried are listed here:

unicharset_extractor: command not found

Adding New Fonts to Tesseract 3

Issue 1327 in tesseract-ocr: unicharset_extractor statement doesn't work

What I'm doing

I am currently trying to train my tesseract-ocr to recognise custom numbers.

What I've done

I originally installed tesseract using Homebrew, which installed tesseract, leptonica, and other dependencies to /usr/local/Cellar. I used this guide to help me train the data. As the guide instructed, I first generated a set of .tif files named in the format tla.test_font.exp[num].tif. Then I generated the .box files using this script:

# Generate a .box file for each training image (pages 0 to 52).
for i in $(seq 0 52); do
    tesseract "tla.test_font.exp$i.tif" "tla.test_font.exp$i" -l eng -psm 10 batch.nochop makebox
done

Then I ran tesseract in training mode on each image and its .box file using:

# Run training mode on each image; tesseract reads the matching .box file.
for i in $(seq 0 52); do
    tesseract "tla.test_font.exp$i.tif" "tla.test_font.exp$i" -l eng -psm 10 nobatch box.train
done
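
As a sanity check before moving on, a sketch like the following can confirm that every page has both a .box file and the .tr file that box.train writes. It assumes the tla.test_font.exp[num] naming used above; check_outputs is a hypothetical helper name, not part of tesseract.

```shell
# check_outputs FIRST LAST
# Hypothetical helper: report every page in the range that is missing
# (or has an empty) .box or .tr file. Succeeds only if nothing is missing.
check_outputs() {
    first=$1
    last=$2
    missing=0
    i=$first
    while [ "$i" -le "$last" ]; do
        base="tla.test_font.exp$i"
        for ext in box tr; do
            if [ ! -s "$base.$ext" ]; then
                echo "missing or empty: $base.$ext"
                missing=$((missing + 1))
            fi
        done
        i=$((i + 1))
    done
    [ "$missing" -eq 0 ]
}
```

Running check_outputs 0 52 in the training directory prints nothing when all 53 pages are complete.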

Then I tried to run unicharset_extractor *.box and I received the error message above.

Suspecting that the problem was related to my tesseract install path, and not wanting to make things worse with symlinks, I uninstalled tesseract, libtool and leptonica from /usr/local/Cellar and reinstalled everything with MacPorts, this time under /opt/local/bin. After repeating the steps above, I got stuck on the same error. I even tried running man unicharset_extractor, and the man page displayed perfectly.
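
One way to tell "installed but not on PATH" apart from "not installed at all" is to look in the directory where tesseract itself resolves. The sketch below is only a diagnostic under that assumption; sibling_tool is a hypothetical helper name. A working man page only shows that the documentation file is installed, not the executable.

```shell
# sibling_tool ANCHOR TOOL
# Hypothetical helper: find the directory that ANCHOR resolves to on PATH
# and report whether an executable named TOOL sits next to it.
sibling_tool() {
    anchor_path=$(command -v "$1") || { echo "$1 is not on PATH"; return 2; }
    dir=$(dirname "$anchor_path")
    if [ -x "$dir/$2" ]; then
        echo "$2 found in $dir"
    else
        echo "$2 not present in $dir"
        return 1
    fi
}

sibling_tool tesseract unicharset_extractor || true
```

If the tool is reported as not present next to tesseract, the install most likely did not include it, and changing PATH or adding symlinks will not help.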

SegFault

2 Answers

2

The training tools were not installed along with tesseract. Uninstall tesseract with brew uninstall tesseract, then reinstall it with the training tools included using brew install --with-training-tools tesseract

After this you should be able to run the unicharset_extractor command.
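
To confirm the reinstall worked, something like the following lists any training executables that still do not resolve. The tool names are the Tesseract 3.x training programs; missing_tools is a hypothetical helper name, and the list should be adjusted to the steps you actually run.

```shell
# missing_tools NAME...
# Hypothetical helper: print each named command that does not resolve on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Empty output means every listed training tool is available.
missing_tools unicharset_extractor mftraining cntraining combine_tessdata
```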

Thanks

  • Thanks for answering this after so long. I no longer have the correct environment to verify your answer. If someone else can confirm that your answer is correct, I will accept it. – SegFault Mar 07 '18 at 23:28
0

Try the commands and steps as specified in https://tesseract-ocr.github.io/tessdoc/Compiling.html. Following those steps resolved my issue.

Tanu Arora