My Problem
Running unicharset_extractor fails with:
unicharset_extractor: command not found
My Environment
OSX El Capitan Version 10.11.4
Terminal Version 2.6.1 (361.1)
tesseract 3.04.00
leptonica-1.73
libgif 4.2.3 : libjpeg 9a : libpng 1.6.21 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.5.0 : libopenjp2 2.1.0
Similar Questions
This question has been asked quite a bit, but none of the existing answers seems to solve my problem. Some of the questions that did not help are listed here:
unicharset_extractor: command not found
Adding New Fonts to Tesseract 3
Issue 1327 in tesseract-ocr: unicharset_extractor statement doesn´t work
What I'm doing
I am currently trying to train my tesseract-ocr to recognise custom numbers.
What I've done
I originally installed tesseract using Homebrew, which installed tesseract, leptonica, and other dependencies to /usr/local/Cellar. I used this guide to help me train the data. As the guide instructed, I first generated a set of .tif files named in the format tla.test_font.exp[num].tif. Then I generated the .box files using this script:
# Generate a .box file for each training image (exp0 through exp52).
for i in $(seq 0 52); do
    tesseract "tla.test_font.exp$i.tif" "tla.test_font.exp$i" -l eng -psm 10 batch.nochop makebox
done
Then I ran the .box files through tesseract using:
# Run tesseract in training mode on each image and its .box file.
for i in $(seq 0 52); do
    tesseract "tla.test_font.exp$i.tif" "tla.test_font.exp$i" -l eng -psm 10 nobatch box.train
done
Then I tried to run unicharset_extractor *.box and I received the error message above.
Suspecting it had something to do with my tesseract install path, and not wanting to make things worse with symlinks, I then uninstalled tesseract, libtool, and leptonica from /usr/local/Cellar and used MacPorts to install everything again, this time in /opt/local/bin. After repeating the same steps mentioned above, I got stuck on the same issue. I even tried running man unicharset_extractor and the man page worked perfectly.
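
For anyone reproducing this, a minimal diagnostic sketch may help tell apart "the binary is installed but not on PATH" from "the binary was never installed". It assumes a POSIX shell; locate_tool is a hypothetical helper name (not part of tesseract), and the directories passed to it are simply the two install locations mentioned above plus /usr/local/bin as a guess at where Homebrew links binaries.

```shell
#!/bin/sh
# Hypothetical helper: report whether a tool is on PATH, and if not,
# search the given directories for a file with that name.
locate_tool() {
    tool=$1
    shift
    # Ask the shell first; this is what decides "command not found".
    if command -v "$tool" >/dev/null 2>&1; then
        echo "on PATH: $(command -v "$tool")"
        return 0
    fi
    # Fall back to searching each candidate directory that exists.
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        found=$(find "$dir" -name "$tool" 2>/dev/null | head -n 1)
        if [ -n "$found" ]; then
            echo "not on PATH, but found: $found"
            return 0
        fi
    done
    echo "not found: $tool"
    return 0
}

# Directories taken from the question; adjust for your own setup.
locate_tool unicharset_extractor /usr/local/Cellar /usr/local/bin /opt/local/bin
```

If this prints "not found", the working man page would suggest that the documentation was installed without the training binaries themselves, though that is only one possible explanation.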