A minimal way to install Tesseract for C++ development on Linux?

Question

According to this answer, I would have to check out the entire Tesseract repository.

How do I install only the minimal Tesseract packages that provide the C++ API for development, plus English language recognition, on Linux (Ubuntu)?

Update: the reason for using the large SVN repo is that only the source build lets g++ find Tesseract through pkg-config.

When installed with apt-get:

bhp@Virtual-Machine:~/Desktop/bhp/opencv-tesseract$ pkg-config --cflags --libs tesseract
Package tesseract was not found in the pkg-config search path.
Perhaps you should add the directory containing `tesseract.pc'
to the PKG_CONFIG_PATH environment variable
No package 'tesseract' found

When built with source:

bhp@Virtual-Machine:~/Desktop/soft/tesseract-ocr$ pkg-config --cflags --libs tesseract
-I/usr/local/include/tesseract  -L/usr/local/lib -ltesseract
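
To illustrate the difference between the two outputs above, here is a minimal sketch of a helper that uses pkg-config when `tesseract.pc` is visible and otherwise falls back to hand-written flags. The function name `tesseract_flags` is made up for illustration. The fallback include path `/usr/include/tesseract` and the extra `-llept` (Leptonica) are assumptions about a typical Ubuntu package layout, not something verified on this machine.

```shell
#!/bin/sh
# Hypothetical helper: print compiler and linker flags for Tesseract.
tesseract_flags() {
    if pkg-config --exists tesseract 2>/dev/null; then
        # tesseract.pc is on the pkg-config search path
        # (the "built from source" case shown above).
        pkg-config --cflags --libs tesseract
    else
        # Assumption: the distro packages install headers under
        # /usr/include/tesseract and the libraries on the default linker
        # path. -llept (Leptonica) is likewise an assumption.
        echo "-I/usr/include/tesseract -ltesseract -llept"
    fi
}

tesseract_flags
```

It could then be used as `g++ main.cpp $(tesseract_flags) -o main`, where `main.cpp` stands in for whatever program is being built.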

You'll probably find things ready to install on Ubuntu using `apt-cache search tesseract`. — molbdnilo, Aug 07 '14 at 04:23
If you run `svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr`, you end up with about 1.3 GiB on disk, of which half is the `.svn` directory (660 MiB). Of the remainder, 630+ MiB is the `tessdata` training data. That means the other code is minuscule by comparison. If you're really short of disk space, then maybe Tesseract OCR is not for you. — Jonathan Leffler, Aug 07 '14 at 04:23
@molbdnilo Yes, I did `apt-get install tesseract-ocr tesseract-ocr-eng libtesseract-dev libleptonica-dev`, but that's not providing the headers and APIs for a C++ program. — Tom Iv, Aug 07 '14 at 04:34
@JonathanLeffler `tessdata` only for English is not sufficient? — Tom Iv, Aug 07 '14 at 04:36
There isn't a separate 'English' training data set AFAICS. There is only one subdirectory in `tessdata` and that's called `configs`. OCR is somewhat independent of language, though it does depend on the code set, of course. That is, recognizing Arabic or Chinese symbols is different from recognizing English symbols, but English symbols are also used by a great many other languages. — Jonathan Leffler, Aug 07 '14 at 04:40
@JonathanLeffler I wasn't clear. I meant we could skip the `tessdata` folder? — Tom Iv, Aug 07 '14 at 04:49
Possibly, if your SVN is good enough. I've not built Tesseract OCR, and neither have I used it. I just downloaded it (using the `svn` command cited), and I think it isn't big enough that I'd worry about the disk space. That may not meet your definition of minimal, which is one reason this is commentary and not an answer. — Jonathan Leffler, Aug 07 '14 at 04:54
I don't mind doing a full checkout. The reason I ask is that I'm building it on a virtual machine, so I would like to skip anything that isn't necessary right now. — Tom Iv, Aug 07 '14 at 05:02
@bhp Something must have gone wrong in your installation - my headers are in `/usr/include/tesseract`. — molbdnilo, Aug 07 '14 at 05:09
@molbdnilo When Tesseract is installed via apt-get, I get the following output: `bhp@bhp-Virtual-Machine:~/Desktop/bhp/opencv-tesseract$ pkg-config --cflags --libs tesseract Package tesseract was not found in the pkg-config search path. Perhaps you should add the directory containing 'tesseract.pc' to the PKG_CONFIG_PATH environment variable No package 'tesseract' found`. Only after following the build instructions is g++ able to pick up Tesseract when compiling. — Tom Iv, Aug 07 '14 at 07:13

0 Answers