Questions tagged [tesseract]

Tesseract is an OCR (Optical Character Recognition) engine originally developed at HP Labs and now available as an open source library with development sponsored by Google.

Tesseract is an open source, multilingual OCR (Optical Character Recognition) engine originally developed at HP Labs. It is now sponsored by Google and licensed under the Apache License 2.0. It currently recognizes 107 languages. Tesseract is written primarily in C++ and C. The project is hosted at https://github.com/tesseract-ocr/tesseract, and its support forum is at http://groups.google.com/group/tesseract-ocr.

4350 questions
198
votes
14 answers

image processing to improve tesseract OCR accuracy

I've been using tesseract to convert documents into text. The quality of the documents ranges wildly, and I'm looking for tips on what sort of image processing might improve the results. I've noticed that text that is highly pixellated - for…
user364902
  • 3,146
  • 6
  • 23
  • 23
171
votes
32 answers

Pytesseract : "TesseractNotFound Error: tesseract is not installed or it's not in your path", how do I fix this?

I'm trying to run a basic and very simple code in python. from PIL import Image import pytesseract im = Image.open("sample1.jpg") text = pytesseract.image_to_string(im, lang = 'eng') print(text) This is what it looks like, I have actually…
Jed Bartlet
  • 1,963
  • 2
  • 11
  • 12
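
For the "not installed or it's not in your path" error in the question above, a useful first check is whether the tesseract executable can be found on PATH at all. The following is a minimal sketch using only the Python standard library. The helper name `find_tesseract` is hypothetical, and the commented-out lines assume pytesseract is installed.

```python
import shutil
from typing import Optional


def find_tesseract(executable: str = "tesseract") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on PATH; install it or add its folder to PATH")
    else:
        print(f"tesseract found at {path}")
        # Assumption: pytesseract is installed. Its wrapper can then be
        # pointed at an explicit executable location:
        # import pytesseract
        # pytesseract.pytesseract.tesseract_cmd = path
```

If the function returns None, installing the pytesseract Python package alone is probably not enough: the Tesseract binary itself may be missing, or its install folder may not be on PATH.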
126
votes
20 answers

Tesseract running error

I have a problem with running tesseract-ocr engine on linux. I've downloaded RUS language data and put it to tessdata directory (/usr/local/share/tessdata). When I'm trying to run tesseract with command tesseract blob.jpg out -l rus , it displays an…
Russel Crowe
  • 1,271
  • 2
  • 8
  • 3
100
votes
4 answers

How do I choose between Tesseract and OpenCV?

I recently came across Tesseract and OpenCV. It looks like Tesseract is a full-fledged OCR engine and OpenCV can be used as a framework to create an OCR application/service. I tried using Tesseract on some of my images and its accuracy seems…
Legend
  • 113,822
  • 119
  • 272
  • 400
92
votes
26 answers

How do I resolve a TesseractNotFoundError?

I am trying to use pytesseract in Python but I always end up with the following error: raise TesseractNotFoundError() pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path However, pytesseract and…
PreetyP
  • 931
  • 1
  • 7
  • 4
87
votes
7 answers

Limit characters tesseract is looking for

Is it possible to limit the set of characters that tesseract is looking for (e.g. search only for letters a-z)? That would improve my results greatly.
Danilo Bargen
  • 18,626
  • 15
  • 91
  • 127
71
votes
10 answers

How to make tesseract to recognize only numbers, when they are mixed with letters?

I want to use tesseract to recognize only numbers. The problem is that I have mixture of numbers & letters and when I use SetVariable("tessedit_char_whitelist", "0123456789") for every symbol tesseract returns wrong digit. Can I set a threshold…
zkunov
  • 3,362
  • 1
  • 20
  • 17
71
votes
1 answer

best OCR (Optical character recognition) example in android

I want a running example of OCR in android, I have done some research and find an example that implements OCR in android. https://github.com/rmtheis/tess-two and in it there are three projects files... eyes-two tess-two tess-two-test I have…
Komal
  • 739
  • 1
  • 6
  • 6
69
votes
4 answers

Pytesseract OCR multiple config options

I am having some problems with pytesseract. I need to configure Tesseract so that it accepts single digits while also only accepting numbers, as the number zero is often confused with an 'O'. Like this: target =…
Niall Oswald
  • 805
  • 1
  • 8
  • 7
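
Several of the questions above come down to passing more than one option to Tesseract at once, such as a character whitelist together with a page segmentation mode. The following is a rough sketch of how such a config string could be assembled. The helper `build_config` is hypothetical, the `--psm` spelling assumes Tesseract 4 or later, and whitelist support varies between Tesseract versions and engine modes.

```python
from typing import Optional


def build_config(whitelist: str = "", psm: Optional[int] = None) -> str:
    """Join Tesseract command-line options into a single config string."""
    parts = []
    if psm is not None:
        # Page segmentation mode; 10 treats the image as a single character.
        parts.append(f"--psm {psm}")
    if whitelist:
        # -c sets a config variable; this one restricts the recognized characters.
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


config = build_config(whitelist="0123456789", psm=10)
print(config)  # --psm 10 -c tessedit_char_whitelist=0123456789

# Assumption: pytesseract is installed and `image` is a PIL image.
# text = pytesseract.image_to_string(image, config=config)
```

Keeping the options in one string matches how pytesseract forwards its `config` argument to the tesseract command line, so the same string can be tried directly in a shell first.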
61
votes
8 answers

Getting the bounding box of the recognized words using python-tesseract

I am using python-tesseract to extract words from an image. This is a python wrapper for tesseract which is an OCR code. I am using the following code for getting the words: import tesseract api =…
47
votes
1 answer

Using Tesseract for handwriting recognition

I was just wondering how accurate can tesseract be for handwriting recognition if used with capital letters all in their own little boxes in a form. I know you can train it to recognise your own handwriting somewhat but the problem in my case is I…
Jackdaw
  • 663
  • 1
  • 6
  • 12
46
votes
2 answers

Set Tesseract font for OCR

I would like to use tesseract for serial number recognition, where I only want to recognize single characters, no word, no dictionary. Therefore I would like to use one of the already trained tesseract font-types for the serial number to achieve…
Mr.Sheep
  • 1,368
  • 1
  • 15
  • 32
41
votes
4 answers

Tesseract and tiff format - spp not in set {1,3}

While trying to run this command: tesseract bond111.tif bond111 batch.nochop makebox I get the next error Error in pixReadFromTiffStream: spp not in set {1,3} Error in pixReadStreamTiff: pix not read Error in pixReadTiff: pix not read Assuming…
Asaf
  • 8,106
  • 19
  • 66
  • 116
41
votes
7 answers

Extracting code from photograph of T-shirt via OCR

I recently saw someone with a T-shirt with some Perl code on the back. I took a photograph of it and cropped out the code: Next I tried to extract the code from the image via OCR, so I installed Tesseract OCR and the Python bindings for it,…
BioGeek
  • 21,897
  • 23
  • 83
  • 145
40
votes
5 answers

Where are the Tesseract API docs?

I've looked all over the Google code site but am just not finding anything that explains how to use Tesseract from an API perspective. Anyone know where I can find this?
xanadont
  • 7,493
  • 6
  • 36
  • 49