Questions tagged [python-tesseract]

Python-tesseract is a wrapper for Tesseract OCR that reads any conventional image file (JPG, GIF, PNG, TIFF, etc.) and returns its text, data about that text, or even a PDF conversion.

Tesseract is advertised as the most accurate open source OCR engine available. It was developed at HP Labs between 1985 and 1995 and then remained dormant until 2006 when Google revived the project.

For more information, please see the Python-tesseract page or the Tesseract page.

1664 questions
92
votes
26 answers

How do I resolve a TesseractNotFoundError?

I am trying to use pytesseract in Python but I always end up with the following error: raise TesseractNotFoundError() pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path However, pytesseract and…
PreetyP
  • 931
  • 1
  • 7
  • 4
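
A common first check for this error is whether the Tesseract executable itself can be found, since pytesseract only wraps it. The sketch below is an illustration, not a confirmed fix for this question: `find_tesseract` and `configure_pytesseract` are hypothetical helper names, and any paths passed in `extra_candidates` are assumptions about where the binary might live. Only `shutil.which` and the `pytesseract.pytesseract.tesseract_cmd` attribute are real APIs.

```python
import shutil
from pathlib import Path


def find_tesseract(binary_name="tesseract", extra_candidates=()):
    """Return the path to the Tesseract executable, or None if it is not found."""
    # First ask the OS to resolve the binary through PATH.
    found = shutil.which(binary_name)
    if found:
        return found
    # Then try explicit locations supplied by the caller (assumed paths).
    for candidate in extra_candidates:
        if Path(candidate).is_file():
            return str(candidate)
    return None


def configure_pytesseract(path):
    """Point pytesseract at an explicit binary instead of relying on PATH."""
    import pytesseract  # imported lazily so the lookup above works without it

    pytesseract.pytesseract.tesseract_cmd = path
```

If `find_tesseract()` returns `None`, the engine is probably not installed or not on PATH; otherwise the returned path can be handed to `configure_pytesseract`.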
61
votes
8 answers

Getting the bounding box of the recognized words using python-tesseract

I am using python-tesseract to extract words from an image. This is a Python wrapper for Tesseract, which is an OCR engine. I am using the following code for getting the words: import tesseract api =…
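
With the pytesseract package (a different wrapper from the `tesseract` module imported in the excerpt), word boxes can be read from `image_to_data`. The following is a minimal sketch under that assumption: `word_boxes` and `ocr_word_boxes` are hypothetical helper names, and the dictionary keys used are the ones `image_to_data` returns with `output_type=Output.DICT`.

```python
def word_boxes(data, min_conf=0.0):
    """Yield (text, left, top, width, height) for each recognized word.

    `data` is the dictionary produced by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT).
    """
    for i, text in enumerate(data["text"]):
        # Rows describing pages, blocks, and lines carry empty text.
        if not text.strip():
            continue
        # Confidence may arrive as an int, a float, or a string.
        if float(data["conf"][i]) < min_conf:
            continue
        yield (text, data["left"][i], data["top"][i],
               data["width"][i], data["height"][i])


def ocr_word_boxes(image_path, min_conf=0.0):
    """Run OCR on a file and return its word boxes (needs Tesseract installed)."""
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path), output_type=pytesseract.Output.DICT
    )
    return list(word_boxes(data, min_conf))
```

Keeping the filtering separate from the OCR call makes it possible to check the box logic without running the engine.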
37
votes
4 answers

Detect text region in image using Opencv

I have an image and want to detect the text regions in it. I tried TiRG_RAW_20110219 project but the results are not satisfactory. If the input image is https://i.stack.imgur.com/ILTvo.jpg it is producing https://i.stack.imgur.com/ILTvo.jpg#1 as…
Meenal Goyal
  • 387
  • 1
  • 4
  • 5
23
votes
2 answers

how to avoid Permission denied while installing package for Python without sudo

I am trying to install the tesseract wrapper for python as user mike so that I can import tesseract. I'm following the guide here https://code.google.com/p/python-tesseract/wiki/HowToCompilePythonTesseractForCentos However, when I execute python…
Anthony
  • 33,838
  • 42
  • 169
  • 278
21
votes
1 answer

Pytesseract.TesseractError 'Usage: python pytesseract.py [-l lang] input_file

I am getting the following error when trying to print a simple test image to text. I've verified that I have Pillow (PIL 1.1.7) and tried uninstalling and reinstalling pytesseract. The file paths are correct because if I change them I get another…
Blair
  • 213
  • 1
  • 2
  • 7
19
votes
6 answers

WinError 5:Access denied PyTesseract

I know this question has already been answered on this site; however, none of the solutions I looked up on the internet seemed to work. Here's what I tried: Giving all permissions to my python file Changing PATH variable to point to my tesseract…
Oussama Boussif
  • 811
  • 2
  • 8
  • 13
18
votes
1 answer

pytesseract cannot find the file specified

My code is straightforward and is the following: import pytesseract from PIL import Image img = Image.open('C:/temp/foo.jpg') img.load() i = pytesseract.image_to_string(img) and the error response I get back is: Traceback (most recent call…
jason m
  • 6,519
  • 20
  • 69
  • 122
17
votes
12 answers

(-215:Assertion failed) !_src.empty() in function 'cv::cvtColor' with cv::imread

I am trying to recognize text from an image to then have the text outputted; however, this error spits out: Traceback (most recent call last): File "C:/Users/Benji's Beast/AppData/Local/Programs/Python/Python37-32/imageDet.py", line 41, in…
Benji
  • 197
  • 1
  • 1
  • 5
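
This assertion usually means the image passed to `cv::cvtColor` was empty, and `cv2.imread` returns `None` instead of raising when it cannot read a file. A hedged sketch of a loader that fails early with a clearer message follows; `load_image` is a hypothetical helper name, and whether a bad path is the cause in this particular question is an assumption.

```python
import os


def load_image(path):
    """Load an image with OpenCV, raising a clear error instead of returning None."""
    # Check the path first so a typo is reported as a missing file.
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No image file at {path!r}")

    import cv2  # imported lazily; only needed once the file is known to exist

    image = cv2.imread(path)
    # imread signals an unreadable or unsupported file by returning None.
    if image is None:
        raise ValueError(f"OpenCV could not decode {path!r}")
    return image
```

Calling `cv2.cvtColor` only on the result of `load_image` turns the opaque assertion into an error that names the offending path.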
17
votes
1 answer

Pytesseract set character whitelist

Does anyone know how to set the character whitelist for Pytesseract? I want it to only output A-z and 0-9. Is this possible? I have the following: img = Image.open('test.jpg') result = pytesseract.image_to_string(img, config='-psm 6') I'm getting…
Minato10
  • 173
  • 1
  • 1
  • 4
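
A whitelist is normally passed through the Tesseract variable `tessedit_char_whitelist` in the `config` string. The sketch below only builds that string: `build_config` and `ALNUM` are hypothetical names. It uses `--psm`, which Tesseract 4 and later expect, whereas the single-dash `-psm` in the excerpt is the older 3.x spelling. Whitelists are reportedly ignored by the LSTM engine in Tesseract 4.0, so treat the result as version dependent.

```python
import string


def build_config(whitelist, psm=6):
    """Build a pytesseract config string that restricts the output characters."""
    # -c sets a Tesseract variable; --psm selects the page segmentation mode.
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


# Letters and digits only, matching the "A-z and 0-9" request in the excerpt.
ALNUM = string.ascii_letters + string.digits
```

The string would then be used as `pytesseract.image_to_string(img, config=build_config(ALNUM))`.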
14
votes
7 answers

OSError: [Errno 2] No such file or directory using pytesser

My problem is this: I want to use pytesser to get a picture's contents. My operating system is Mac OS 10.11, and I have already installed PIL, pytesser, tesseract-ocr engine, and other supporting libraries like libpng and so on. But when I run my…
grant
  • 141
  • 1
  • 1
  • 4
13
votes
3 answers

Tesseract installation in windows

I am currently working on an optical character recognition project using Python 2.7 and OpenCV on Windows. I learned that this can be done using Tesseract, but it cannot be installed on Windows. I…
zeeshan
  • 131
  • 1
  • 1
  • 5
12
votes
1 answer

Why does pytesseract fail to recognise digits from image with darker background?

I have this Python code, which I use to convert text written in a picture to a string. It works for certain images that have large characters, but not for the one I'm trying right now, which contains only digits. This is the picture: This is my…
alioua walid
  • 247
  • 3
  • 19
12
votes
1 answer

Equivalents to OpenCV's erode and dilate in PIL?

I want to do some image OCR with PyTesseract, and I've seen that OpenCV's erode and dilate functions are very useful for noise removal pre-processing. Since PyTesseract already requires PIL/Pillow, I'd like to do the noise removal in PIL, rather…
ROldford
  • 310
  • 1
  • 3
  • 12
12
votes
4 answers

how to get character position in pytesseract

I am trying to get the character positions in image files using the pytesseract library. import pytesseract from PIL import Image print pytesseract.image_to_string(Image.open('5.png')) Is there any library for getting the position of each character?
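
pytesseract itself exposes per-character boxes through `image_to_boxes`, which returns one line per symbol in Tesseract's box format: `symbol left bottom right top page`, with the origin at the bottom-left of the image. The sketch below parses that text into top-left-origin coordinates; `parse_boxes` is a hypothetical helper name, and the sample string in the usage note is made up for illustration.

```python
def parse_boxes(box_text, image_height):
    """Convert image_to_boxes output to (char, left, top, right, bottom) tuples.

    Tesseract measures y from the bottom edge; the result measures y from the
    top edge, which is what PIL and OpenCV drawing functions expect.
    """
    boxes = []
    for line in box_text.splitlines():
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        char, left, bottom, right, top, _page = parts
        boxes.append((
            char,
            int(left),
            image_height - int(top),     # flip the y axis
            int(right),
            image_height - int(bottom),
        ))
    return boxes
```

For example, `parse_boxes("H 10 20 30 60 0", image_height=100)` gives `[("H", 10, 40, 30, 80)]`; in practice the text would come from `pytesseract.image_to_boxes(img)` and the height from `img.height`.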
11
votes
3 answers

How to improve Hindi text extraction?

I am trying to extract Hindi text from a PDF. I tried all the methods to extract from the PDF, but none of them worked. There are explanations why it doesn't work, but no answers as such. So, I decided to convert the PDF to an image, and then use…
Abhishek Rai
  • 2,159
  • 3
  • 18
  • 38