8

I am using Python 3.x and the following code to convert an image into text:

from PIL import Image
from pytesseract import image_to_string

image = Image.open('image.png', mode='r')
print(image_to_string(image))

I am getting the following error:

Traceback (most recent call last):
  File "C:/Users/hp/Desktop/GII/Image_to_text.py", line 12, in <module>
    print(image_to_string(image))
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
    config=config)
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\site-packages\pytesseract\pytesseract.py", line 94, in run_tesseract
    stderr=subprocess.PIPE)
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\subprocess.py", line 950, in __init__
    restore_signals, start_new_session)
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\subprocess.py", line 1220, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

Please note that I have put the image in the same directory as my Python installation. Also, the line image = Image.open('image.png', mode='r') does not raise an error; the error is raised on the line print(image_to_string(image)).

Any idea what might be wrong here? Thanks

muazfaiz
  • This code works for me, when I have both files in the same directory and the image contains some words. Might be something about absolute and relative paths... – Ohumeronen Jul 21 '16 at 14:54
  • You may also try: import os.path; os.path.exists('image.png') – Ohumeronen Jul 21 '16 at 14:56
  • I use this code now: `if (os.path.exists('image.png')): image = Image.open('image.png') print(image_to_string(image)) else: print('Does not exist')` but I get the same error. That means the file exists, but the error is raised when I try to read it for text. – muazfaiz Jul 21 '16 at 14:59

5 Answers

8

You have to have tesseract installed and accessible on your PATH.

According to its source, pytesseract is merely a wrapper around subprocess.Popen that runs the tesseract binary. It does not perform any kind of OCR itself.
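A quick way to check this is to ask Python whether a `tesseract` executable is visible on the PATH. This is a minimal sketch using only the standard library; the helper name `find_tesseract` is made up for illustration and is not part of pytesseract:

```python
import shutil


def find_tesseract(name="tesseract"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


path = find_tesseract()
if path is None:
    print("tesseract was not found on PATH")
else:
    print("tesseract found at", path)
```

If this reports that nothing was found, the `FileNotFoundError` from the question is the expected outcome.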

Relevant part of sources:

def run_tesseract(input_filename, output_filename_base, lang=None, boxes=False, config=None):
    '''
    runs the command:
        `tesseract_cmd` `input_filename` `output_filename_base`

    returns the exit status of tesseract, as well as tesseract's stderr output
    '''
    command = [tesseract_cmd, input_filename, output_filename_base]

    if lang is not None:
        command += ['-l', lang]

    if boxes:
        command += ['batch.nochop', 'makebox']

    if config:
        command += shlex.split(config)

    proc = subprocess.Popen(command,
            stderr=subprocess.PIPE)
    return (proc.wait(), proc.stderr.read())

Quoting another part of source:

# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
tesseract_cmd = 'tesseract'

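So a quick way of changing the tesseract path would be:

from PIL import Image
import pytesseract

# Set once, before the first OCR call; the variable lives in the pytesseract.pytesseract module
pytesseract.pytesseract.tesseract_cmd = "/absolute/path/to/tesseract"
print(pytesseract.image_to_string(Image.open('image.png')))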
Łukasz Rogalski
  • I think you are right, but I have installed `tesseract` and it still gives the same error. In fact, the brutal part is that when I open the image using the `image.show()` method it does open the image, but on the very next line, when I process the image, it throws FileNotFoundError. I am completely stuck :( – muazfaiz Jul 24 '16 at 23:08
  • `FileNotFoundError` is from lack of `tesseract`, not lack of image file itself. See edit to my answer. – Łukasz Rogalski Jul 25 '16 at 05:09
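To illustrate the point made in the comment above, the same exception can be reproduced without any image at all. The sketch below assumes nothing about pytesseract; it starts a program with `subprocess.Popen` and reports what happens. The helper name `try_run` and the nonexistent program name are placeholders chosen for this example:

```python
import subprocess


def try_run(command):
    """Try to start `command`; return a short status string."""
    try:
        proc = subprocess.Popen(command,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
    except FileNotFoundError as exc:
        # Raised when the executable itself cannot be found,
        # regardless of whether the arguments name existing files.
        return "executable not found: {}".format(exc)
    proc.communicate()
    return "started, exit code {}".format(proc.returncode)


print(try_run(["no-such-program-hopefully", "image.png", "out"]))
```

The exception is raised before the program ever looks at `image.png`, which is why checking that the image exists does not help.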
2

Please install the packages below for extracting text from PNG/JPEG images:

pip install pytesseract

pip install Pillow 

OCR (Optical Character Recognition) is the process of electronically extracting text from images; pytesseract makes it available from Python.

PIL (Pillow) is used for anything from simply reading and writing image files to scientific image processing, geographical information systems, remote sensing, and more.

from PIL import Image
from pytesseract import image_to_string 
print(image_to_string(Image.open('/home/ABCD/Downloads/imageABC.png'),lang='eng'))
thrinadhn
1

You also need to download and install the Tesseract OCR setup. Use this link to download it: http://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-3.05.01.exe

Then include this line in your code to point pytesseract at the tesseract executable (note the raw string, which keeps the backslashes, including the \t in \tesseract, from being treated as escape sequences): pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract'

This is the default location where tesseract will be installed.

That's it. I followed these same steps to run the code on my machine.

Hope this will help.
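If the install location is not known in advance, the lookup can be automated. The following is only a sketch under assumptions: the two candidate paths are guesses at common Windows install locations, and `resolve_tesseract` is a hypothetical helper, not a pytesseract function:

```python
import os
import shutil

# Candidate locations are assumptions; adjust them to match your installation.
CANDIDATES = [
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
]


def resolve_tesseract(candidates=CANDIDATES):
    """Return a tesseract path: PATH first, then the candidates, else None."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


cmd = resolve_tesseract()
if cmd is None:
    print("Tesseract not found; install it or set the path by hand.")
else:
    print("Value to assign to pytesseract.pytesseract.tesseract_cmd:", cmd)
```

The PATH is checked first so that a properly installed Tesseract is used without any hard-coded path.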

0

Your "current" directory is not where you think.

==> You may specify the full path to the image, for example: image = Image.open(r'C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\image.png', mode='r')
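To see which directory Python actually treats as current, and where a relative name such as 'image.png' resolves to, something like the following can be run from the same script. The helper name `describe_path` is invented for this sketch:

```python
import os


def describe_path(relative_name):
    """Return (absolute path the name resolves to, whether that file exists)."""
    absolute = os.path.abspath(relative_name)
    return absolute, os.path.exists(absolute)


print("Current working directory:", os.getcwd())
print(describe_path("image.png"))
```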

stonebig
0

You can try using this python library: https://github.com/prabhakar267/ocr-convert-image-to-text

As mentioned on the README of the package, usage is very straightforward.

usage: python main.py [-h] input_dir [output_dir]

positional arguments:
  input_dir
  output_dir

optional arguments:
  -h, --help  show this help message and exit