How to extract data from images using Python

Question

Hi, I am new to OCR models. I have 2000 receipt JPEG images and I am trying to extract the text from them, but I get an error. How can I fix it? I tried this:

from PIL import Image
import pytesseract
import glob
image_list = []
for filename in glob.glob('E:/Receipts/Receipts/*.jpeg'): 
    im=Image.open(filename)
    image_list.append(im)
pytesseract.pytesseract.tesseract_cmd="C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe"

text=pytesseract.image_to_string(image_list,lang = 'eng')

print(text)

The error it shows:

TypeError                                 Traceback (most recent call last)
<ipython-input-49-fb43962ff6d0> in <module>()
      6     image_list.append(im)
      7 
----> 8 text=pytesseract.image_to_string(image_list,lang = 'eng')
      9 
     10 print(text)





~\Anaconda3\lib\site-packages\pytesseract\pytesseract.py in run_and_get_output(image, extension, lang, config, nice, timeout, return_bytes)
    240     temp_name, input_filename = '', ''
    241     try:
--> 242         temp_name, input_filename = save_image(image)
    243         kwargs = {
    244             'input_filename': input_filename,

~\Anaconda3\lib\site-packages\pytesseract\pytesseract.py in save_image(image)
    169         return temp_name, realpath(normpath(normcase(image)))
    170 
--> 171     image, extension = prepare(image)
    172     input_file_name = temp_name + os.extsep + extension
    173     image.save(input_file_name, format=extension, **image.info)

~\Anaconda3\lib\site-packages\pytesseract\pytesseract.py in prepare(image)
    143 
    144     if not isinstance(image, Image.Image):
--> 145         raise TypeError('Unsupported image object')
    146 
    147     extension = 'PNG' if not image.format else image.format

TypeError: Unsupported image object

ndrplz · Answer 1 · 2019-11-17T10:22:34.503

The stack trace shows that pytesseract is rejecting the type of object you passed in: prepare() raises TypeError('Unsupported image object') when its argument is not a PIL Image.Image.

With text=pytesseract.image_to_string(image_list,lang = 'eng') you are passing a whole list of images to pytesseract.image_to_string, which expects a single image.

To get a list of texts, one per image, iterate over the images:

texts = [pytesseract.image_to_string(img,lang = 'eng') for img in image_list]
for text in texts:
    print(text)
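
With 2000 receipts it may also help to process the files one at a time instead of first loading every image into a list, and to remember which text came from which file. Here is a minimal sketch of that idea, not a definitive implementation. The names ocr_folder and tesseract_ocr are hypothetical helpers made up for this example; the only pytesseract call used is image_to_string, as in the question. The ocr parameter is an assumption added so that the loop can be tried without Tesseract installed.

```python
import glob
from typing import Callable, Dict


def tesseract_ocr(path: str) -> str:
    """OCR a single image file (hypothetical helper for this sketch)."""
    # Imported here so the rest of the sketch runs without pytesseract installed.
    import pytesseract
    from PIL import Image

    # Open and close each file in turn rather than keeping 2000 images open.
    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang='eng')


def ocr_folder(pattern: str,
               ocr: Callable[[str], str] = tesseract_ocr) -> Dict[str, str]:
    """Return {file path: extracted text} for every file matching pattern."""
    texts = {}
    for path in sorted(glob.glob(pattern)):
        try:
            texts[path] = ocr(path)
        except Exception as exc:
            # One unreadable receipt should not stop the whole batch.
            print(f"Skipping {path}: {exc}")
    return texts
```

On Windows you would still set pytesseract.pytesseract.tesseract_cmd as in the question, then call ocr_folder('E:/Receipts/Receipts/*.jpeg') and loop over the returned dictionary to print or save each text.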
