I am trying to extract text from an image using pytesseract with the following code:

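import pytesseract
from PIL import Image

# Point pytesseract at the Tesseract executable (the setting lives on the inner pytesseract module)
path_to_tesseract = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
imagePath = r"C:\Users\Polzi\Documents\DEV\Fiverr\TRY\johngreen683\pic2.jpeg"
# Alternative test image: r"C:\Users\Polzi\Documents\DEV\Fiverr\TRY\johngreen683\d.jpg"
pytesseract.pytesseract.tesseract_cmd = path_to_tesseract

# Run OCR on the image and print whatever text was recognized
img = Image.open(imagePath)
text = pytesseract.image_to_string(img)
print(text)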

I want to extract the text from the following picture: [enter image description here]

But unfortunately the output of the scraping is always empty.

Is there any way to extract the text from such a picture?

Rapid1898
  • have you tried to do the same with other (easier) images as well? there can be something wrong with your installation (just maybe) – Bedir Yilmaz Jan 19 '22 at 14:51
  • maybe tesseract OCR is not what you are looking for since [it does not work well with colorful, blurry or noisy images](https://bhadreshpsavani.medium.com/how-to-use-tesseract-library-for-ocr-in-google-colab-notebook-5da5470e4fe0). FWIW, I have reproduced your situation in [this colab notebook](https://colab.research.google.com/drive/12_p4SobCnWNHZy5HQeYjR0bQSdsCFQWv?usp=sharing). good luck – Bedir Yilmaz Jan 19 '22 at 15:06
  • Thanks for your response - and yes the code works fine with some not so colourful pictures. What have you done with the picture that the scraping is afterwards possible? – Rapid1898 Jan 19 '22 at 20:24
  • You're welcome. I tried cropping the picture to include only the ROI, and I converted it to B&W. But don't get the wrong idea; my attempts did not succeed either. If I had invested more time in this, I would have tried rotating the image, because apparently [the engine does not recognize rotated text](https://stackoverflow.com/q/55119504/776348) very well. – Bedir Yilmaz Jan 20 '22 at 07:33
  • Does this answer your question? [image processing to improve tesseract OCR accuracy](https://stackoverflow.com/questions/9480013/image-processing-to-improve-tesseract-ocr-accuracy) – Bedir Yilmaz Jan 20 '22 at 08:27
  • [I have made some other enhancements](https://colab.research.google.com/drive/12_p4SobCnWNHZy5HQeYjR0bQSdsCFQWv?usp=sharing) to the image but no luck. You need advice from image processing experts. Try the answer I shared above. – Bedir Yilmaz Jan 20 '22 at 08:28
  • Thanks for your help and investigations. I think there is no reasonable solution to automate this for a large number of pictures like that – Rapid1898 Jan 21 '22 at 10:53
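
The preprocessing steps mentioned in the comments (cropping to the ROI, converting to B&W, trying rotations) could be sketched roughly as follows. This is only a starting point under stated assumptions, not a confirmed fix: the commenter reported that these steps did not succeed on this particular picture. The helper names `preprocess` and `ocr_with_rotations`, the `roi` box, the threshold of 128, and the `--psm 6` setting are illustrative choices and would need tuning per image.

```python
from PIL import Image, ImageOps


def preprocess(img, roi=None, angle=0, threshold=128):
    """Crop to a region of interest, convert to black and white, and rotate.

    roi, angle and threshold are hypothetical tuning knobs, not values
    taken from the question.
    """
    if roi is not None:
        img = img.crop(roi)  # roi = (left, upper, right, lower) in pixels
    # Grayscale with the contrast stretched to the full 0..255 range
    gray = ImageOps.autocontrast(img.convert("L"))
    # Binarize: pixels at or above the threshold become white, the rest black
    bw = gray.point(lambda p: 255 if p >= threshold else 0)
    if angle:
        # expand=True grows the canvas so nothing is cut off;
        # fillcolor=255 pads any new corners with white
        bw = bw.rotate(angle, expand=True, fillcolor=255)
    return bw


def ocr_with_rotations(img, angles=(0, 90, 180, 270), roi=None):
    """Run Tesseract on each rotated variant and return {angle: text}."""
    # Imported here so preprocess() can be used without Tesseract installed
    import pytesseract

    results = {}
    for angle in angles:
        variant = preprocess(img, roi=roi, angle=angle)
        # --psm 6 asks Tesseract to treat the image as one uniform block of text
        text = pytesseract.image_to_string(variant, config="--psm 6")
        results[angle] = text.strip()
    return results
```

Calling `ocr_with_rotations(Image.open(imagePath))` returns one string per angle, which makes it easy to see whether any orientation yields non-empty text before investing in heavier image processing.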

0 Answers