8

Here's my code:

import pytesseract
import cv2
from PIL import Image

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"


def main():
    original = cv2.imread('D_Testing.png', 0)
    # binary thresh it at value 100. It is now a black and white image
    ret, original = cv2.threshold(original, 100, 255, cv2.THRESH_BINARY)
    text = pytesseract.image_to_string(original, config='--psm 10')
    print(text)
    print(pytesseract.image_to_osd(Image.open('D_Testing.png')))


if __name__ == "__main__":
    main()

The first print statement gives me what I need, which is the letter D:

D

That is the intended result, but the second print statement fails with this:

Traceback (most recent call last):
  File "C:/Users/Me/Documents/Python/OpenCV/OpenCV_WokringTest/PytesseractAttempt.py", line 18, in <module>
    main()
  File "C:/Users/Me/Documents/Python/OpenCV/OpenCV_WokringTest/PytesseractAttempt.py", line 14, in main
    print(pytesseract.image_to_osd(Image.open('D_Testing.png')))
  File "C:\Users\Me\Documents\Python\OpenCV\OpenCV_WokringTest\venv\lib\site-packages\pytesseract\pytesseract.py", line 402, in image_to_osd
    }[output_type]()
  File "C:\Users\Me\Documents\Python\OpenCV\OpenCV_WokringTest\venv\lib\site-packages\pytesseract\pytesseract.py", line 401, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "C:\Users\Me\Documents\Python\OpenCV\OpenCV_WokringTest\venv\lib\site-packages\pytesseract\pytesseract.py", line 218, in run_and_get_output
    run_tesseract(**kwargs)
  File "C:\Users\Me\Documents\Python\OpenCV\OpenCV_WokringTest\venv\lib\site-packages\pytesseract\pytesseract.py", line 194, in run_tesseract
    raise TesseractError(status_code, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Tesseract Open Source OCR Engine v4.0.0.20181030 with Leptonica Warning: Invalid resolution 0 dpi. Using 70 instead. Too few characters. Skipping this page Warning. Invalid resolution 0 dpi. Using 70 instead. Too few characters. Skipping this page Error during processing.'). 

I can't find much about this error online, and I am not sure what to do. The goal is simply to get it to output the orientation of my letter. Thank you in advance for any helpful comments!

alyssaeliyah
Bob Stoops
  • And what if you pass your preprocessed `original` to `image_to_osd`? – Dmitrii Z. Jan 05 '19 at 10:22
  • [This is the image I am passing through](https://i.stack.imgur.com/R3nzH.png) – Bob Stoops Jan 05 '19 at 23:51
  • Okay, and what if instead of the unprocessed image (`Image.open('D_Testing.png')`) you pass the preprocessed one (`original`) to `image_to_osd`? – Dmitrii Z. Jan 06 '19 at 09:16
  • Still doesn't work unfortunately – Bob Stoops Jan 07 '19 at 04:39
  • It is because it cannot extract the dpi information from your image. https://github.com/tesseract-ocr/tesseract/issues/1702 – alyssaeliyah Jun 19 '19 at 08:48
  • Ran into a similar issue and resolved it by passing `--dpi` to `config` in the pytesseract function: `image = Image.open(path)`, then `config_str = '--dpi ' + str(image.info['dpi'][0])`, then `text = pytesseract.image_to_string(image, config=config_str)` – mbauer Jan 23 '20 at 02:58

3 Answers

6

I faced this problem, tried different approaches, and finally solved it.

Pass the image path directly rather than an image loaded through Pillow or OpenCV, as mentioned by @Esraa Abdelmaksoud:

text = pytesseract.image_to_osd(r'Report 2.jpeg')
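
When the image is preprocessed in memory first, one way to still hand Tesseract a file path is to write the result to a temporary file. The following is a minimal sketch rather than a confirmed fix: `temp_image_path` and `osd_from_array` are hypothetical helper names, and it assumes OpenCV and pytesseract are installed. Whether OSD then succeeds still depends on the image containing enough recognizable characters.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_image_path(suffix=".png"):
    """Yield the path of an empty temporary file and delete it afterwards."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # release the handle so other programs can open the file on Windows
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)


def osd_from_array(image):
    """Write an OpenCV image (NumPy array) to disk and run OSD on the file path."""
    # Imported lazily so the path helper above works without these packages.
    import cv2
    import pytesseract

    with temp_image_path(".png") as path:
        cv2.imwrite(path, image)
        return pytesseract.image_to_osd(path)
```

With the question's code, this would be called as `osd_from_array(original)` after the threshold step.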
  • I'm running into this same scenario. If I open the image via PIL, pytesseract throws errors about being unable to detect DPI and format, and then reports too few characters. But if the image is passed as a path, it works right away. In my case this is problematic because I'm enhancing the images first, so I'd rather not have to save the image just to reopen it :\ Does anyone have an explanation for this? – CBallenar Jun 22 '22 at 09:41
3

Tesseract OSD detects orientation and rotation from the characters it recognizes in the image. It needs a minimum number of characters to work, set by the parameter min_characters_to_try. If the engine can't find enough characters, or can't recognize the font, OSD fails with that error message. Other cases also cause failure, such as rotations that aren't close to a multiple of 90 degrees (0, 90, 180, or 270).

Also, pass the path of your cropped image file directly, like this, rather than the output of OpenCV or Pillow:

osd = pytesseract.image_to_osd(r'D:\image.jpg', config='--psm 0 -c min_characters_to_try=5')

In your case, one character is not enough, and it isn't clear enough for the engine to read.
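
The config string used above can be assembled by a small helper so the threshold is easy to vary. This is only a sketch: `build_osd_config` and `detect_orientation` are hypothetical names, and the optional `--dpi` value is an assumption borrowed from the comments on the question, not something tested in this answer.

```python
def build_osd_config(min_characters=5, dpi=None):
    """Build a Tesseract config string for orientation and script detection."""
    parts = []
    if dpi is not None:
        parts.append(f"--dpi {dpi}")  # optional hint when the file has no DPI metadata
    parts.append("--psm 0")  # page segmentation mode 0: OSD only
    parts.append(f"-c min_characters_to_try={min_characters}")
    return " ".join(parts)


def detect_orientation(image_path, min_characters=5, dpi=None):
    """Run OSD on an image file path and return Tesseract's raw text report."""
    # Imported lazily; assumes pytesseract and the Tesseract binary are installed.
    import pytesseract

    config = build_osd_config(min_characters, dpi)
    return pytesseract.image_to_osd(image_path, config=config)
```

Lowering `min_characters` only relaxes the cutoff; it does not guarantee a reliable result on very short text.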

Esraa Abdelmaksoud
0

One character is too few for the OSD feature to reliably detect script and orientation. The parameter min_characters_to_try governs the cutoff, and its default is 50, so your image should contain at least 50 characters for OSD to work properly:

> $ tesseract --print-parameters | fgrep characters ...
> min_characters_to_try 50  Specify minimum characters to try during OSD
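
To check the default on a local install from Python, the same command can be run through `subprocess` and its output parsed. A minimal sketch, assuming each parameter line has the form `name value description` as in the output quoted above (separated by tabs or spaces); `parse_parameters` and `read_min_characters` are hypothetical helpers, and `tesseract` is assumed to be on the PATH.

```python
import subprocess


def parse_parameters(text):
    """Map parameter names to their values (as strings) from --print-parameters output."""
    params = {}
    for line in text.splitlines():
        # Assumed layout: name, value, description, separated by tabs or spaces.
        fields = line.split("\t") if "\t" in line else line.split(None, 2)
        if len(fields) >= 2:
            params[fields[0].strip()] = fields[1].strip()
    return params


def read_min_characters(tesseract_cmd="tesseract"):
    """Ask the local Tesseract binary for its min_characters_to_try default."""
    result = subprocess.run(
        [tesseract_cmd, "--print-parameters"],
        capture_output=True, text=True, check=True,
    )
    return int(parse_parameters(result.stdout)["min_characters_to_try"])
```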
Dilshat