
I'm trying to read this number using pytesseract:

image of the number 14

and when I do it prints out IL:

import pytesseract
from PIL import Image

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
text = pytesseract.image_to_string(Image.open("Number.jpg"))
print(text)

I've also tried converting the image to black and white:

image of the number 14

but this hasn't worked either. What am I doing wrong?

wovano

2 Answers


pytesseract is most accurate with black text on a white background, so preprocessing is the main step toward getting reliable results. In your case, a simple inverse binary threshold is enough to get the correct output, because your image contains no noise. Adaptive thresholding is only needed when the lighting is uneven.

>>> import cv2
>>> import pytesseract
>>> image = cv2.imread("14.jpg", 0)  # 0 = load as grayscale
>>> thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
>>> data = pytesseract.image_to_string(thresh, config='--psm 6 digits')
>>> data
'14'

I think tesseract's version does not cause any problem.

Tesseract version: v5.0.0-alpha.20200223
pytesseract version: 0.3.4
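To make it clearer what the `THRESH_BINARY_INV + THRESH_OTSU` combination does, here is a minimal pure-Python sketch of Otsu's method followed by an inverse binarization. It is only an illustration, not OpenCV's implementation; the function names `otsu_threshold` and `binarize_inv` are made up for this example, and it assumes 8-bit grayscale values with light digits on a dark background.

```python
def otsu_threshold(pixels):
    """Return the threshold (0-255) that maximises between-class variance.

    `pixels` is a flat list of 8-bit grayscale values.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low = 0    # number of pixels with value <= t
    sum_low = 0  # sum of the values of those pixels
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue  # no pixels in the lower class yet
        w_high = total - w_low
        if w_high == 0:
            break  # every pixel is in the lower class; nothing left to split
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        between_var = w_low * w_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize_inv(pixels, t, maxval=255):
    """Inverse binary threshold: values above t become 0, the rest maxval."""
    return [0 if p > t else maxval for p in pixels]


# Hypothetical data: a dark background (10, 12) with bright digit pixels (240, 250).
sample = [10] * 50 + [12] * 50 + [240] * 20 + [250] * 20
t = otsu_threshold(sample)
print(t, binarize_inv([10, 250], t))
```

The bright digit pixels end up black (0) and the dark background ends up white (255), which is the black-on-white layout described above.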

Tarun Chakitha

I think you haven't set pytesseract's page segmentation mode (psm) to 7, which treats the image as a single text line. (source)

I also applied thresholding, my result:

enter image description here

and when I set psm to 7

txt = pytesseract.image_to_string(thr, config="--psm 7 digits")
print(txt)

Result:

14

Code:


import cv2
import pytesseract

img = cv2.imread("d3njD.jpg")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY_INV, 11, 4)
txt = pytesseract.image_to_string(thr, config="--psm 7 digits")
print(txt)
cv2.imshow("thr", thr)
cv2.waitKey(0)

Please note that, for other images, this solution may not work. You may need additional image processing methods, or you need to change the parameters.
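If you end up experimenting with parameters, it may help to build the `config` string from a small table of page segmentation modes instead of typing it by hand. The sketch below is only an illustration: `PSM_MODES` and `build_config` are hypothetical helper names, and the mode descriptions paraphrase the output of `tesseract --help-psm`. It only builds strings and does not call Tesseract itself.

```python
# A few page segmentation modes that are commonly tried for short text.
PSM_MODES = {
    6: "assume a single uniform block of text",
    7: "treat the image as a single text line",
    8: "treat the image as a single word",
    10: "treat the image as a single character",
    13: "raw line, bypassing Tesseract-specific hacks",
}


def build_config(psm, digits_only=True):
    """Build a config string such as '--psm 7 digits' (hypothetical helper)."""
    if psm not in PSM_MODES:
        raise ValueError(f"unsupported psm for this sketch: {psm}")
    config = f"--psm {psm}"
    if digits_only:
        # 'digits' is the name of a config file shipped with Tesseract
        config += " digits"
    return config


# Print each candidate config next to what the mode means.
for mode in sorted(PSM_MODES):
    print(build_config(mode), "->", PSM_MODES[mode])
```

You could then pass the result to the existing call, for example `pytesseract.image_to_string(thr, config=build_config(7))`, and compare the output for each mode.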

  • Tesseract version: 4.1.1
Ahmet
  • Hmm I tried the exact same code as you and I'm now getting "ER" with the same exact code, now I'm wondering if it has something to do with my installation. – Pr1orit3 T1ps Dec 10 '20 at 22:41
  • On the same image or a different image? – Ahmet Dec 10 '20 at 22:49
  • If you are getting `ER` on the same image could you please replace `txt = pytesseract` part in the code with `txt = pytesseract.image_to_string(thr, config="--psm 7 digits")` – Ahmet Dec 10 '20 at 22:51
  • So I tried doing that and this time I got no output (I believe). It printed <0x0c> into the console and that's it. Do you think it could be my pytesseract installation? – Pr1orit3 T1ps Dec 10 '20 at 23:58
  • Are you using the same code with mine? Or if you are using a different code, please post it to your question – Ahmet Dec 11 '20 at 00:07
  • And your pytesseract version is 4.1.1 right? – Ahmet Dec 11 '20 at 03:27
  • I did pytesseract.get_tesseract_version() and it said "LooseVersion ('5.0.0-alpha.20200328')" – Pr1orit3 T1ps Dec 11 '20 at 12:41
  • If possible, could you please downgrade to 4.1.1 and see if my answer is working? – Ahmet Dec 11 '20 at 12:53
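A note on the `<0x0c>` output mentioned in the comments: as far as I know, recent Tesseract versions append a form feed character (`\x0c`) as a page separator, so a result containing only that character most likely means no text was recognized, rather than that the installation is broken. Below is a minimal sketch for cleaning the raw string before using it; `extract_digits` and `is_empty_result` are hypothetical helper names, and the sample strings are made up rather than real OCR output.

```python
def extract_digits(raw):
    """Keep only ASCII digits, dropping newlines, form feeds and stray letters."""
    return "".join(ch for ch in raw if ch in "0123456789")


def is_empty_result(raw):
    """True if the OCR output holds nothing but whitespace (form feed included)."""
    return raw.strip() == ""


# Made-up examples of what image_to_string might return.
print(repr(extract_digits("14\n\x0c")))
print(is_empty_result("\x0c"))
```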