
I'm trying to use Tesseract in the code below to extract the two lines of text from the image. I tried to improve the image quality, but even so it didn't work.

Can anyone help me?

[image of the display to be read]

from PIL import Image, ImageEnhance, ImageFilter
import pytesseract

# load the image and upscale it 4x before running OCR
img = Image.open(r'C:\ocr\test00.jpg')
new_size = tuple(4 * x for x in img.size)
img = img.resize(new_size, Image.ANTIALIAS)
img.save(r'C:\\test02.jpg', 'JPEG')

print(pytesseract.image_to_string(img))
Leonardo Wolter

1 Answer


Given the comment by @barny I don't know if this will work, but you can try the code below. I created a script that selects the display area and warps it into a straight image. Next, a threshold creates a black-and-white mask of the characters, and the result is cleaned up a bit.

See if this improves recognition. If it does, also look at the intermediate stages so you understand everything that happens.
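If you want to inspect those intermediate stages, one option (just a sketch, reusing the variable names from the script further down) is to write each one to disk right after it is created:

import cv2
# hypothetical debug dump: save every intermediate image from the script below
cv2.imwrite('01_mask.png', mask)       # threshold of the full photo
cv2.imwrite('02_warped.png', tmp)      # perspective-corrected display area
cv2.imwrite('03_mask2.png', mask2)     # threshold of the warped crop
cv2.imwrite('04_result.png', result)   # cleaned-up, dilated and inverted text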

Update: it seems Tesseract prefers black text on a white background, so I inverted and dilated the result.

Result:

[result image]

Updated result:

[updated result image]

Code:

import numpy as np 
import cv2
# load image
image = cv2.imread('disp.jpg')

# create grayscale
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# perform threshold
retr, mask = cv2.threshold(gray_image, 190, 255, cv2.THRESH_BINARY)

# find contours; [-2:] keeps this compatible with both OpenCV 3.x and 4.x return values
contours, hier = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]

# select the largest contour
largest_area = 0
for cnt in contours:
    if cv2.contourArea(cnt) > largest_area:
        cont = cnt
        largest_area = cv2.contourArea(cnt)

# find the minimum-area rectangle (and its corner points) that surrounds the largest contour, i.e. the display area
rect = cv2.minAreaRect(cont)
box = cv2.boxPoints(rect)
box = box.astype(int)

#### Warp image to square
# assign cornerpoints of the region of interest
pts1 = np.float32([box[2],box[3],box[1],box[0]])
# provide new coordinates of cornerpoints
pts2 = np.float32([[0,0],[500,0],[0,110],[500,110]])

# determine and apply transformationmatrix
M = cv2.getPerspectiveTransform(pts1,pts2)
tmp = cv2.warpPerspective(image,M,(500,110))

# create grayscale of the warped image
gray_image2 = cv2.cvtColor(tmp, cv2.COLOR_BGR2GRAY)
# perform threshold
retr, mask2 = cv2.threshold(gray_image2, 160, 255, cv2.THRESH_BINARY_INV)

# remove noise / close gaps
kernel =  np.ones((5,5),np.uint8)
result = cv2.morphologyEx(mask2, cv2.MORPH_CLOSE, kernel)

# draw the detected rectangle on the original image
cv2.drawContours(image, [box], 0, (255,0,0), 2)

# dilate result to make characters more solid
kernel2 =  np.ones((3,3),np.uint8)
result = cv2.dilate(result,kernel2,iterations = 1)

# invert to get black text on white background
result = cv2.bitwise_not(result)

# show images
cv2.imshow("Result", result)
cv2.imshow("Image", image)

cv2.waitKey(0)
cv2.destroyAllWindows()
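To actually run OCR on the cleaned-up image, the result array can be passed straight to pytesseract. A minimal sketch, assuming pytesseract and Tesseract are installed; the --psm 6 option (treat the image as a single uniform block of text) is my assumption, not part of the original answer:

import pytesseract
# pytesseract accepts numpy arrays directly, so there is no need to save result to disk first
text = pytesseract.image_to_string(result, config='--psm 6')
print(text)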
J.D.
  • wow! Thank you for your attention. Even if it doesn't work, I'll learn a lot from your solution. I'll try it in the next few days and come back here to report. – Leonardo Wolter Feb 08 '19 at 18:15
  • JD, I couldn't wait to test it. It didn't work at first, but I'm sure I can improve on it and reach a solution starting from where you stopped. Really interesting how you processed the image, and I'm learning a lot from it. Really thankful for your code. – Leonardo Wolter Feb 08 '19 at 18:32
  • I just read in a couple of places [example](https://stackoverflow.com/questions/54573130/how-to-change-a-part-of-the-color-of-the-background-which-is-black-to-white) that Tesseract doesn't do that well with white text on a black background. So I added a line that inverts the result. Could you try again? – J.D. Feb 08 '19 at 19:01
  • Just had another thought. I also added a [dilation](https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html#dilation) step that makes the characters more solid. I think this will help a lot. I'm curious if it works. – J.D. Feb 08 '19 at 19:20