I'm fairly new to OpenCV and Tesseract. I'm building a project that uses computer vision to detect door labels, which I hope will be useful for visually impaired people.

The idea of the program is to preprocess the input image by loading it as a single-channel (grayscale) image, run Canny edge detection to find the outline of the door label, and then dilate the Canny result. After that, the image is fed to Tesseract, and the detected text should be shown with boxes around it.

The expected result is green rectangles around the text, with the text itself printed out.

The issue is the missing rectangles and failure in text detection.

I have tried going through these:

  1. Recognize Text in images using Canny Edge detection in Opencv
  2. OpenCv pytesseract for OCR
  3. Image preprocessing with OpenCV before doing character recognition (tesseract)

The questions and solutions there are either too simple or not relevant enough, and some are not in Python.

Below is my attempt at the code:

import pytesseract as pytess
import cv2 as cv
import numpy as np
from PIL import Image
from pytesseract import Output

img = cv.imread(r"C:\Users\User\Desktop\dataset\p\Image_31.jpg", 0)

# edges store the canny version of img
edges = cv.Canny(img, 100, 200)

# ker as in kernel
# (3, 3) is the kernel size while uint8 is the datatype
ker = np.ones((3, 3), np.uint8)

# dil as in dilation
# edges as the src, ker is the kernel we set above, number of dilation
dil = cv.dilate(edges, ker, iterations=1)

# setup pytesseract parameters
configs = r'--oem 3 --psm 6'

# feed image to tesseract
result = pytess.image_to_data(dil, output_type=Output.DICT, config=configs, lang='eng')
print(result.keys())

boxes = len(result['text'])

# make a copy of the dilated image
new_item = dil.copy()

for sequence_number in range(boxes):
    if int(result['conf'][sequence_number]) > 30: # removed constraints
        (x, y, w, h) = (result['left'][sequence_number], result['top'][sequence_number],
                        result['width'][sequence_number], result['height'][sequence_number])
        new_item = cv.rectangle(new_item, (x, y), (x + w, y + h), (0, 255, 0), 2)

# detect sentence with tesseract
# pending as rectangle not achieved

cv.imshow("original", img)
cv.imshow("canny", edges)
cv.imshow("dilation", dil)
cv.imshow("capturedText", new_item)

#ignore below this line, it is only for testing

#testobj = Image.fromarray(dil)
#testtext = pytess.image_to_string(testobj, lang='eng')
#print(testtext)


cv.waitKey(0)
cv.destroyAllWindows()

Resultant image: [image not included]

The testing part of the code returns the result shown below:

a)
Meets

This obviously does not satisfy the objective.

EDIT

After posting the question, I realized I may have approached this the wrong way from the start. I should use OpenCV to detect the contour of the door label and isolate the part containing text, then send only what is inside that rectangle to OCR.

EDIT2

Now that I have identified the issue, thanks to the Stack Overflow members, I'm attempting to add an image rectification/image warping technique to retrieve a straight front view and get better accuracy from the system. Update soon.

EDIT3

After some bug fixing, relaxing the confidence constraint, and letting the function draw on the original image, I have achieved the results below. The updated code is attached as well.

[image: result of the updated code]

import cv2 as cv
import numpy as np
import pytesseract as pytess
from pytesseract import Output

# input of img source
img = cv.imread(r"C:\Users\User\Desktop\dataset\p\Image_31.jpg")

# necessary image color conversion
img2 = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# edges store the canny version of img
edges = cv.Canny(img2, 100, 200)

# ker as in kernel
# (3, 3) is the kernel size while uint8 is the datatype
ker = np.ones((3, 3), np.uint8)

# dil as in dilation
# edges as the src, ker is the kernel we set above, number of dilation
dil = cv.dilate(edges, ker, iterations=1)

# setup pytesseract parameters
configs = r'--oem 3 --psm 6'

# feed image to tesseract
result = pytess.image_to_data(dil, output_type=Output.DICT, config=configs, lang='eng')

# number of entries (boxes) returned by tesseract
boxes = len(result['text'])

# make a copy of the dilated image
new_item = dil.copy()

for sequence_number in range(boxes):
    if int(result['conf'][sequence_number]) > 0: #removed constraints
        (x, y, w, h) = (result['left'][sequence_number], result['top'][sequence_number],
                        result['width'][sequence_number], result['height'][sequence_number])

        # draw rectangle boxes on the original img
        cv.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 3)
        # Crop the image
        crp = new_item[y:y + h, x:x + w]
        # OCR
        txt = pytess.image_to_string(crp, config=configs)
        # returns recognised text
        print(txt)
        cv.imshow("capturedText", crp)
        cv.waitKey(0)

# cv.imshow("original", img)
# cv.imshow("canny", edges)
# cv.imshow("dilation", dil)
cv.imshow("results", img)
cv.waitKey(0)
cv.destroyAllWindows()
seraph

2 Answers

You have found all the detected text in the image:

for sequence_number in range(boxes):
    if int(result['conf'][sequence_number]) > 30:
        (x, y, w, h) = (result['left'][sequence_number], result['top'][sequence_number],
                        result['width'][sequence_number], result['height'][sequence_number])
        new_item = cv.rectangle(new_item, (x, y), (x + w, y + h), (0, 255, 0), 2)

But you also require the confidence to be greater than 30.

  • If we remove the constraint
  • If we OCR each new item

The result will be:

[image: detection result]
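
The confidence constraint discussed above can be sketched as a small filter over the dictionary returned by `image_to_data`. `confident_boxes` is a hypothetical helper name. My understanding is that, depending on the pytesseract version, `conf` values may arrive as ints, floats or numeric strings, and that non-word entries carry a confidence of -1, so the sketch goes through `float()` instead of `int()`:

```python
def confident_boxes(data, min_conf=30.0):
    """Return (text, x, y, w, h) for entries of an image_to_data dict above min_conf."""
    boxes = []
    for i, text in enumerate(data["text"]):
        # float() accepts ints, floats and numeric strings such as "-1" or "91.5"
        conf = float(data["conf"][i])
        # Skip low-confidence entries and entries with no visible text
        if conf <= min_conf or not text.strip():
            continue
        boxes.append((text, data["left"][i], data["top"][i],
                      data["width"][i], data["height"][i]))
    return boxes
```

Setting `min_conf` to a negative value below -1 keeps every entry that has text, which corresponds to removing the constraint.
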

Now if you read:

txt = pytesseract.image_to_string(new_item, config="--psm 6")
print(txt)

The OCR output will be:

Meeting Room §

This is the output with pytesseract version 0.3.7, the current version at the time of writing.

Code:

# Load the libraries
import cv2
import pytesseract

# Load the image
img = cv2.imread("fsUSw.png")

# Convert it to the gray-scale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# OCR detection
d = pytesseract.image_to_data(gry, config="--psm 6", output_type=pytesseract.Output.DICT)

# Get ROI part from the detection
n_boxes = len(d['level'])

# Only entry 1 is processed here; use range(n_boxes) to visit every detected part
for i in range(1, 2):
    # Get the localized region
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])

    # Draw rectangle to the detected region
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 5)

    # Crop the image
    crp = gry[y:y + h, x:x + w]
    
    # OCR
    txt = pytesseract.image_to_string(crp, config="--psm 6")
    print(txt)

    # Display the cropped image
    cv2.imshow("crp", crp)
    cv2.waitKey(0)

# Display
cv2.imshow("img", img)
cv2.waitKey(0)

Ahmet
  • Thank you for the feedback! I've gotten inspiration from you and @gilles, and now I'm attempting to do some image warping to get the correct angle; hopefully it'll help improve accuracy. – seraph Mar 14 '21 at 02:40
  • Glad if I could help – Ahmet Mar 14 '21 at 08:15
  • Hey, sorry to bother you, do you know why the numbers at the top right of the label are omitted? – seraph Mar 20 '21 at 03:10
  • If you mean the 1-118 integers, the reason is that python-tesseract is not rotation-invariant. If the image is slightly rotated, that's OK. However, if the rotation passes a threshold, python-tesseract can't recognize the text. – Ahmet Mar 20 '21 at 06:13
  • Ahh, I understand now, thanks for the clarification – seraph Mar 20 '21 at 06:15

I think what you are looking for here is image rectification (warping the image so that it looks as if it were taken from another point of view), and there seem to be tools for this in Python. However, the problem is more complicated in your case, since you first need to detect how to rectify it. I am not sure how you should go about that.

  • Ah, that reminds me of something like a document scanner as well. I do think I came across something like this in the OpenCV documentation. – seraph Mar 11 '21 at 23:42