
I am working on an OCR task to extract information from multiple ID proof documents. One challenge is the orientation of the scanned image: I need to correct the orientation of a scanned PAN card, Aadhaar card, driving license, or any other ID proof.

I have already tried the approaches suggested on Stack Overflow and other forums, such as OpenCV minAreaRect, Hough line transforms, FFT, homography, and Tesseract OSD with psm 0. None of them work.

The logic should return the angle of the text direction: 0, 90, or 270 degrees. Attached are images at 0, 90, and 270 degrees. This is not about determining skew.

nathancy
Ravi
  • The straightforward approach would be to apply optical character recognition to the four rotated images and keep the one that contains the word "India", or the one that scores best on some test of the recognized string. The libraries opencv, numpy, Image and pytesseract could be used for this. Could you post minimal code showing what you have tried? – francis Sep 19 '19 at 18:45
  • @francis, thanks for the comment and suggestions. Because of the character limit on comments, and for brevity, I am posting the code snippets as individual comments below. For some reason the code is shown as plain text – Ravi Sep 20 '19 at 03:44
  • This is with pytesseract. The intention was to abstract away the orientation and let Tesseract handle it implicitly, but it didn't work well: config = ('stdout --psm 0 --oem 0 -l osd -c min_characters_to_try=5') imgPath = sys.argv[1] img = cv2.imread(imgPath) text = pytesseract.image_to_osd(img, config=config) print(text) – Ravi Sep 20 '19 at 03:46
  • This is with HOG: im = cv2.imread(imgPath) im = np.float32(im) / 255.0 gx = cv2.Sobel(im, cv2.CV_32F, 1, 0, ksize=1) gy = cv2.Sobel(im, cv2.CV_32F, 0, 1, ksize=1) mag, angle = cv2.cartToPolar(gx, gy, angleInDegrees=True) print(angle[0]) – Ravi Sep 20 '19 at 03:49
  • With Hough Line Transforms: img_edges = cv2.Canny(img_before, 100, 200, apertureSize=3) lines = cv2.HoughLinesP(img_edges, 1, math.pi / 180.0, 100, minLineLength=100, maxLineGap=5) angles = [] for x1, y1, x2, y2 in lines[0]: cv2.line(img_before, (x1, y1), (x2, y2), (255, 0, 0), 3) angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) angles.append(angle) median_angle = np.median(angles) #print(median_angle) print("Angle is {}".format(median_angle)) – Ravi Sep 20 '19 at 03:50
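
The suggestion in the first comment (try all four right-angle rotations and keep the best-scoring one) could be sketched roughly as follows. This is only an illustration: `best_rotation`, `score_fn` and `toy_score` are hypothetical names, and the scorer here is a stand-in that looks at pixel counts. In practice the scorer would be derived from the OCR output, for example whether an expected word such as "India" was recognized.

```python
import numpy as np

def best_rotation(image, score_fn):
    """Try the four right-angle rotations and return (k, rotated) for the best score.

    k counts counter-clockwise quarter turns, as in np.rot90.
    score_fn is a caller-supplied (hypothetical) scorer: higher means "more upright".
    """
    candidates = [(k, np.rot90(image, k)) for k in range(4)]
    scores = [score_fn(rotated) for _, rotated in candidates]
    return candidates[int(np.argmax(scores))]


def toy_score(img):
    # Stand-in scorer for this demo only: rewards ink in the left half.
    # A real scorer would inspect the OCR result instead.
    w = img.shape[1]
    left = int(np.count_nonzero(img[:, : w // 2]))
    right = int(np.count_nonzero(img[:, w // 2 :]))
    return left - right


upright = np.zeros((4, 6), dtype=np.uint8)
upright[:, :3] = 255                # synthetic "card" with all ink on the left
scanned = np.rot90(upright, 1)      # pretend the scan arrived rotated
k, fixed = best_rotation(scanned, toy_score)
print(k, np.array_equal(fixed, upright))
```

Passing the scorer in as a function keeps the rotation loop independent of any particular OCR library.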

1 Answer


Here's an approach based on the assumption that the majority of the text sits on one side of the document. The idea is that we can determine the angle from where the major text region is located.


After converting to grayscale and applying a Gaussian blur, we apply an adaptive threshold to obtain a binary image.


From here we find contours and filter them by contour area to remove the small noise particles and the large border. Any contour that passes this filter is drawn onto a mask.


To determine the angle, we split the image in half based on its dimensions. If width > height, it must be a horizontal image, so we split it into left and right halves. If height > width, it must be a vertical image, so we split it into top and bottom halves.


Now that we have two halves, we can use cv2.countNonZero() to count the white pixels in each half. Here's the logic to determine the angle:

if horizontal
    if left >= right
        degree -> 0
    else
        degree -> 180
if vertical
    if bottom >= top
        degree -> 90
    else
        degree -> 270

left 9703

right 3975

Therefore the image is at 0 degrees. Here are the results for the other orientations:


left 3975

right 9703

We can conclude that the image is rotated by 180 degrees.

Here are the results for a vertical image. Note that since it's a vertical image, we split it into top and bottom halves.


top 3947

bottom 9550

Therefore the result is 90 degrees
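
The decision rule can be checked against the pixel counts quoted above with a small standalone function. This is a sketch rather than part of the original answer: `angle_from_counts` and `is_horizontal` are hypothetical names, and ties are resolved the same way as in the code further down (towards 0 for horizontal images and 90 for vertical ones).

```python
def angle_from_counts(is_horizontal, first, second):
    """Apply the half-image rule to two white-pixel counts.

    first/second are left/right for a horizontal image (width > height)
    and top/bottom for a vertical one.
    """
    if is_horizontal:
        return 0 if first >= second else 180
    return 90 if second >= first else 270


# Pixel counts quoted in the walkthrough above
print(angle_from_counts(True, 9703, 3975))    # left-heavy horizontal image
print(angle_from_counts(True, 3975, 9703))    # right-heavy horizontal image
print(angle_from_counts(False, 3947, 9550))   # bottom-heavy vertical image
```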

import cv2
import numpy as np

def detect_angle(image):
    # Binarize: grayscale -> light blur -> inverted adaptive threshold,
    # so that text and other dark strokes become white (255)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (3, 3), 0)
    adaptive = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 15, 4)

    # findContours returns 2 values in OpenCV 4.x and 3 values in 3.x
    cnts = cv2.findContours(adaptive, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]

    # Draw only mid-sized contours onto a single-channel mask; this drops
    # small noise and the large border. The area limits are in pixels,
    # so they depend on the image resolution.
    mask = np.zeros(gray.shape, dtype=np.uint8)
    for c in cnts:
        if 20 < cv2.contourArea(c) < 45000:
            cv2.drawContours(mask, [c], -1, 255, -1)

    h, w = mask.shape

    # Horizontal image: compare the left and right halves
    if w > h:
        left = mask[:, :w // 2]
        right = mask[:, w // 2:]
        left_pixels = cv2.countNonZero(left)
        right_pixels = cv2.countNonZero(right)
        return 0 if left_pixels >= right_pixels else 180
    # Vertical image: compare the top and bottom halves
    else:
        top = mask[:h // 2, :]
        bottom = mask[h // 2:, :]
        top_pixels = cv2.countNonZero(top)
        bottom_pixels = cv2.countNonZero(bottom)
        return 90 if bottom_pixels >= top_pixels else 270

if __name__ == '__main__':
    image = cv2.imread('1.png')
    # cv2.imread returns None instead of raising when the file cannot be read
    if image is None:
        raise FileNotFoundError("Could not read '1.png'")
    angle = detect_angle(image)
    print(angle)
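
detect_angle only reports the angle. Below is a minimal sketch of undoing the rotation, under one assumption that the answer does not state explicitly: the returned angle is read as a counter-clockwise rotation of the scan. That reading follows from the left-heavy premise, since turning an upright card a quarter turn counter-clockwise moves its left half to the bottom, which detect_angle labels 90. `correct_orientation` is a hypothetical helper, not part of the original code.

```python
import numpy as np

def correct_orientation(image, angle):
    """Rotate `image` back to upright.

    ASSUMPTION: `angle` is the counter-clockwise rotation of the scan
    (0, 90, 180 or 270), so the fix is the same angle clockwise.
    """
    if angle not in (0, 90, 180, 270):
        raise ValueError(f"unsupported angle: {angle}")
    # np.rot90 turns counter-clockwise for positive k, so negate to go clockwise
    return np.rot90(image, k=-(angle // 90))


# Self-check on a synthetic 3-channel "card" with all ink on the left
upright = np.zeros((4, 6, 3), dtype=np.uint8)
upright[:, :3] = 255
for angle in (0, 90, 180, 270):
    scanned = np.rot90(upright, k=angle // 90)   # simulate a rotated scan
    assert np.array_equal(correct_orientation(scanned, angle), upright)
```

np.rot90 returns a view, so wrapping the result in np.ascontiguousarray may be needed before handing it to some OpenCV functions.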
nathancy
  • Thanks for the suggestion and code. It is a very interesting idea. I will try it out and update. I am working with multiple ID documents and the presence of text areas will vary, so I might have to detect the type of document and tweak the logic a bit. – Ravi Sep 20 '19 at 03:55
  • I've tried it out with a few sample images and the approach seems to be working. Thanks again for the idea. – Ravi Sep 20 '19 at 07:17