
I have the image below of a single driver's license, and I want to extract information from it: name, DOB, etc. My thought process is to find a way to group the text line by line and crop out each rectangle that contains the name, license number, etc. for both English and Arabic. But I have failed woefully.

[image: the driver's license sample]

import cv2
import os
import numpy as np

scan_dir = os.path.dirname(__file__)
image_dir = os.path.join(scan_dir, '../../images')


class Loader(object):
    def __init__(self, filename, gray=True):
        self.filename = filename
        self.gray = gray
        self.image = None

    def _read(self, filename):
        # note: cv2.imread returns BGR data, despite the `rgba` name
        rgba = cv2.imread(os.path.join(image_dir, filename))

        if rgba is None:
            raise FileNotFoundError("Image not found: " + filename)

        if self.gray:
            gray = cv2.cvtColor(rgba, cv2.COLOR_BGR2GRAY)
        else:
            gray = None  # avoid returning an undefined name when gray=False

        return gray, rgba


    def __call__(self):
        return self._read(self.filename)


class ImageScaler(object):

    def __call__(self, gray, rgba, scale_factor=2):
        # INTER_AREA suits shrinking; INTER_CUBIC suits enlarging
        interp = cv2.INTER_AREA if scale_factor < 1 else cv2.INTER_CUBIC
        img_scaled_gray = cv2.resize(gray, None, fx=scale_factor, fy=scale_factor, interpolation=interp)
        img_scaled_rgba = cv2.resize(rgba, None, fx=scale_factor, fy=scale_factor, interpolation=interp)

        return img_scaled_gray, img_scaled_rgba



class BoxLocator(object):
    def __call__(self, gray, rgba):
        # image_blur = cv2.medianBlur(gray, 1)
        ret, image_binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
        image_not = cv2.bitwise_not(image_binary)

        erode_kernel = np.ones((3, 1), np.uint8)
        image_erode = cv2.erode(image_not, erode_kernel, iterations = 5)

        dilate_kernel = np.ones((5,5), np.uint8)
        image_dilate = cv2.dilate(image_erode, dilate_kernel, iterations=5)


        kernel = np.ones((3, 3), np.uint8)
        image_closed = cv2.morphologyEx(image_dilate, cv2.MORPH_CLOSE, kernel)
        image_open = cv2.morphologyEx(image_closed, cv2.MORPH_OPEN, kernel)

        image_not = cv2.bitwise_not(image_open)
        image_not = cv2.adaptiveThreshold(image_not, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 15, -2)

        image_dilate = cv2.dilate(image_not, np.ones((2, 1)), iterations=1)
        image_dilate = cv2.dilate(image_dilate, np.ones((2, 10)), iterations=1)

        # [-2:] keeps compatibility with both OpenCV 3 and 4 return values
        contours, hierarchy = cv2.findContours(image_dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]

        for contour in contours:
            x, y, w, h = cv2.boundingRect(contour)
            # if w > 30 and h > 10:
            cv2.rectangle(rgba, (x, y), (x + w, y + h), (0, 0, 255), 2)

        return image_dilate, rgba



def entry():
    loader = Loader('sample-004.jpg')
    gray, rgba = loader()

    imageScaler = ImageScaler()
    image_scaled_gray, image_scaled_rgba = imageScaler(gray, rgba, 1)

    box_locator = BoxLocator()
    gray, rgba = box_locator(image_scaled_gray, image_scaled_rgba)

    cv2.namedWindow('Image', cv2.WINDOW_NORMAL)
    cv2.namedWindow('Image2', cv2.WINDOW_NORMAL)

    cv2.resizeWindow('Image', 600, 600)
    cv2.resizeWindow('Image2', 600, 600)

    cv2.imshow("Image2", rgba)
    cv2.imshow("Image", gray)

    cv2.moveWindow('Image', 0, 0)
    cv2.moveWindow('Image2', 600, 0)

    cv2.waitKey()
    cv2.destroyAllWindows()

When I run the above code I get the segmentation below, which is not close to what I want:

[image: segmentation produced by the code above]

But below is what I want to achieve for all input licenses:

[image: the desired field-by-field segmentation]

George
  • As the license size is fixed, the eagle and the circle are also fixed. Then you can try to find the two anchors and `calculate` the positions of the ROIs. – Kinght 金 Nov 05 '18 at 09:24
  • @Silencer that sounds really complex, I am only a beginner, and I'm not sure I have those skills yet – George Nov 05 '18 at 09:28
  • But I'm not sure you have the skills to directly detect and crop the ROIs yet. – Kinght 金 Nov 05 '18 at 09:30
  • Since the license layout is fixed, what you can do is detect the license outline only and then, using the size of the license, estimate the positions of the various bounding boxes (see the sketch after these comments). – ZdaR Nov 05 '18 at 09:41
  • @ZdaR Like a template method right? using ratios to always get the positions. – George Nov 05 '18 at 09:44
  • Yeah, it is kind of a very simple technique, whose accuracy can be increased in case of any rotations or affine transformations in input image. Try to implement this and see the results. – ZdaR Nov 05 '18 at 09:53
  • Another approach you can try: since the eagle and the circle on the top right are always the same, you can use them as a template. Then do a convolution to locate these symbols in the image. Knowing the positions of these two symbols, you can then know the positions of all the other elements. –  Nov 07 '18 at 09:36
  • Check [this](https://stackoverflow.com/a/24464968/2571705) template based method. Code is in `c++` but you can use the concept. – dhanushka Nov 07 '18 at 16:38
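
To make ZdaR's fixed-layout suggestion concrete, here is a minimal sketch, assuming the card is the largest bright region in the image; the ROI fractions are made-up placeholders that you would measure once on a straight reference card:

import cv2

image = cv2.imread("ID_card.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Locate the card outline: Otsu threshold, then the largest external contour
# ([-2] keeps compatibility with both OpenCV 3 and 4 return values)
_, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
cx, cy, cw, ch = cv2.boundingRect(max(contours, key=cv2.contourArea))

# Each ROI as (left, top, right, bottom) fractions of the card rectangle.
# These numbers are placeholders, not measured from a real card.
rois = {
    "big_box": (0.30, 0.20, 0.95, 0.85),
    "small_box": (0.05, 0.70, 0.28, 0.95),
}

for name, (l, t, r, b) in rois.items():
    crop = image[cy + int(t * ch):cy + int(b * ch),
                 cx + int(l * cw):cx + int(r * cw)]
    cv2.imwrite(name + ".jpg", crop)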

2 Answers


Off the top of my head, I can think of two approaches:

Approach 1. As mentioned in the comments, you can crop the eagle symbol on the top-left and the flag on the top-right, use these as templates, and locate the two boxes you are interested in, the bottom-left (small) box and the center (big) box, relative to the positions of the found templates. As a start, you can use these:

Template 1:

[image: template 1]

Template 2:

[image: template 2]

Code:

import cv2

image = cv2.imread("ID_card.jpg")

# matchTemplate needs the image and the template to have the same number
# of channels, so match on a grayscale copy of the card
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

template_1 = cv2.imread("template_1.jpg", 0)
w_1, h_1 = template_1.shape[::-1]

template_2 = cv2.imread("template_2.jpg", 0)
w_2, h_2 = template_2.shape[::-1]

res_1 = cv2.matchTemplate(image=gray, templ=template_1, method=cv2.TM_CCOEFF)
min_val_1, max_val_1, min_loc_1, max_loc_1 = cv2.minMaxLoc(res_1)

res_2 = cv2.matchTemplate(image=gray, templ=template_2, method=cv2.TM_CCOEFF)
min_val_2, max_val_2, min_loc_2, max_loc_2 = cv2.minMaxLoc(res_2)

cv2.rectangle(image, max_loc_1, (max_loc_1[0] + w_1, max_loc_1[1] + h_1), 255, 2)
cv2.rectangle(image, max_loc_2, (max_loc_2[0] + w_2, max_loc_2[1] + h_2), 255, 2)

Result:

[image: template matching result]

You can use the centers of the found templates to get the relative position of the required boxes (small and big).
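
As an illustration of that last step, here is a minimal sketch continuing from the snippet above; the offset fractions are made-up placeholders that you would measure once on a straight reference card:

# Centers of the two matched templates act as anchor points
center_1 = (max_loc_1[0] + w_1 // 2, max_loc_1[1] + h_1 // 2)
center_2 = (max_loc_2[0] + w_2 // 2, max_loc_2[1] + h_2 // 2)

# Use the anchor-to-anchor distance as the scale unit, so the same
# offsets survive resizing of the input image
unit = float(center_2[0] - center_1[0])

# (left, top, right, bottom) offsets from center_1 in units of `unit`.
# These numbers are placeholders, not measured from a real card.
BIG_BOX = (0.10, 0.35, 1.05, 1.60)
SMALL_BOX = (-0.05, 1.70, 0.35, 2.10)

for l, t, r, b in (BIG_BOX, SMALL_BOX):
    x0, y0 = center_1[0] + int(l * unit), center_1[1] + int(t * unit)
    x1, y1 = center_1[0] + int(r * unit), center_1[1] + int(b * unit)
    cv2.rectangle(image, (x0, y0), (x1, y1), (0, 255, 0), 2)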

Approach 2. Similar to your contour-based attempt, the basic idea is to use morphology to obtain definitive lines in the bigger box.

Code:

import cv2

image = cv2.imread("ID_card.jpg")
imgray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

ret, thresh = cv2.threshold(imgray, 150, 255, cv2.THRESH_BINARY)
# cv2.imwrite("thresh.jpg", thresh)

# Morphological opening to clean up noise before looking for contours
thresh = cv2.morphologyEx(thresh, cv2.MORPH_OPEN,
                          cv2.getStructuringElement(cv2.MORPH_RECT, (7, 7)))

# [-2:] keeps compatibility with both OpenCV 3 and 4 return values
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE,
                                       cv2.CHAIN_APPROX_SIMPLE)[-2:]

# Sort the contours by area, largest first
cntsSorted = sorted(contours, key=cv2.contourArea, reverse=True)

approxes = []

for cnt in cntsSorted[1:10]:
    peri = cv2.arcLength(cnt, True)
    # approximate the contour shape
    approx = cv2.approxPolyDP(cnt, 0.04 * peri, True)
    approxes.append(approx)
    if len(approx) == 4:
        # 4 vertices, so the contour should be a quadrilateral
        cv2.drawContours(image, approx, -1, (0, 255, 0), 10)

cv2.imwrite("ID_card_contours.jpg", image)
print(approxes)

Results:

Thresholded image:

[image: thresholded card]

After morphological opening:

[image: card after opening]

Final image with the respective corners of the two intended boxes marked in green:

[image: final result]

So, this approach is pretty straightforward, and I am sure you can do the rest in finding the smaller subsets from the large box. If not, shoot me a comment and I'll be happy to help. Basically, you can crop that area from the image and use HoughLinesP, and you should be fine; or, since the smaller subsets are of equal height, you can just crop them based on y coordinates, as sketched below.
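
A minimal sketch of the y-coordinate idea, continuing from the contour code above; the contour index, the field count, and the margin are assumptions, not values verified on the sample image:

# Assume the second-largest contour is the big box (verify on your data)
bx, by, bw, bh = cv2.boundingRect(cntsSorted[1])
big_box = image[by:by + bh, bx:bx + bw]

fields = 8   # assumed number of equal-height rows in the big box
margin = 3   # a few pixels of slack for alignment error
row_h = bh // fields

for i in range(fields):
    y0 = max(i * row_h - margin, 0)
    y1 = min((i + 1) * row_h + margin, bh)
    cv2.imwrite("field_%d.jpg" % i, big_box[y0:y1])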

PS. Hopefully the "bigger" and "smaller" boxes are well understood; apologies for my laziness in not marking them in the images.

Note: Given only one image, I can't say for sure whether this works for all the images in your dataset. You might have to tweak the threshold and morph_open parameters. If you can upload more images, I can try it on them.

Courtesy: OpenCV shape detection for detecting shapes in contours.

Rick M.
  • I tried the template method; it has issues when you provide images of different sizes from the one the original template was extracted from. The second approach works really well though. For blurrier images it tends to pick up more points, but I am able to eliminate them by reducing the number of contours to iterate through. I am still playing around with either using HoughLinesP or just dividing the image by the number of lines; not sure which will work better. But this is a great answer. – George Nov 14 '18 at 06:48
  • I hope you don't mind me reaching out if I have more questions. – George Nov 14 '18 at 06:52
  • @JamesOkpeGeorge I am glad you find it helpful, and I feel it is even better that you're trying out the next steps on your own now. After posting the answer, I tried using HoughP, but I figured that since the subsets you are looking for in the detected large box are of equal width and height, you can basically: 1. crop the big box, 2. divide the total height (y) by 8, 3. assign coordinates to the cropped image, and you will get what you wanted. – Rick M. Nov 14 '18 at 07:58
  • Another thing I should mention is that with this method you should leave a margin of error, let's say 2-3 pixels. Additionally, if you compare the top two corners with the bottom two, you'll see that they aren't parallel to the card. Although the rotation is really tiny, you can rotate the cropped image w.r.t. the card. Maybe this will make the results stand out even better. – Rick M. Nov 14 '18 at 08:00

From what I see, the best approach would be to detect the edges of the licence and crop it. Then, when you have the coordinates of the edges, you can calculate the angle by which you have to rotate the image for it to be flat.

From there, you can crop out fixed areas (at predefined pixel coordinates). In that step, leave a little room for error (say, add 5-10 pixels to each side of the cropping area as insurance).

Then, you can feed the images to Tesseract with the option --psm 9. That will read the text in the box more accurately than the default setting.
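
Putting those steps together, a minimal sketch might look like the following, assuming pytesseract is installed, the card is the largest bright region in the image, and the crop coordinates are placeholders measured once on a flat reference card:

import cv2
import pytesseract

image = cv2.imread("ID_card.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# Largest external contour ~ the card; minAreaRect gives its tilt angle
contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
(rcx, rcy), (rw, rh), angle = cv2.minAreaRect(max(contours, key=cv2.contourArea))

# Normalize the angle (the convention differs across OpenCV versions)
if angle < -45:
    angle += 90
elif angle > 45:
    angle -= 90

# Rotate the whole image so the card is flat
M = cv2.getRotationMatrix2D((rcx, rcy), angle, 1.0)
flat = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))

# Crop one fixed area (placeholder coordinates) with ~5-10 px of slack
m = 8
x, y, w, h = 100, 200, 400, 40
box = flat[y - m:y + h + m, x - m:x + w + m]

print(pytesseract.image_to_string(box, config="--psm 9"))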

I hope this is clear enough and that it helps you :)

Novak