
I pieced together a quick algorithm in Python to get the input boxes from a handwritten invoice.

import cv2
import numpy as np

# some preprocessing
img = np.copy(orig_img)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # OpenCV loads images as BGR
img = cv2.GaussianBlur(img, (5, 5), 0)
_, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# get contours and keep only the 4-sided polygons
contours, hierarchy = cv2.findContours(img, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
for i, cnt in enumerate(contours):
    approx = cv2.approxPolyDP(cnt, 0.01 * cv2.arcLength(cnt, True), True)
    if len(approx) == 4:
        cv2.drawContours(orig_img, contours, i, (0, 255, 0), 2)

[image: detection result, with found boxes drawn in green]

It fails to get the second box in this example because the handwriting crosses the box boundary.

Note that this picture could be taken with a mobile phone, so aspect ratios may be a little funny.

So, what are some neat recipes to get around my problem?

As a bonus: these boxes are from an A4 page with a lot of other stuff going on. Would you recommend a whole different approach to getting the handwritten numbers out?

EDIT

This might be interesting. If I don't filter for 4-sided polygons, I get the contours, but they go all the way around the hand-drawn digits. Maybe there's a way to make contours have water-like cohesion so that they pinch off when they get close to themselves?

[image: contours tracing around the hand-drawn digits]

FURTHER EDIT

Here is the original image without the bounding boxes drawn on it:

[image: original image]

– Alexander Soare
  • Please add the original image as well without the green bounding box drawn over it. – Vardan Agarwal Dec 31 '19 at 20:39
  • You could try to crop away some convexity defects before approxPolyDP, but I'm not sure about good heuristics for your case: https://stackoverflow.com/questions/35226993/how-to-crop-away-convexity-defects – Micka Dec 31 '19 at 20:47
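For reference, a minimal sketch of the convexity-defect idea from the second comment; the min_depth threshold here is an arbitrary guess, not a value from the linked answer:

import cv2

# deep convexity defects mark where a handwriting stroke crosses the
# box border; shallow-defect contours can go straight to approxPolyDP
def deep_defects(cnt, min_depth=10):
    hull = cv2.convexHull(cnt, returnPoints=False)
    defects = cv2.convexityDefects(cnt, hull)
    if defects is None:
        return []
    # each defect row is (start_idx, end_idx, farthest_idx, depth * 256)
    return [d for d in defects[:, 0] if d[3] / 256.0 > min_depth]

Contours that come back with deep defects are the ones where the crossing strokes would need to be cropped away before polygon approximation.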

3 Answers


Here's a potential solution:

  1. Obtain binary image. We load the image, convert to grayscale, apply a Gaussian blur, and then Otsu's threshold

  2. Detect horizontal lines. We create a horizontal kernel and draw detected horizontal lines onto a mask

  3. Detect vertical lines. We create a vertical kernel and draw detected vertical lines onto a mask

  4. Perform morphological opening. We create a rectangular kernel and perform morph opening to smooth out noise and separate any connected contours

  5. Find contours, draw rectangle, and extract ROI. We find contours and draw the bounding rectangle onto the image


Here's a visualization of each step:

[image: binary image]

[image: detected horizontal and vertical lines drawn onto a mask]

[image: morphological opening]

[image: result]

[image: individual extracted/saved ROIs]

Note: To extract only the handwritten numbers/letters out of each ROI, take a look at a previous answer: Remove borders from image but keep text written on borders (preprocessing before OCR)
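For a rough idea, here's a minimal sketch in that spirit (kernel sizes and file names are guesses, not the linked answer's exact code): isolate the border lines with long, thin kernels, then subtract them from the thresholded ROI.

import cv2

# load one of the saved ROIs (hypothetical file name)
roi = cv2.imread('ROI_0.png')
gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# opening with long thin kernels keeps only the box border lines
horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, cv2.getStructuringElement(cv2.MORPH_RECT, (25, 1)))
vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, cv2.getStructuringElement(cv2.MORPH_RECT, (1, 25)))
border = cv2.bitwise_or(horizontal, vertical)

# remove the border, keeping only the handwriting
digits = cv2.bitwise_and(thresh, cv2.bitwise_not(border))
cv2.imwrite('digits_0.png', digits)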

Code

import cv2
import numpy as np

# Load image, grayscale, blur, Otsu's threshold
image = cv2.imread('1.png')
original = image.copy()
mask = np.zeros(image.shape, dtype=np.uint8)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5,5), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Find horizontal lines
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (50,1))
detect_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
cnts = cv2.findContours(detect_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(mask, [c], -1, (255,255,255), 3)

# Find vertical lines
vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,50))
detect_vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=2)
cnts = cv2.findContours(detect_vertical, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(mask, [c], -1, (255,255,255), 3)

# Morph open
mask = cv2.cvtColor(mask, cv2.COLOR_BGR2GRAY)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (7,7))
opening = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel, iterations=1)

# Draw rectangle and save each ROI
number = 0
cnts = cv2.findContours(opening, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 2)
    ROI = original[y:y+h, x:x+w]
    cv2.imwrite('ROI_{}.png'.format(number), ROI)
    number += 1

cv2.imshow('thresh', thresh)
cv2.imshow('mask', mask)
cv2.imshow('opening', opening)
cv2.imshow('image', image)
cv2.waitKey()
– nathancy
  • Thank you. Not the first time you've helped me out @nathancy. This was very useful for exploration, although in the end I chose a different path. See it here https://stackoverflow.com/a/59555105/4391249 – Alexander Soare Jan 01 '20 at 18:38

Since the squares have quite straight lines, the Hough transform is a good fit:

1- Convert the image to grayscale, apply an Otsu threshold, then invert the binary image

2- Apply the Hough transform (HoughLinesP) and draw the detected lines on a new image

3- With findContours and drawContours, clean up the 3 ROIs

4- Erode the final image a little to make the boxes neater

[image: intermediate results of each step]

I wrote the code in C++; it's easily convertible to Python:

#include <opencv2/opencv.hpp>

using namespace cv;
using namespace std;

Mat img = imread("D:/1.jpg", 0);
threshold(img, img, 0, 255, THRESH_OTSU);
imshow("Binary image", img);

img = 255 - img;
imshow("Reversed binary image", img);

Mat img_1 = Mat::zeros(img.size(), CV_8U);
Mat img_2 = Mat::zeros(img.size(), CV_8U);

// step 2: Hough transform, draw the detected line segments
vector<Vec4i> lines;
HoughLinesP(img, lines, 1, 0.1, 95, 10, 1);
for (size_t i = 0; i < lines.size(); i++)
    line(img_1, Point(lines[i][0], lines[i][1]), Point(lines[i][2], lines[i][3]),
        Scalar(255, 255, 255), 2, 8);

imshow("Hough Lines", img_1);

// step 3: fill the outer contours to get clean ROIs
vector<vector<Point>> contours;
findContours(img_1, contours, RETR_EXTERNAL, CHAIN_APPROX_NONE);
for (size_t i = 0; i < contours.size(); i++)
    drawContours(img_2, contours, (int)i, Scalar(255, 255, 255), -1);

imshow("Final result after drawContours", img_2);

// step 4: a slight erosion (default 3x3 kernel) to make the boxes neater
erode(img_2, img_2, Mat());
imshow("Final result after erosion", img_2);

waitKey(0);
– MeiH
  • Thank you. This was very useful for exploration, although in the end I chose a different path. See it here https://stackoverflow.com/a/59555105/4391249 – Alexander Soare Jan 01 '20 at 18:37

Thank you to those who shared solutions. I ended up taking a slightly different path.

  1. Grayscale, Gaussian blur, Otsu threshold
  2. Get contours
  3. Filter contours by aspect ratio and extent
  4. Take the minimum upright bounding box of each remaining contour.
  5. Remove any bounding boxes that encapsulate smaller bounding boxes (because each drawn square yields two boxes: one for the inside contour and one for the outside).

Here's the code if anyone's interested. Step 5 was just basic numpy manipulation; a sketch of it follows the snippet.

import cv2
import numpy as np

orig_img = cv2.imread('example0.jpg')

img = np.copy(orig_img)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # imread loads BGR
img = cv2.GaussianBlur(img, (5, 5), 0)
_, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

contours, hierarchy = cv2.findContours(img, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

boxes = list()

for i, cnt in enumerate(contours):
    x, y, w, h = cv2.boundingRect(cnt)
    aspect_ratio = float(w) / h
    area = cv2.contourArea(cnt)
    rect_area = w * h
    extent = float(area) / rect_area
    # keep roughly square contours that fill most of their bounding box
    if abs(aspect_ratio - 1) < 0.1 and extent > 0.7:
        boxes.append((x, y, w, h))
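For completeness, a sketch of one way to do step 5 (not necessarily the exact numpy manipulation used): drop any box that fully contains another box, so only the inner contour of each drawn square survives.

def contains(outer, inner):
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ox + ow >= ix + iw and oy + oh >= iy + ih

# keep a box only if it doesn't encapsulate any other detected box
boxes = [b for b in boxes if not any(b != other and contains(b, other) for other in boxes)]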

And here's an example of what came out when cutting out the bounding boxes from the original image.

[image: crops extracted using the resulting bounding boxes]

– Alexander Soare