
I want to OCR some scanned forms (filled in by hand). This is the first time I've done anything serious with computer vision. So far I'm able to locate the squares containing the digits of a date field:

[image: detected squares around the digits of the date field]

Looking at the example handwritten digit dataset that comes with OpenCV, I see the digits are centered and resized to (20, 20):

[image: sample from the OpenCV handwritten digit dataset]

Since this may be a fairly common problem, I'm wondering if the algorithm is already implemented in OpenCV (or numpy, scipy, etc) so I don't have to reinvent the wheel.

The question is: is there a built-in pipeline in Python for normalizing the samples?

DarkCygnus
Paulo Scardine
  • What do you mean by "normalize"? You mean resize and center? You've already done the hard part---got the contours! Just find the `cv2.boundingRect()` around the contours, maybe increase the size of the box by 1 or 2 px in every direction if you don't want any white to touch the border, and then rescale that ROI to the size you want. – alkasm Jul 04 '17 at 21:17
  • What I would do is to find a bounding box for each digit, crop that part, and then resize to desired proportions – DarkCygnus Jul 04 '17 at 21:20
  • Yes, finding the bounding box and scaling is easy enough; there is also the aspect ratio to consider. Many years ago I wrote an algorithm to generate an affine transform matrix that could take care of the aspect ratio and size simultaneously but I forgot how I did it (used it for cropping user-uploaded images to a standard size in PIL). Unfortunately that code is lost. – Paulo Scardine Jul 04 '17 at 21:30

2 Answers


I'm not sure there is a built-in pipeline, but since you already have the contours you could implement your own as follows (based on my comment):

Obtain the bounding rectangle of the contour (which centers the crop on the digit) and crop that region:

x, y, w, h = cv2.boundingRect(cnt)
# NumPy indexes images as [row, column], so the y range comes first
imgCrop = img[y:(y+h), x:(x+w)]

Resize the crop to the desired size (say 20 x 20):

imgResized = cv2.resize(imgCrop, (20,20))   

You can also scale each axis by a factor instead of giving a target size:

imgResized = cv2.resize(imgCrop, (0,0), fx=0.5, fy=0.5)  

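or with scipy (as suggested in this question; note that `scipy.misc.imresize` was deprecated in SciPy 1.0 and removed in 1.3, so this only works on older versions):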

imgResized = scipy.misc.imresize(imgCrop, 0.5)  

Bonus: check this great tutorial on basic image manipulation with Python and OpenCV, which shows another way to resize that takes the aspect ratio into account and picks an interpolation method for better results. Taken from it:

imgResized = cv2.resize(imgCrop, (20,20), interpolation = cv2.INTER_AREA)
DarkCygnus
  • My mind was stuck with aesthetics but probably I can ignore that. I guess I would have no problem if I resize all inputs to 20 x 20 and ignore aspect ratio - as long as I do this both with the training set and the test set it should be OK. What do you think? – Paulo Scardine Jul 04 '17 at 22:09
  • I think so, yes: you should use the same input image size for training, cross-validation, and testing, as most tools like neural networks (you are doing MNIST, I presume) require inputs of the same size. However, you should not completely ignore the aspect ratio. Heavy deformation could hurt your training performance in general, as some digits (like the number 1) are narrower than others. – DarkCygnus Jul 04 '17 at 22:17
  • To prevent this, you could check that the aspect ratio (w/h) of the resulting bounding box is close to `1.0`. If the box is significantly wider or taller, you could add pixels to the shorter side when cropping: instead of `img[y:(y+h), x:(x+w)]` you could, say, add 4 more pixels of width with `img[y:(y+h), (x-2):(x+w+2)]`. Do the same along the other axis if you want more height; just be careful not to go out of bounds. – DarkCygnus Jul 04 '17 at 22:25
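
A minimal sketch of the bounds-safe cropping mentioned in the last comment, using only NumPy. `crop_with_margin` and its `margin` parameter are hypothetical names, not part of OpenCV; `rect` is assumed to be the `(x, y, w, h)` tuple returned by `cv2.boundingRect`.

```python
import numpy as np

def crop_with_margin(img, rect, margin=2):
    """Crop rect=(x, y, w, h) plus a margin, clamped to the image bounds."""
    x, y, w, h = rect
    rows, cols = img.shape[:2]
    # Clamp every edge so a box near the border never indexes outside the image
    # (a negative start would silently wrap around in NumPy).
    x0 = max(x - margin, 0)
    y0 = max(y - margin, 0)
    x1 = min(x + w + margin, cols)
    y1 = min(y + h + margin, rows)
    # Rows (y) first, then columns (x).
    return img[y0:y1, x0:x1]

img = np.zeros((50, 40), np.uint8)
inner = crop_with_margin(img, (10, 20, 5, 8))  # box well inside the image
edge = crop_with_margin(img, (0, 0, 5, 8))     # box touching the top-left corner
```

The clamping matters mostly on the low side: `img[-2:10]` does not raise an error, it returns an empty or wrapped slice, which is easy to miss.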

I ended up using this function:

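import cv2

def norm_digit(im):
    # im: single-channel image cropped to the digit's bounding box
    h, w = im.shape
    if h > w:
        # tall digit: 10% margin top/bottom, pad the sides to make it square
        top, left = round(h * 0.1), round((1.2 * h - w) / 2)
    else:
        # wide digit: 10% margin left/right, pad top/bottom to make it square
        top, left = round((1.2 * w - h) / 2), round(w * 0.1)

    return cv2.resize(
        cv2.copyMakeBorder(im, top, top, left, left, cv2.BORDER_CONSTANT),
        (20, 20)
    )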

The input is an image already cropped to the bounding box of the digit contour. There are some corner cases it doesn't cover, but it looks like this may be good enough.
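
To check the padding geometry in isolation, the same idea can be sketched with NumPy alone. This is an illustrative variant, not the function above: `pad_to_square` and its `margin` parameter are hypothetical names, and the input is assumed to be a single-channel crop on a black (0) background. The result would still need a `cv2.resize(..., (20, 20))` afterwards.

```python
import numpy as np

def pad_to_square(im, margin=0.1):
    """Center im on a black square whose side is the longer edge plus margins."""
    h, w = im.shape
    # The longer edge gets `margin` on each side; the shorter edge is padded
    # up to the same length so the aspect ratio of the digit is preserved.
    side = int(round(max(h, w) * (1 + 2 * margin)))
    pad_v, pad_h = side - h, side - w
    top, left = pad_v // 2, pad_h // 2
    return np.pad(
        im,
        ((top, pad_v - top), (left, pad_h - left)),
        mode="constant",
        constant_values=0,
    )

tall = pad_to_square(np.ones((30, 10), np.uint8))  # e.g. a narrow "1"
wide = pad_to_square(np.ones((10, 30), np.uint8))  # e.g. a flat stroke
```

Both results are square, so resizing them to 20 x 20 scales the digit uniformly instead of stretching it.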

Paulo Scardine