
I am trying to do OCR on this toy example of a receipt, using Python 2.7 and OpenCV 3.1.

[image: sample receipt]

My pipeline: grayscale + blur + external edge detection + segmentation of each area of the receipt (for example "Category", to see later which option is marked, in this case cash).

I find it complicated, when the image is "skewed", to properly transform the image and then "automatically" segment each section of the receipt.

Example:

[image: skewed receipt]

Any suggestion?

The code below gets as far as the edge detection, but only when the receipt looks like the first image. My issue is not the image-to-text step; it is the pre-processing of the image.

Any help more than appreciated! :)

import os
os.chdir()  # Put your own directory

import cv2 
import numpy as np

image = cv2.imread("Rent-Receipt.jpg", cv2.IMREAD_GRAYSCALE)

blurred = cv2.GaussianBlur(image, (5, 5), 0)

#blurred  = cv2.bilateralFilter(gray,9,75,75)

# apply Canny Edge Detection
edged = cv2.Canny(blurred, 0, 20)

#Find external contour

(_,contours, _) = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
donpresente
4 Answers


A great tutorial on the first step you described is available at pyimagesearch (and they have great tutorials in general).

In short, as described by Elia, you would have to use cv2.CHAIN_APPROX_SIMPLE. A slightly more robust method would be to use cv2.RETR_LIST instead of cv2.RETR_EXTERNAL and then sort the contours by area; that works decently even on white backgrounds, or when the page inscribes a bigger shape in the background, etc.

Coming to the second part of your question, a good way to segment the characters would be to use the maximally stable extremal regions (MSER) extractor available in OpenCV. A complete implementation in C++ is available here, in a project I was helping out in recently. The Python implementation would go along the lines of the following (the code below works for OpenCV 3.0+; for the OpenCV 2.x syntax, look it up online):

import cv2

img = cv2.imread('test.jpg')
mser = cv2.MSER_create()

#Resize the image so that MSER can work better
img = cv2.resize(img, (img.shape[1]*2, img.shape[0]*2))

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
vis = img.copy()

regions = mser.detectRegions(gray)
hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions[0]]
cv2.polylines(vis, hulls, 1, (0,255,0)) 

cv2.namedWindow('img', 0)
cv2.imshow('img', vis)
while(cv2.waitKey()!=ord('q')):
    continue
cv2.destroyAllWindows()

This gives the output as

[image: MSER convex hulls drawn over the receipt]

Now, to eliminate the false positives, you can simply cycle through the points in hulls, and calculate the perimeter (sum of distance between all adjacent points in hulls[i], where hulls[i] is a list of all points in one convexHull). If the perimeter is too large, classify it as not a character.

The diagonal lines across the image appear because the border of the image is black. They can be removed by adding the following line as soon as the image is read (right after cv2.imread):

img = img[5:-5,5:-5,:]

which gives the output

[image: MSER output with the border cropped]

R. S. Nikhil Krishna
  • Thanks @R. S. Nikhil Krishna!! If I use your code for the image of the receipt that is not skewed (see above in the question), I do not get a good segmentation. Question: which parameters should I tune? The convex hull? Thanks in advance! – donpresente Nov 14 '16 at 10:35
  • @donpresente I've made the changes. The reason it was not detecting the characters was because the image size was too small. MSER requires significant spacing between characters. That can be achieved by simply resizing the images – R. S. Nikhil Krishna Nov 17 '16 at 11:38
  • Nikhil Krishna, I think we have a winner! :) Any other advice on the segmentation? Because a "hand made" model could require dividing each character individually, right? Should I force a grid on the text? – donpresente Nov 17 '16 at 12:55
  • By a hand made model, do you mean manually tuned parameters? And a grid might be slightly problematic because the characters are not equal in size. – R. S. Nikhil Krishna Nov 18 '16 at 07:38

The option on the top of my head requires extracting the 4 corners of the skewed image. This is done by using cv2.CHAIN_APPROX_SIMPLE instead of cv2.CHAIN_APPROX_NONE when finding contours. Afterwards, you could use cv2.approxPolyDP and hopefully remain with the 4 corners of the receipt (if all your images are like this one, there is no reason why it shouldn't work).

Now use cv2.findHomography and cv2.warpPerspective to rectify the image according to source points, which are the 4 points extracted from the skewed image, and destination points that should form a rectangle, for example the full image dimensions.

Here you could find code samples and more information: OpenCV-Geometric Transformations of Images

Also this answer may be useful - SO - Detect and fix text skew

EDIT: Corrected the second chain approx to cv2.CHAIN_APPROX_NONE.

Elia
  • Thanks! And then how can you segment the text in the rectified image? (this is part of the question) – donpresente Nov 12 '16 at 13:00
  • @donpresente You wrote "My issue is not the Image to text. Is the pre-processing of the image." Anyway, I am not able to contribute much on the OCR part. – Elia Nov 12 '16 at 13:34
  • Pre-processing for me would include image segmentation. I think the system will send the 50 points to you if no additional answer. Question: if you don't have a contour, how does your solution work? – donpresente Nov 12 '16 at 17:08
  • Pre-processing does not generally include image segmentation. In this specific case, the image segmentation is the main processing step. Great answer @Elia! – pzp Nov 17 '16 at 20:27

Preprocessing the image by converting the desired text in the foreground to black, while turning unwanted background to white, can help to improve OCR accuracy. In addition, removing the horizontal and vertical lines can improve results. Here's the preprocessed image after removing unwanted noise such as the horizontal/vertical lines. Note the removed border and table lines.

[image: preprocessed receipt with border and table lines removed]

import cv2

# Load in image, convert to grayscale, and threshold
image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Find and remove horizontal lines
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (35,2))
detect_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
cnts = cv2.findContours(detect_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(thresh, [c], -1, (0,0,0), 3)

# Find and remove vertical lines
vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,35))
detect_vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=2)
cnts = cv2.findContours(detect_vertical, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(thresh, [c], -1, (0,0,0), 3)

# Mask out unwanted areas for result
result = cv2.bitwise_and(image,image,mask=thresh)
result[thresh==0] = (255,255,255)

cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()
nathancy

Try using the Stroke Width Transform. A Python 3 implementation of the algorithm is available here at SWTloc.

EDIT: v2.0.0 onwards

Install the Library

pip install swtloc

Transform The Image

import numpy as np
import swtloc as swt

imgpath = 'images/path_to_image.jpeg'
swtl = swt.SWTLocalizer(image_paths=imgpath)
swtImgObj = swtl.swtimages[0]
# Perform SWT Transformation with numba engine
swt_mat = swtImgObj.transformImage(text_mode='lb_df', gaussian_blurr=False, 
                                   minimum_stroke_width=3, maximum_stroke_width=12,
                                   maximum_angle_deviation=np.pi/2)

[image: SWT-transformed receipt]


Localize Letters

localized_letters = swtImgObj.localizeLetters(minimum_pixels_per_cc=10,
                                              localize_by='min_bbox')

[image: localized letters]


Localize Words

localized_words =  swtImgObj.localizeWords(localize_by='bbox')

[image: localized words]


There are multiple parameters in the .transformImage, .localizeLetters and .localizeWords functions that you can play around with to get the desired results.

Full Disclosure : I am the author of this library

Dravidian