Python: How to OCR characters crossed by a horizontal line

Question

I have a batch of images that I would like to run OCR on. In some of them a horizontal line crosses the characters to be read, like this:

Raw Image

I have made a program that is able to remove the horizontal line:

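import cv2
import numpy as np

img = cv2.imread('image.jpg', 0)

# Applies an Otsu threshold, then inverts the colors so that ink is white
(thresh, im_bw) = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
im_wb = (255-im_bw)

# Line parameters
threshold = 100  # accumulator votes needed to accept a line
minLineLength = 100
maxLineGap = 10
color = 255
size = 2

# Subtracts the black line
# (minLineLength and maxLineGap must be passed by keyword, otherwise they land
# in the threshold and lines arguments; OpenCV 3+ returns an (N, 1, 4) array)
lines = cv2.HoughLinesP(im_wb, 1, np.pi/180, threshold,
                        minLineLength=minLineLength, maxLineGap=maxLineGap)
for x1,y1,x2,y2 in lines[:, 0]:
    cv2.line(img,(int(x1),int(y1)),(int(x2),int(y2)),color,size)

cv2.imshow('clean', img)
cv2.waitKey(0)  # keeps the window open until a key is pressed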

This returns the image below:

Clean Image

So, do you have any idea how to OCR these characters now that a white line crosses them? Would you take a different approach from the one above?

Please ask any questions you have if something is not clear. Thank you.

Have you tried writing an algorithm that removes only the portions of the black line outside the character strokes it crosses? I would recommend focusing on that. Once you know the line thickness (assuming it has a consistent thickness), you could check whether there are black pixels above and below the line, and only remove the line one column at a time if the pixels above and below are white. — Rethunk, Dec 15 '16 at 04:48
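The column-by-column rule suggested in this comment can be sketched on its own, without OpenCV. This is a minimal illustration, not the commenter's code: it assumes a perfectly horizontal line whose first row (`y_top`) and height (`thickness`) are already known, and an inverted binary image where ink is 255 and background is 0. The function name and both parameters are hypothetical.

```python
import numpy as np

def erase_line_outside_strokes(binary, y_top, thickness):
    """Erase a horizontal line only in columns where it does not cross a stroke.

    binary: 2-D uint8 array, 255 = ink (text or line), 0 = background.
    y_top, thickness: assumed known; the line must not touch the top or
    bottom image border, since one row above and one below are inspected.
    """
    out = binary.copy()
    y_bot = y_top + thickness              # first row below the line
    ink_above = binary[y_top - 1, :] > 0   # ink directly above the line?
    ink_below = binary[y_bot, :] > 0       # ink directly below the line?
    free = ~ink_above & ~ink_below         # background on both sides
    out[y_top:y_bot, free] = 0             # clear the line in those columns only
    return out
```

Columns where a character stroke touches the line from above or below are left untouched, so the strokes stay connected for the OCR step.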

score 1 · Accepted Answer · answered Dec 15 '16 at 11:24

Following @Rethunk's advice, I did the following:

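# Line parameters
threshold = 100  # accumulator votes needed to accept a line
minLineLength = 100
maxLineGap = 10
color = 255
size = 1

# Detects the black line (OpenCV 3+ returns an (N, 1, 4) array)
lines = cv2.HoughLinesP(im_wb, 1, np.pi/180, threshold,
                        minLineLength=minLineLength, maxLineGap=maxLineGap)[:, 0]

# Makes lists of the y's of the segments that touch the left and right image edges
y0_list = []
y1_list = []
for x0,y0,x1,y1 in lines:
    if x0 == 0:
        y0_list.append(y0)
    if x1 == im_wb.shape[1] - 1:  # the last pixel column is width - 1, not width
        y1_list.append(y1)

# Without a segment at both edges the averages below would divide by zero
if not y0_list or not y1_list:
    raise ValueError('no detected segment reaches both image edges')

# Calculates line thickness and its half
thick = max(len(y0_list), len(y1_list))
hthick = thick // 2

# Initial and ending point of the full line
x0, x1, y0, y1 = (0, im_wb.shape[1], sum(y0_list)/len(y0_list), sum(y1_list)/len(y1_list))

# Iterates over all x's and draws a vertical line with the desired thickness
# when the pixels just above and below the line are background (0 in im_wb)
for x in range(x1):
    y = int(round(y0 + x*(y1-y0)/x1))  # int() so that y can be used as an index
    if im_wb[y+hthick+1, x] == 0 and im_wb[y-hthick-1, x] == 0:
        cv2.line(img,(x,y-hthick),(x,y+hthick),color,size)

cv2.imshow('clean', img)
cv2.waitKey(0)  # keeps the window open until a key is pressed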

Since the HoughLinesP function returns the start and end points of the detected line segments, I collect the y coordinates of the points at the left and right edges of the image. That gives me the equation of the full line (so it also works if the line is tilted), and I can iterate over all its points. For each point, if the pixels just above and below the line are white, I remove it. The outcome is the following:

If you have a better idea, please tell me!
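The interpolation step described above can be sketched in isolation. This is an illustration under assumptions, not the code from the answer: `line_rows` is a hypothetical helper, and it takes the right endpoint to sit at column `width - 1` (the last pixel column), which differs slightly from dividing by the full width.

```python
def line_rows(y_left, y_right, width):
    """Return the row of the line's centre for every column 0 .. width - 1.

    y_left, y_right: (possibly fractional) rows of the line at the left
    and right image edges. The result is a list of ints, so each value
    can be used directly as an array index.
    """
    if width < 2:
        raise ValueError("need at least two columns to define a line")
    slope = (y_right - y_left) / (width - 1)
    return [int(round(y_left + slope * x)) for x in range(width)]
```

Rounding to `int` matters in Python 3, where `/` always produces a float and a float cannot be used to index an image array.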

It fails with `ZeroDivisionError: division by zero` on `x0, x1, y0, y1 = (0, im_wb.shape[1], sum(y0_list)/len(y0_list), sum(y1_list)/len(y1_list))` — lucians, Dec 26 '17 at 01:45
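That error means one of the two endpoint lists is empty when its average is taken. Pixel columns run from 0 to width - 1, so an endpoint can touch the right edge only at `x == width - 1`, and a segment that stops short of either edge contributes nothing. A minimal sketch of a guarded version follows; `edge_rows` is a hypothetical helper, and whether the detected segments reach the edges at all depends on the image, which the comment does not show.

```python
def edge_rows(segments, width):
    """Mean y of the segment endpoints touching the left and right edges.

    segments: iterable of (x0, y0, x1, y1) tuples, for example the rows
    of a HoughLinesP result. width: number of pixel columns in the image.
    """
    left = [y0 for x0, y0, x1, y1 in segments if x0 == 0]
    right = [y1 for x0, y0, x1, y1 in segments if x1 == width - 1]
    if not left or not right:
        # Fail with an explicit message instead of a ZeroDivisionError.
        raise ValueError("no detected segment reaches both image edges")
    return sum(left) / len(left), sum(right) / len(right)
```

If the line in a given image does not span the full width, the edge test would need a tolerance or a different way of estimating the endpoints; that choice is left open here.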
