I have a batch of images which I would like to scan. Some of them have got a horizontal line crossing the characters that have to be scanned, which would look like this:
I have made a program that is able to remove the horizontal line:
import cv2
import numpy as np
img = cv2.imread('image.jpg',0)
# Applies threshold and inverts the image colors
(thresh, im_bw) = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
im_wb = (255-im_bw)
# Line parameters
minLineLength = 100
maxLineGap = 10
color = 255
size = 2
# Substracts the black line
lines = cv2.HoughLinesP(im_wb,1,np.pi/180,minLineLength,maxLineGap)[0]
for x1,y1,x2,y2 in lines:
cv2.line(img,(x1,y1),(x2,y2),color,size)
cv2.imshow('clean', img)
This returns the image below:
So, do you have any idea of how to make OCR to these characters that have the white line crossing them? Would you make a different approach than the one stated?
Please ask any questions you have if something is not clear. Thank you.