I have an image of text written on spiral notebook paper. The paper has horizontal lines, and I would like to remove them from the image.

While googling, I found a solution that I thought would work: "Extract horizontal and vertical lines by using morphological operations". The solution was in C++, so I converted it to Python. It works well on the sample image provided in that solution; however, it does not seem to work for my images.

While running it on my image I get these results:

Original Image

Resulting Image

Below is the Python code that I translated from C++

# C++ code converted from http://docs.opencv.org/3.2.0/d1/dee/tutorial_moprh_lines_detection.html

import cv2
import numpy as np

img = cv2.imread("original.jpg")
img=cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)

img = cv2.bitwise_not(img)
th2 = cv2.adaptiveThreshold(img,255, cv2.ADAPTIVE_THRESH_MEAN_C,cv2.THRESH_BINARY,15,-2)
cv2.imshow("th2", th2)
cv2.imwrite("th2.jpg", th2)
cv2.waitKey(0)
cv2.destroyAllWindows()

horizontal = th2
vertical = th2
rows,cols = horizontal.shape
horizontalsize = cols / 30
horizontalStructure = cv2.getStructuringElement(cv2.MORPH_RECT, (horizontalsize,1))
horizontal = cv2.erode(horizontal, horizontalStructure, (-1, -1))
horizontal = cv2.dilate(horizontal, horizontalStructure, (-1, -1))
cv2.imshow("horizontal", horizontal)
cv2.imwrite("horizontal.jpg", horizontal)
cv2.waitKey(0)
cv2.destroyAllWindows()

verticalsize = rows / 30
verticalStructure = cv2.getStructuringElement(cv2.MORPH_RECT, (1, verticalsize))
vertical = cv2.erode(vertical, verticalStructure, (-1, -1))
vertical = cv2.dilate(vertical, verticalStructure, (-1, -1))
cv2.imshow("vertical", vertical)
cv2.imwrite("vertical.jpg", vertical)
cv2.waitKey(0)
cv2.destroyAllWindows()

vertical = cv2.bitwise_not(vertical)
cv2.imshow("vertical_bitwise_not", vertical)
cv2.imwrite("vertical_bitwise_not.jpg", vertical)
cv2.waitKey(0)
cv2.destroyAllWindows()

#step1
edges = cv2.adaptiveThreshold(vertical,255, cv2.ADAPTIVE_THRESH_MEAN_C,cv2.THRESH_BINARY,3,-2)
cv2.imshow("edges", edges)
cv2.imwrite("edges.jpg", edges)
cv2.waitKey(0)
cv2.destroyAllWindows()

#step2
kernel = np.ones((2, 2), dtype = "uint8")
dilated = cv2.dilate(edges, kernel)
cv2.imshow("dilated", dilated)
cv2.imwrite("dilated.jpg", dilated)
cv2.waitKey(0)
cv2.destroyAllWindows()

# step3
smooth = vertical.copy()

#step 4
smooth = cv2.blur(smooth, (4,4))
cv2.imshow("smooth", smooth)
cv2.imwrite("smooth.jpg", smooth)
cv2.waitKey(0)
cv2.destroyAllWindows()

#step 5
(rows, cols) = np.where(img == 0)
vertical[rows, cols] = smooth[rows, cols]

cv2.imshow("vertical_final", vertical)
cv2.imwrite("vertical_final.jpg", vertical)
cv2.waitKey(0)
cv2.destroyAllWindows()

I've also tried ImageMagick on my original image in an effort to remove the lines.

I get better results with ImageMagick, but they are still not completely accurate.

convert original -morphology close:3 "1x5: 0,1,1,1,0" original_im.jpg
Anthony
  • 1
    Your lines don't seem to be straight. I'd go for continuity detection from one side to the other side. – smttsp Feb 25 '17 at 21:45
  • Any pointers on how to achieve that? – Anthony Feb 25 '17 at 21:56
  • I haven't implemented anything like this before, but I can write some basic pseudocode for you. I bet there are many more efficient ways to do it. I assume your input data is always similar, right? – smttsp Feb 25 '17 at 22:02
  • Yeah mostly it is written on paper with lines. Sometimes the lines are horizontal and other times vertical. – Anthony Feb 25 '17 at 22:03
  • 1
    I'd also consider looking at the probabilistic Hough transform and removing lines of a certain length and certain orientations. I'll try to write an answer later tonight. – rayryeng Feb 25 '17 at 23:28
  • @rayryeng I tried hough using this answer http://stackoverflow.com/questions/33838156/python-opencv-houghlinesp-fails-to-detect-lines/33839195#33839195 on my original image (https://www.dropbox.com/s/hhxr1pybt76l9sg/vertical_lines.jpg?dl=0) to remove vertical lines and got this result (https://www.dropbox.com/s/chn1hp74q8u8tvl/3HoughLines.png?dl=0) I'm wondering if you have any thoughts on improving the hough to find the vertical lines and remove them? – Anthony Feb 26 '17 at 20:44
  • Consider accepting the answer if you think it was helpful. – Michał Gacka Mar 10 '17 at 11:25

1 Answer

Your case is less trivial than the one in the tutorial that you based your solution on. With this approach you will not be able to filter out 100% of the lines, because horizontal parts of the characters will sometimes be treated as lines.

Depending on your expectations (which you haven't really specified), and specifically on the accuracy that you expect, you might want to try finding the characters instead of finding the lines. That should give you more robustness.

Regarding your code: by adding a few lines right after finding the horizontal lines in the image (before the verticalsize = rows / 30 line of code), you can get some results. I worked on a half-size image.

Result with horizontalsize = int(cols/30)

Result with horizontalsize = int(cols/15)

Again, I stress that the results will never be accurate with this approach in your case. Here's the snippet:

# invert the line mask so that the detected lines are black (zero) for masking
horizontal_inv = cv2.bitwise_not(horizontal)
# bitwise_and with the mask zeroes out the line pixels in the (inverted) image
masked_img = cv2.bitwise_and(img, img, mask=horizontal_inv)
# invert the image back to dark text on a light background
masked_img_inv = cv2.bitwise_not(masked_img)
cv2.imshow("masked img", masked_img_inv)
cv2.imwrite("result2.jpg", masked_img_inv)
cv2.waitKey(0)
cv2.destroyAllWindows()

Try playing with horizontalsize if the images I provided are somewhat satisfactory. I've also used an int conversion, since getStructuringElement expects integer sizes: horizontalsize = int(cols / 30).

You can also try some smoothing and morphology on the result. That should make the characters a little bit more readable.

Michał Gacka
  • I am interested in the approach you mentioned about finding the characters. Would it be possible to find the characters, crop them out, and put them on a new white background? How should I go about finding the characters? Contour bounding? – Anthony Feb 25 '17 at 21:59
  • 1
    Well, now that I've given it more thought (I assumed you might accept the results that the morphological operations provided), I think it will be pretty hard to retrieve the characters. You might try detecting each of the characters separately using machine learning, but that seems like huge overkill. Another idea I have for filtering the lines is to find contours and then filter them based on their length. Pseudocode would go something like this: a) morphological closing b) vertical Sobel c) find contours d) filter them based on their length – Michał Gacka Feb 25 '17 at 22:13
  • Thanks. I am trying to replicate your results. Where should the snippet you provided go? I changed `horizontalsize = int(cols / 15)` but not sure where your snippet should go? before `cv2.imshow("horizontal", horizontal)` in my code? – Anthony Feb 25 '17 at 22:32
  • FWIW, I've updated the question with an ImageMagick command that returns somewhat better results but still not perfect – Anthony Feb 25 '17 at 22:52
  • additionally, what if the lines are vertical instead of horizontal? like this https://www.dropbox.com/s/hhxr1pybt76l9sg/vertical_lines.jpg?dl=0 – Anthony Feb 26 '17 at 12:19
  • Here's the full code for horizontal lines: https://codeshare.io/aJbj4K. For vertical lines you do the same, but using the image called "vertical" in your code. Remember that doing some additional morphological operations and smoothing on the result should make it a bit better than the result I provided. – Michał Gacka Feb 26 '17 at 21:29
  • So, the code works with the piano notes image. But it doesn't work with images like these... [image 1](https://i.stack.imgur.com/CvxAO.png) [image 2](https://i.stack.imgur.com/DddCL.png) [image 3](https://i.stack.imgur.com/t9P7O.png) How can I remove the horizontal lines from these while keeping the numbers/characters? – lucians Sep 18 '17 at 14:52
  • @m3h0w I used the code and the result is poor. [Take a look](https://stackoverflow.com/questions/46274961/removing-horizontal-lines-in-image-opencv-pyhton-matplotlib?noredirect=1#comment79515831_46274961) – lucians Sep 19 '17 at 09:42
  • @Link I don't see anywhere in that thread that you used the code with the modification I suggested. – Michał Gacka Sep 19 '17 at 09:45
  • In the second part of the code there is your snippet, which you indicated above. If you're talking about codeshare, well, it isn't available... I can't see it – lucians Sep 19 '17 at 09:49
  • @Link you added the code in the wrong place. My answer explicitly points that it should be injected "before verticalsize = rows / 30 line of code". Please don't waste people's time on doing coding for you and dig into what the code you're trying to use is actually doing. – Michał Gacka Sep 19 '17 at 10:00