Here's an example image ->
I would like to extract the text that is styled with a strikethrough.
For the image above, I would like to extract:
de location
How would I do this?
Here's what I have so far using OpenCV and Python:
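import cv2
import numpy as np
import matplotlib.pyplot as plt

# <image> stands for the path to the input image
im = cv2.imread(<image>)
# A 1x44 closing removes dark details narrower than the kernel,
# so mostly the long horizontal strokes remain
kernel = np.ones((1, 44), np.uint8)
morphed = cv2.morphologyEx(im, cv2.MORPH_CLOSE, kernel)
plt.imshow(morphed)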
This gives me the horizontal lines ->
I am new to image processing and am having a difficult time isolating only the text that has a strikethrough.
Bonus: along with the strikethrough text, I would also like to extract the neighboring text, so that I can correctly mark the strikethrough styling back in place among the other text.
UPDATE 1: Based on the first answer, I did the following:
import cv2
import matplotlib.pyplot as plt

# Load image, convert to grayscale, Otsu's threshold
image = cv2.imread('image.png')
result = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Detect horizontal lines
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))
detect_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=10)
cnts = cv2.findContours(detect_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# findContours returns 2 values in OpenCV 4.x and 3 values in 3.x
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(result, [c], -1, (36, 255, 12), 2)
plt.imshow(result)
I was able to get this image:
I tried different values for the horizontal kernel, but had no luck.
UPDATE 2: I modified the above snippet further and got this:
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Load image, convert to grayscale, Otsu's threshold
image = cv2.imread('image.png')
result = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Thicken the strokes; the erosion is only kept for inspection
kernel = np.ones((4, 2), np.uint8)
erosion = cv2.erode(thresh, kernel, iterations=1)
dilation = cv2.dilate(thresh, kernel, iterations=1)
trans = dilation
# plt.imshow(erosion)

# Detect horizontal lines
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (8, 1))
detect_horizontal = cv2.morphologyEx(trans, cv2.MORPH_OPEN, horizontal_kernel, iterations=10)
cnts = cv2.findContours(detect_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv2.drawContours(result, [c], -1, (36, 255, 12), 2)
plt.imshow(result)
I was able to get this image:
The same approach works on my other image types as well:
It is not 100% accurate (it failed to detect the struck-through "de"), but I like the performance so far.
Now I am struggling with how to check whether the neighboring pixels are black or white, so that I can isolate the strikethrough.
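One possible way to frame that neighbor check, as a rough sketch rather than a tested solution: for each detected line, measure how much foreground "ink" sits in a thin band directly above and directly below it. The idea is that a strikethrough has letter strokes on both sides, whereas an underline has ink only above and a table border has little on either side. The helper names `ink_above_below` and `looks_like_strikethrough`, the `margin` of 5 pixels, and the `min_frac` threshold of 0.1 are all hypothetical and would need tuning on real images. The sketch assumes `binary` is an inverted threshold image like `thresh` above (text is non-zero) and that `box` is an `(x, y, w, h)` tuple such as the one returned by `cv2.boundingRect(c)`.

```python
import numpy as np


def ink_above_below(binary, box, margin=5):
    """Return the fraction of non-zero pixels in the bands just above
    and just below a line's bounding box (x, y, w, h)."""
    x, y, w, h = box
    # Band of `margin` rows immediately above the line (clipped at the top edge)
    top = binary[max(y - margin, 0):y, x:x + w]
    # Band of `margin` rows immediately below the line (slicing clips at the bottom)
    bottom = binary[y + h:y + h + margin, x:x + w]

    def frac(band):
        return float(np.count_nonzero(band)) / band.size if band.size else 0.0

    return frac(top), frac(bottom)


def looks_like_strikethrough(binary, box, margin=5, min_frac=0.1):
    """Heuristic (an assumption, not a proven rule): treat the line as a
    strikethrough only if there is enough ink on both sides of it."""
    above, below = ink_above_below(binary, box, margin)
    return above >= min_frac and below >= min_frac
```

In the snippets above this could be called inside the contour loop as `looks_like_strikethrough(thresh, cv2.boundingRect(c))`, drawing only the contours for which it returns `True`. Using the undilated `thresh` rather than `trans` for the check is a design choice, so that the thickened line itself does not leak into the bands being measured.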