so I am using opencv to do template matching like below. I constantly need to fiddle with the visual similarity #THRESHOLD, because it sometimes fails to discover matches and other times returns far too many. It's trial and error until it matches exactly 1 element at a given position in the document. I'm wondering if there is any way to automate this.
The image.png file is a picture of a pdf document, and the template.png file is a picture of a paragraph. My goal is to discover all the paragraphs in the pdf document, and I'd also like to know what kind of neural network would be useful here.
import cv2
import numpy as np

img = cv2.imread("image.png")
gimg = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
w, h = template.shape[::-1]  # shape is (rows, cols), i.e. (h, w)

result = cv2.matchTemplate(gimg, template, cv2.TM_CCOEFF_NORMED)
loc = np.where(result >= 0.36) #THRESHOLD
print(loc)

# draw a green box around every location that scored above the threshold
for pt in zip(*loc[::-1]):
    cv2.rectangle(img, pt, (pt[0] + w, pt[1] + h), (0, 255, 0), 3)
cv2.imwrite("output.png", img)
So, for instance, it would search every #THRESHOLD value from 0 to 1.0 and return the threshold value that yields exactly one rectangle match (the green box drawn above) in the image. However, I can't help but feel this is very exhaustive. Is there a smarter way to find out what the threshold value should be?