I am trying to extract bounding boxes from this form image. By bounding boxes I mean all the boxes in the image. My approach is to find contours, obtain the bounding rectangle of each one, extract the ROI, and run OCR on each ROI with pytesseract. However, I am not able to find the right contours. Is this approach sound, or should I try a different solution? Thanks in advance.
My code so far looks as follows:
import cv2
import pytesseract
image = cv2.imread('DocOrigin_Government_W2_2014_Red-ScanL.jpg')
original = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2) #adaptive
canny = cv2.Canny(thresh, 100, 200)
# Find contours, obtain bounding box, extract and save ROI
ROI_number = 0
cnts = cv2.findContours(canny, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    x, y, w, h = cv2.boundingRect(c)
    # x, y, w, h = 37, 625, 309, 28
    ROI = thresh[y:y+h, x:x+w]
    data = pytesseract.image_to_string(ROI, lang='eng', config='--psm 6')
    print(data)
    # write contour images to disk
    # cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 2)
    # cv2.imwrite('ROI_{}.png'.format(ROI_number), ROI)
    # ROI_number += 1
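One idea I have not verified on this form: since `cv2.RETR_TREE` returns every nested contour, the bounding rectangles probably include individual glyphs and the page outline as well as the boxes, so they could be filtered by size before OCR. Below is a minimal sketch of such a filter. The helper name `filter_boxes`, its thresholds, and the example rectangles are hypothetical placeholders, not values tuned for this image.

```python
def filter_boxes(rects, image_w, image_h, min_w=40, min_h=15, max_area_ratio=0.9):
    """Keep (x, y, w, h) rectangles that plausibly correspond to form boxes.

    Drops rectangles narrower than min_w or shorter than min_h (likely glyphs
    or noise) and rectangles covering more than max_area_ratio of the image
    (likely the page border). All thresholds are illustrative assumptions.
    """
    page_area = image_w * image_h
    kept = []
    for x, y, w, h in rects:
        if w < min_w or h < min_h:
            continue  # too small to be a form box
        if w * h > max_area_ratio * page_area:
            continue  # almost the whole page
        kept.append((x, y, w, h))
    # Sort top-to-bottom, then left-to-right, so OCR output follows reading order.
    return sorted(kept, key=lambda r: (r[1], r[0]))


# Made-up rectangles on a hypothetical 1000 x 800 image.
rects = [(0, 0, 1000, 800), (37, 625, 309, 28), (400, 100, 5, 7), (37, 100, 200, 30)]
boxes = filter_boxes(rects, image_w=1000, image_h=800)
print(boxes)
```

In the loop above, the input could be built as `rects = [cv2.boundingRect(c) for c in cnts]`, with the image height and width taken from `image.shape[:2]`.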