I want to measure what percentage of an image's area is taken up by text blocks. The idea is to reject images that are more than 40% text. I found a very informative and detailed post on detecting text here. It uses C++, but I think I can adapt the idea to Python.
However, I am not sure what the best way is to measure the percentage of the area the text occupies. Is there an existing implementation of something similar that I could use? I am just getting started with computer vision.
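One simple way to define "the percentage of the area the text occupies" is the fraction of pixels that fall inside at least one detected text box. The sketch below is one way to compute that, not an established method. It paints every box into a boolean mask the size of the image and takes the mean, so overlapping boxes are counted only once (summing `w * h` would double-count them). The function name `text_area_fraction` is hypothetical. The `(x, y, w, h)` box format is the one `cv2.boundingRect` returns.

```python
import numpy as np

def text_area_fraction(boxes, image_shape):
    """Return the fraction (0.0-1.0) of the image covered by the union of boxes.

    boxes: iterable of (x, y, w, h) tuples, the format cv2.boundingRect returns.
    image_shape: img.shape, i.e. (height, width) or (height, width, channels).
    """
    height, width = image_shape[:2]
    covered = np.zeros((height, width), dtype=bool)
    for x, y, w, h in boxes:
        # Clip each box to the image so out-of-range coordinates cannot wrap around.
        x0, y0 = max(x, 0), max(y, 0)
        x1, y1 = min(x + w, width), min(y + h, height)
        if x0 < x1 and y0 < y1:
            covered[y0:y1, x0:x1] = True
    # The mean of a boolean mask is the fraction of True pixels.
    return float(covered.mean())

# Two overlapping 50x50 boxes on a 100x100 image: 2500 + 2500 - 625 = 4375 pixels.
fraction = text_area_fraction([(0, 0, 50, 50), (25, 25, 50, 50)], (100, 100))
print(fraction)  # 0.4375
reject = fraction > 0.4  # the 40% cutoff from the question
```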
I am currently detecting the text regions in Python with the code below. I found it on an online forum, and it works for me.
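import cv2

def captch_ex(file_name):
    img = cv2.imread(file_name)
    if img is None:  # imread returns None instead of raising when the file cannot be read
        raise FileNotFoundError(file_name)

    # Keep only bright pixels (value > 180), so this targets light text on a darker background.
    img2gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(img2gray, 180, 255, cv2.THRESH_BINARY)
    # Zero out everything below the threshold, then binarize again.
    image_final = cv2.bitwise_and(img2gray, img2gray, mask=mask)
    _, new_img = cv2.threshold(image_final, 180, 255, cv2.THRESH_BINARY)

    # Dilate so neighboring characters merge into larger word- or line-sized blobs.
    kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
    dilated = cv2.dilate(new_img, kernel, iterations=9)

    # [-2] picks the contour list on OpenCV 3.x (3 return values) and 4.x (2 return values).
    contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2]
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        # Skip blobs that are small in both dimensions (likely noise rather than text).
        if w < 35 and h < 35:
            continue
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 255), 2)

    cv2.imshow('captcha_result', img)
    cv2.waitKey()
    cv2.destroyAllWindows()

file_name = 'my_image.jpg'
captch_ex(file_name)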
How do I proceed from here to get the percentage of the image covered by these boxes?