I tried running OCR on each individual contour using Tesseract, but I'm not getting proper text out of it. Contour identification itself works correctly using the approach from "Extracting text OpenCV". Please suggest what I can try.
-
Do image normalization and morphological operations to equalize the image and reduce noise. If you then follow the contour process, it should give a proper result. – Gowthaman Jul 20 '18 at 09:43
-
If possible, share the image so we can better understand your problem. – Gowthaman Jul 20 '18 at 09:43
-
Thank you for the quick reply. Yes, I did apply morphological operations, but they are not giving proper results. I did erosion followed by dilation, then located the contours and extracted the contour blocks. After extracting these blocks I pass them to tesseract-ocr, but it does not return proper text for most of them. I took the image from the above link for reference. – user2789964 Jul 20 '18 at 11:54
2 Answers
You are not getting proper text from OCR because of poor image pre-processing. Try various image processing techniques to narrow down a workable approach for your image. Since you asked under Python, if you have a colour image:
Load it as a grayscale image, to remove the colour noise.
import cv2
img = cv2.imread('name_of_the_coloured_input_image', 0)  # flag 0 == cv2.IMREAD_GRAYSCALE
Blur the image using one of OpenCV's blurring techniques (averaging, Gaussian blurring, median blurring and bilateral filtering); this reduces noise in the image. Please refer to this link and try out the various techniques.
Then apply thresholding (simple, adaptive or Otsu), which turns pixels above a certain threshold white (the maximum value) and all other pixels black (0). Please refer to this link and try out the various techniques.
Now find the contours and run tesseract on each contour region; you should get better results.
Note: please remember that for tesseract to work well, the text should be black on a white background.

-
Thanks for the answer. I have posted my steps in an answer below; please check them and let me know what could be improved. – user2789964 Jul 20 '18 at 13:42
Please check the function below and tell me if anything is missing.
import cv2
import numpy as np

def find_text_contours(image):
    # gray out the image
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    cv2.imshow('gray', gray)
    cv2.waitKey(0)
    # image blurring (note: a (1, 1) kernel leaves the image unchanged)
    blur = cv2.blur(gray, (1, 1))
    cv2.imshow('Blur', blur)
    cv2.waitKey(0)
    # threshold & invert: dark text becomes white on black, as findContours expects
    ret, thresh = cv2.threshold(blur, 127, 255, cv2.THRESH_BINARY_INV)
    cv2.imshow("Threshold", thresh)
    cv2.waitKey(0)
    # erosion (note: a 1x1 kernel leaves the image unchanged)
    kernel1 = np.ones((1, 1), np.uint8)
    img_erosion = cv2.erode(thresh, kernel1, iterations=1)
    cv2.imshow("Erosion", img_erosion)
    cv2.waitKey(0)
    # dilation with a wide 6x10 kernel merges neighbouring characters into blobs
    kernel = np.ones((6, 10), np.uint8)
    img_dilation = cv2.dilate(img_erosion, kernel, iterations=1)
    cv2.imshow("Dilation", img_dilation)
    cv2.waitKey(0)
    # find contours; [-2:] works with OpenCV 3 (3 return values) and OpenCV 4 (2)
    ctrs, hier = cv2.findContours(img_dilation.copy(), cv2.RETR_EXTERNAL,
                                  cv2.CHAIN_APPROX_SIMPLE)[-2:]
    return ctrs

-
Try `adaptive threshold` instead of `threshold`. The `kernel1` structuring element can be replaced with a `MORPH_RECT` of `Size(12, 3)`, and `morphologyEx` can then be applied before the `findContours` step. – Gowthaman Jul 26 '18 at 06:18
-
I tried adaptive threshold as well. I'm able to find the exact contours, but when I pass these extracted contours to tesseract I get weird results. – user2789964 Jul 27 '18 at 07:45
-
You can detect horizontal and vertical lines before the contour step to filter out irrelevant objects from the image. See this link [https://docs.opencv.org/3.2.0/d1/dee/tutorial_moprh_lines_detection.html] – Gowthaman Jul 27 '18 at 09:18