I'm trying to segment the Burmese text image below line by line (ultimately syllable by syllable, but one step at a time) by looking at the sum of black pixels in each row (plot attached, along with the code used to produce it). How can I segment text images like this using the per-row sum of black pixels?
Edit: Notice that the lines are very close to each other, so I'm having trouble segmenting them with cv2 dilation/erosion/findContours. I have also tried the method in "Split text lines in scanned document", as @Miki suggested. The text image contains 6 lines of text in total. Possibly because of how Burmese characters look, I keep getting both over- and under-segmentation. So I finally resorted to making the decision based on the sum of black pixels in each row.
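import cv2
import matplotlib.pyplot as plt

# Load the image and convert it to grayscale
image = cv2.imread('TextImage.PNG')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Inverted binary threshold: dark text becomes 255, background becomes 0
ret, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)
cv2.imshow('binary', thresh)
cv2.waitKey(0)
# Row-wise sum: each value is 255 * (number of text pixels in that row)
thresh_sum = thresh.sum(axis=1)
plt.plot(thresh_sum)
plt.show()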
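To make the idea concrete, here is a minimal sketch of the kind of decision rule I have in mind: treat every run of rows whose sum stays above a cutoff as one text line. The names `find_line_bands`, `rel_threshold` and `min_height` are ones I made up for illustration, and the default values are guesses, not tuned for my image. Since my lines nearly touch, the valleys in the plot may not drop close to zero, so a fixed cutoff like this may well be too naive.

```python
def find_line_bands(row_sums, rel_threshold=0.1, min_height=3):
    """Return (start, end) row ranges, end exclusive, where the row sum
    exceeds rel_threshold * max(row_sums).

    rel_threshold and min_height are hypothetical tuning knobs.
    """
    values = [float(v) for v in row_sums]
    if not values:
        return []
    cutoff = rel_threshold * max(values)
    bands = []
    start = None  # first row of the band currently being tracked
    for y, v in enumerate(values):
        if v > cutoff and start is None:
            start = y  # entering a run of inked rows
        elif v <= cutoff and start is not None:
            # leaving a run; keep it only if it is tall enough
            if y - start >= min_height:
                bands.append((start, y))
            start = None
    # close a band that runs to the bottom edge of the image
    if start is not None and len(values) - start >= min_height:
        bands.append((start, len(values)))
    return bands


# Synthetic profile: two "lines" of ink separated by a nearly empty gap.
profile = [0, 0, 50, 80, 90, 60, 5, 4, 70, 95, 85, 40, 0]
bands = find_line_bands(profile)
print(bands)
```

With the real image, this would be called as `find_line_bands(thresh_sum)`, and each line could then be cropped with `thresh[start:end]`.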
Thanks!