I have a projection profile for an image. It looks like this: [image: plot of the horizontal projection profile]

The code that produces it looks like this:

import cv2
import numpy as np
import matplotlib.pyplot as plt

def smooth(y, box_pts):
    # Moving-average (box) filter of width box_pts
    box = np.ones(box_pts)/box_pts
    y_smooth = np.convolve(y, box, mode='same')
    return y_smooth

img = cv2.imread(filename)

# BGR-to-gray conversion: cv2.imread returns BGR, and Otsu thresholding
# needs an 8-bit single-channel image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Gray-to-binary conversion: Otsu thresholding after a Gaussian blur
blur = cv2.GaussianBlur(gray,(5,5),0)
ret3,thres = cv2.threshold(blur,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
#_,thres= cv2.threshold(gray,140,255,cv2.THRESH_BINARY)

# Count black pixels per row
counts = np.sum(thres==0, axis=1)
row_number = np.arange(thres.shape[0])

counts = smooth(counts,19)

plt.plot(row_number, counts, label='fit')
plt.xlabel('Row Number')
plt.ylabel('Number of Black Pixels')
plt.title('Horizontal Projection Profile')
plt.show()

I want to find the row numbers that separate the different lines. The desired row numbers are [0, 227, 381, 547, 687]. I tried finding the local minima with scipy.signal.argrelextrema(), but I also get other (undesired) minima. So my question is: is there a technique that finds only these line-separator row numbers?

P.S. Setting a fixed threshold does not seem to work, because the projection profile changes from image to image, and so would the threshold.

Any help would be highly appreciated!!!

Himanshu Tiwari
  • Maybe this link will help: [split-text-lines-in-scanned-document](https://stackoverflow.com/questions/34981144/split-text-lines-in-scanned-document/48268334#48268334) – Kinght 金 Jan 24 '18 at 07:24
  • [`find_peaks_cwt`](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.signal.find_peaks_cwt.html) may be more robust than `argrelextrema` if you can make some assumptions about peak geometry. – MB-F Jan 24 '18 at 07:44
  • Compute the first and second derivatives and you'll find the extrema. The first derivative is zero at each extremum, and the sign of the second tells you whether it is a maximum or a minimum. – Andrey Smorodov Jan 24 '18 at 08:31
  • There is no function that will do this out of the box. You have to invest some brain matter of your own: detect all minima, then filter them with rules you have to come up with. If a fixed threshold doesn't work, use a dynamic one. Unfortunately you provided only a single example, without stating what makes a minimum desired or undesired, so how can you expect any help? – Piglet Jan 24 '18 at 09:11
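The derivative and dynamic-threshold suggestions in the comments can be sketched roughly as follows. This is only an illustration: the helper names `local_minima` and `filter_minima` and the `rel_level` parameter are assumptions, not something the commenters proposed, and the cutoff rule (a fraction of each profile's own range) is just one possible "dynamic" rule.

```python
import numpy as np

def local_minima(profile):
    """Indices where the first difference goes from negative to non-negative."""
    d = np.diff(profile)
    # d[i-1] < 0: profile falls into row i; d[i] >= 0: it rises (or stays flat) after it
    return np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1

def filter_minima(profile, minima, rel_level=0.3):
    """Keep minima lying in the lowest rel_level fraction of the profile's range.

    rel_level is a hypothetical tuning knob; the cutoff adapts to each image
    because it is computed from that image's own minimum and maximum.
    """
    lo, hi = profile.min(), profile.max()
    cutoff = lo + rel_level * (hi - lo)
    return np.array([m for m in minima if profile[m] <= cutoff], dtype=int)

# Toy profile: deep valleys at rows 2 and 8, a shallow dip at row 5
profile = np.array([5, 3, 1, 4, 9, 8, 9, 6, 2, 7], dtype=float)
candidates = local_minima(profile)          # rows 2, 5, 8
separators = filter_minima(profile, candidates)  # rows 2, 8
```

On a real profile, the smoothing width and `rel_level` would probably need tuning together; a shallow dip inside a text line is rejected only if it stays above the cutoff.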

1 Answer

Assuming every line has the same height, you could look for a subset of your set of minima. In your case, another algorithm could return a "row_number" list such as [80, 230, 380, 530, 680]. The model equation would be: row_number[n] = 150*n + 80 = a*n + b. The aim of this algorithm is to adjust a and b so as to minimize D, the distance (for example, the sum of squared errors) to a subset of your minima. a and b are searched for in sets that you define. You could also use another parameter, RM, the ratio of rejected minima.
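A minimal sketch of that search, assuming a brute-force scan over candidate values of a and b and a mean squared distance from each predicted separator to its nearest detected minimum. The function name `fit_line_grid` and the example minima are made up for illustration, and the RM parameter is not modelled here.

```python
import numpy as np

def fit_line_grid(minima, n_rows, a_values, b_values):
    """Brute-force search for row_number[n] = a*n + b.

    minima   : detected local minima (row indices)
    n_rows   : image height; predicted separators stay below it
    a_values : candidate line heights (must be positive)
    b_values : candidate offsets of the first separator (must be < n_rows)
    Returns (cost, a, b) for the best candidate found.
    """
    minima = np.asarray(minima, dtype=float)
    best = None
    for a in a_values:
        for b in b_values:
            grid = np.arange(b, n_rows, a, dtype=float)
            # distance from each predicted separator to the closest minimum
            dist = np.abs(grid[:, None] - minima[None, :]).min(axis=1)
            cost = np.mean(dist ** 2)
            if best is None or cost < best[0]:
                best = (cost, a, b)
    return best

# Hypothetical minima: five evenly spaced separators plus three spurious ones
minima = [80, 95, 230, 300, 380, 530, 610, 680]
cost, a, b = fit_line_grid(minima, n_rows=700,
                           a_values=range(120, 181), b_values=range(0, 120))
```

Restricting `a_values` to a plausible range of line heights matters: without a lower bound, a very small a places a predicted row near every minimum and wins trivially.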

If you know the number of lines N, you can take every N-tuple among your minima, fit a linear regression to each, and keep the best guesses.
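One way to read this, sketched under the assumption that "best" means the smallest sum of squared residuals of a straight-line fit against the index 0..N-1. The function name `best_evenly_spaced_subset` and the example data are hypothetical.

```python
from itertools import combinations
import numpy as np

def best_evenly_spaced_subset(minima, n_lines):
    """Try every n_lines-tuple of minima and keep the one closest to a*n + b.

    Returns (error, subset, a, b). Needs n_lines >= 3 to be meaningful,
    since any two points fit a straight line exactly.
    """
    n = np.arange(n_lines)
    best = None
    for subset in combinations(sorted(minima), n_lines):
        y = np.array(subset, dtype=float)
        a, b = np.polyfit(n, y, 1)          # least-squares line through (n, y)
        err = np.sum((a * n + b - y) ** 2)  # sum of squared residuals
        if best is None or err < best[0]:
            best = (err, subset, a, b)
    return best

# Hypothetical minima: five evenly spaced separators plus three spurious ones
minima = [80, 95, 230, 300, 380, 530, 610, 680]
err, subset, a, b = best_evenly_spaced_subset(minima, n_lines=5)
```

The number of tuples grows combinatorially with the number of minima, so this is practical only after the minima list has been pruned to a few dozen candidates at most.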

Another idea for periodic phenomena is of course Fourier analysis, which would give you the first parameter, a (the line height).
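As a rough sketch, the dominant period of the profile can be read off the largest non-DC peak of its FFT magnitude. The function name `estimate_line_height` and the synthetic cosine profile are assumptions made for the example; a real profile is only approximately periodic.

```python
import numpy as np

def estimate_line_height(profile):
    """Estimate the dominant period (in rows) of a projection profile."""
    profile = np.asarray(profile, dtype=float)
    centered = profile - profile.mean()       # remove the DC component
    spectrum = np.abs(np.fft.rfft(centered))
    freqs = np.fft.rfftfreq(len(profile))     # cycles per row
    k = 1 + np.argmax(spectrum[1:])           # strongest non-zero frequency
    return 1.0 / freqs[k]

# Synthetic profile with valleys every 150 rows, starting at row 80
rows = np.arange(750)
profile = 100 - 90 * np.cos(2 * np.pi * (rows - 80) / 150)
line_height = estimate_line_height(profile)   # close to 150
```

The frequency resolution is one cycle per image height, so on a short profile the estimate is coarse; it may serve better as a starting range for a than as the final value.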

7Tonin