
I am working on this image as source:

[source image]

Applying the following code...

import cv2
import numpy as np

mser = cv2.MSER_create()
img = cv2.imread('C:\\Users\\Link\\Desktop\\test2.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
vis = img.copy()
regions, _ = mser.detectRegions(gray)
hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions]
cv2.polylines(vis, hulls, 1, (0, 255, 0))

mask = np.zeros((img.shape[0], img.shape[1], 1), dtype=np.uint8)
for contour in hulls:
    cv2.drawContours(mask, [contour], -1, (255, 255, 255), -1)

# Apply the mask once, after all hulls have been drawn
text_only = cv2.bitwise_and(img, img, mask=mask)

cv2.imshow('img', vis)
cv2.waitKey(0)
cv2.imshow('img', mask)
cv2.waitKey(0)
cv2.imshow('img', text_only)
cv2.waitKey(0)

cv2.imwrite('C:\\Users\\Link\\Desktop\\test_o\\1.png', text_only)

...I obtain this result (mask):

[mask output image]

The question is this:

how can I merge the digit 5 in the number series (157661546) into a single object, given that it is split into separate regions in the mask image?

Thanks

lucians

1 Answer


Have a look here; it seems to be the exact answer.

Here instead is my version of the above code, fine-tuned for text extraction (with masking too).

Below is the original code from the previous article, "ported" to Python 3 and OpenCV 3, with MSER and bounding boxes added. The main difference from my version is how the grouping distance is defined: mine is text-oriented, while the one below is a purely geometrical distance.

import sys
import cv2
import numpy as np

def find_if_close(cnt1, cnt2):
    # Two contours are "close" if any pair of their points is
    # within the distance threshold.
    for p1 in cnt1:
        for p2 in cnt2:
            if np.linalg.norm(p1 - p2) < 25:    # <-- threshold
                return True
    return False

img = cv2.imread(sys.argv[1])
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)

cv2.imshow('input', img)

ret,thresh = cv2.threshold(gray,127,255,0)

mser = False
if mser:
    mser = cv2.MSER_create()
    regions = mser.detectRegions(thresh)  # note: thresh, not gray -- see UPDATE below
    hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions[0]]
    contours = hulls
else:
    thresh = cv2.bitwise_not(thresh) # findContours wants a black background
    im2,contours,hier = cv2.findContours(thresh,cv2.RETR_EXTERNAL,2)

cv2.drawContours(img, contours, -1, (0,0,255), 1)
cv2.imshow('base contours', img)


LENGTH = len(contours)
status = np.zeros((LENGTH,1))

print("Elements:", len(contours))
for i,cnt1 in enumerate(contours):
    x = i
    if i != LENGTH-1:
        for j,cnt2 in enumerate(contours[i+1:]):
            x = x+1
            if find_if_close(cnt1,cnt2):
                val = min(status[i],status[x])
                status[x] = status[i] = val
            else:
                if status[x]==status[i]:
                    status[x] = i+1

unified = []
maximum = int(status.max())+1
for i in range(maximum):
    pos = np.where(status==i)[0]
    if pos.size != 0:
        cont = np.vstack([contours[i] for i in pos])
        hull = cv2.convexHull(cont)
        unified.append(hull)

cv2.drawContours(img,contours,-1,(0,0,255),1)
cv2.drawContours(img,unified,-1,(0,255,0),2)
#cv2.drawContours(thresh,unified,-1,255,-1)

for c in unified:
    (x,y,w,h) = cv2.boundingRect(c)
    cv2.rectangle(img, (x,y), (x+w,y+h), (255, 0, 0), 2)

cv2.imshow('result', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Sample output (the yellow blob falls below the binary threshold, so it is ignored). Red: original contours; green: unified ones; blue: bounding boxes.

Sample output

There is probably no need to use MSER, as a simple findContours may work fine.

------------------------

Starting from here is my old answer, written before I found the above code. I'm leaving it anyway, as it describes a couple of different approaches that may be easier or more appropriate for some scenarios.

A quick and dirty trick is to add a small Gaussian blur and a high threshold before the MSER (or some dilate/erode if you prefer fancier things). In practice you just make the text bolder so that it fills small gaps. You can later discard this version and crop from the original one.

Otherwise, if your text is in lines, you may try to detect the average line center (make a histogram of the Y coordinates and find the peaks, for example). Then, for each line, look for fragments with a close average X. This is quite fragile if the text is noisy or complex.

If you do not need to split each letter, getting the bounding box for the whole word may be easier: just split into groups based on a maximum horizontal distance between fragments (using the leftmost/rightmost points of each contour). Then use the leftmost and rightmost boxes within each group to find the whole bounding box. For multiline text, first group by the centroids' Y coordinate.
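The grouping step above can be sketched like this, working on (x, y, w, h) bounding boxes of the fragments of a single line (the `max_gap` value and the helper name are my own choices):

```python
def group_into_words(boxes, max_gap=10):
    """Group bounding boxes (x, y, w, h) into words: sort by x, and
    start a new word whenever the horizontal gap to the previous
    box's right edge exceeds max_gap."""
    groups = []
    for box in sorted(boxes, key=lambda b: b[0]):
        if groups and box[0] - (groups[-1][-1][0] + groups[-1][-1][2]) <= max_gap:
            groups[-1].append(box)
        else:
            groups.append([box])
    # Merge each group into a single enclosing box.
    merged = []
    for group in groups:
        x0 = min(b[0] for b in group)
        y0 = min(b[1] for b in group)
        x1 = max(b[0] + b[2] for b in group)
        y1 = max(b[1] + b[3] for b in group)
        merged.append((x0, y0, x1 - x0, y1 - y0))
    return merged
```

With fragments at x = 0, 12, and 40 (each 10 wide) and `max_gap=10`, the first two merge into one word box and the third stays alone. A broken letter inside a word simply disappears into the enclosing box.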

Implementation notes:

Opencv allows you to create histograms but you probably can get away with something like this (worked for me on a similar task):

def histogram(vals, th=4, bins=400):
    hist = np.zeros(bins)
    for y_center in vals:
        bucket = int(round(y_center / 2.))  # <-- change this "2."
        hist[bucket-1] += 1
    print("hist: ", hist)

    hist = np.where(hist > th, hist, 0)
    return hist

Here my histogram is just an array with 400 buckets (my image was 800px high, so each bucket catches two pixels; that is where the "2." comes from). vals are the Y coordinates of the centroids of each fragment (you may want to ignore very small elements when building this list). The th threshold is there just to remove some noise. You should get something like this:

0,0,0,5,22,0,0,0,0,43,7,0,0,0

This list describes, moving top to bottom, how many fragments are at each location.

I then run another pass to merge the peaks into a single value (just scan the array, sum while it is non-zero, and reset the count on the first zero), getting something like this {y: count}:

{9:27, 20:50}

Now I know I have two text rows, at y=9 and y=20. Now (or before) you assign each fragment to a line (again with an 8px threshold in my case). Then you can process each line on its own, finding "words". BTW, I have your identical problem with broken letters; that's why I came here looking for MSER :). Notice that if you find the whole bounding box for the word, this problem only matters for the first/last letters: the other broken letters fall inside the word box anyway.
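The peak-merging pass can be sketched like this (using the count-weighted mean of the bucket indices as the run's center is my own choice; the result is in bucket indices, still to be scaled back to pixel Y by the bucket size):

```python
def merge_peaks(hist):
    """Collapse each run of non-zero buckets into one text row,
    returning {center_bucket: total_count}."""
    rows = {}
    run = []  # (index, count) pairs of the current non-zero run
    for i, count in enumerate(list(hist) + [0]):  # trailing 0 flushes the last run
        if count:
            run.append((i, count))
        elif run:
            total = sum(c for _, c in run)
            center = int(round(sum(i * c for i, c in run) / total))
            rows[center] = total
            run = []
    return rows

# The sample histogram from above yields two rows:
print(merge_peaks([0, 0, 0, 5, 22, 0, 0, 0, 0, 43, 7, 0, 0, 0]))  # -> {4: 27, 9: 50}
```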

Here is a reference for the erode/dilate approach, but Gaussian blur + threshold worked for me.

UPDATE: I've noticed that there is something wrong in this line:

regions = mser.detectRegions(thresh)

I pass in the already thresholded image(!?). This is not relevant for the aggregation part, but keep in mind that the MSER part is not being used as expected.

lorenzo
  • Checking right now what is histogram method. You say many keywords in your answer that I will have to study about. BTW, thanks. – lucians Dec 06 '17 at 10:35
  • Added a few extra details. – lorenzo Dec 06 '17 at 12:12
  • Maybe I found what WE are looking for: https://dsp.stackexchange.com/questions/2564/opencv-c-connect-nearby-contours-based-on-distance-between-them – lorenzo Dec 06 '17 at 12:30
  • Ok, last update I suppose. BTW, thanks for your question, working on the answer I found the solution I was looking for. – lorenzo Dec 06 '17 at 13:18
  • Thanks for the long and explicit answer. Most of what you said I still have to learn. Here is a [first result](https://imgur.com/a/N8D6c). As you can see, it's "strange" compared to my [first try](https://stackoverflow.com/questions/47595684/extract-mser-detected-areas-python-opencv/47596008#47596008). As you said, now I am working on this: blur the original image and trying to find the contours of single digits. After that, for each contour, draw a MSER line around it and save it to a ROI (not a box, it's the shape that I need as long as boxes are not so accurate..). BTW, that's the answer – lucians Dec 06 '17 at 14:24
  • do you know how to extract a single image shape (not with bounding box, but shape) as I asked [here](https://stackoverflow.com/questions/47632535/save-shape-of-mser-python-opencv?noredirect=1#comment82226600_47632535)? – lucians Dec 06 '17 at 15:14
  • 1
    By default the above code does not use mser, may be that? I think there is also some problem with the above code as I found distant shapes merged randomly. I'll update the answer as I dicover more. – lorenzo Dec 06 '17 at 16:08
  • Answer updated with my version of the above code (on github). – lorenzo Dec 07 '17 at 17:18
  • My goal was to extract the shape of chars instead of bounding boxes because of a cleaner image. The code you posted above is ok but helps just a little on extraction. Anyway, I found a simple (raw) solution to have my digits extracted and, let's say, MNIST ready; polish each ROI after extraction. It seems to work for now. Also, gaussian and threshold helped me understand other things about opencv, so thanks. I will make an article on my blog with a full tutorial on how to do a (I hope) good ROI extraction. – lucians Dec 09 '17 at 00:10