
I am working on reading container codes using an IP camera, and I came across @dhanushka's gradient-based method for text detection. I've been successful with it, as you can see below...

#include <opencv2/opencv.hpp>
#include <vector>

using namespace cv;
using namespace std;

int main() {
    bool debugging = true;

    // loaded as single-channel grayscale (flag 0), so the name "rgb" is a misnomer
    Mat rgb = imread("/home/brian/qt/ANPR/images/0.jpg", 0);
    if (rgb.empty()) { return -1; }
    if (debugging) { imshow("Original", rgb); }

    // morphological gradient: highlights the stroke edges
    Mat grad;
    Mat morphKernel = getStructuringElement(MORPH_ELLIPSE, Size(3, 3));
    morphologyEx(rgb, grad, MORPH_GRADIENT, morphKernel);
    if (debugging) { imshow("gradient morph", grad); }

    // binarize with Otsu's threshold
    Mat bw;
    threshold(grad, bw, 0.0, 255.0, THRESH_BINARY | THRESH_OTSU);
    if (debugging) { imshow("threshold", bw); }

    // connect horizontally oriented regions
    Mat connected;
    morphKernel = getStructuringElement(MORPH_RECT, Size(10, 1));
    morphologyEx(bw, connected, MORPH_CLOSE, morphKernel);
    if (debugging) { imshow("horizontal regions morph", connected); }

    // find contours
    Mat mask = Mat::zeros(bw.size(), CV_8UC1);
    vector<vector<Point> > contours2;
    vector<Vec4i> hierarchy;
    vector<Rect> txtRect;
    vector<vector<Point> > txtContour;
    findContours(connected, contours2, hierarchy, CV_RETR_CCOMP,
                 CV_CHAIN_APPROX_SIMPLE, Point(0, 0));

    // filter contours, walking the top level of the hierarchy;
    // the empty() check guards against an out-of-bounds access
    // when no contours are found
    for (int i = 0; i >= 0 && !hierarchy.empty(); i = hierarchy[i][0]) {
        Rect rect = boundingRect(contours2[i]);
        Mat maskROI(mask, rect);
        maskROI = Scalar(0, 0, 0);

        // fill the contour
        drawContours(mask, contours2, i, Scalar(255, 255, 255), CV_FILLED);

        // ratio of non-zero pixels in the filled region
        double r = (double)countNonZero(maskROI) / (rect.width * rect.height);

        // assume at least 45% of the area is filled if it contains text
        if (r > .45 && (rect.height > 10 && rect.width > 10)) {
            // on a single-channel image only the first Scalar value is used,
            // so the boxes are drawn black
            rectangle(rgb, rect, Scalar(0, 255, 0), 2);
            txtRect.push_back(rect);
            txtContour.push_back(contours2[i]);
        }
    }
    if (debugging) { imshow("Characters", rgb); }

    // redraw the accepted contours, filled, on a white canvas
    Mat text(rgb.size(), CV_8U, Scalar(255));
    drawContours(text, txtContour, -1, Scalar(0), CV_FILLED, 4);
    if (debugging) { imshow("Detected Text", text); }

    waitKey(0);
    return 0;
}

Step 0: Original Image

Step 1: Morphological gradient

Step 2: Binarized image

Step 3: Horizontally oriented regions

Step 4: Detected Text

Step 5: Extracted Text

The problem is that I have failed to properly extract the detected text, so I cannot pass it to OCR and get the expected result, BSIU225378.

The text I managed to extract comes from the horizontally connected regions, and it is unusable for OCR. Is there a way to extract the text, say from the binarized (threshold) image, using the contours I found in the horizontally connected regions?
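
One way to do exactly that, along the lines of saurabheights's masking suggestion in the comments below, is to fill the accepted contours into a mask and copy the thresholded pixels through it, instead of redrawing the closed blobs. A minimal sketch, assuming the `bw`, `txtContour` and `txtRect` variables from the code above (the `extractText` helper name is hypothetical):

// Sketch: copy the binarized strokes that fall inside the accepted
// contours onto a white canvas, so OCR sees the character strokes
// rather than the filled horizontal blobs.
Mat extractText(const Mat& bw, const vector<vector<Point> >& txtContour)
{
    // mask containing only the accepted regions, filled
    Mat mask = Mat::zeros(bw.size(), CV_8UC1);
    drawContours(mask, txtContour, -1, Scalar(255), CV_FILLED);

    // invert so the strokes are black on white, as OCR engines expect
    Mat inverted;
    bitwise_not(bw, inverted);

    // copy the inverted strokes through the mask; everything else stays white
    Mat text(bw.size(), CV_8UC1, Scalar(255));
    inverted.copyTo(text, mask);
    return text;
}

Individual regions can then be cropped with Mat's Rect-based constructor, e.g. `Mat word = text(txtRect[i]);`, and passed one by one to an OCR engine such as Tesseract.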

  • Hi, could you add small snippets which are used to transition from one image to another, so SO users have a clear idea of what is being done. Also, it is not clear what you want to achieve. Please elaborate more. – saurabheights Mar 12 '17 at 14:02
  • Taking a sub-rectangle of a larger image is a very simple operation, but if not done right, even the most sophisticated image detection algorithms will fall over. – Malcolm McLean Mar 12 '17 at 16:44
  • @saurabheights I have elaborated more on the question and added the code used... – bmatovu Mar 12 '17 at 18:13
  • Hi, I have a minor question: What do you mean by fail? Also, can you change the title of each image such as Step 0: Original Image to Original (as in the code), so the output matches the code? It will be clearer for other users. P.S. In case by fail you mean that the text in the final image is gibberish black and cannot be used for OCR, the solution is to use your final image to create an inverse binary mask (use inverse thresholding) and multiply it with the original image to get the output image. You can also crop the output image using the Rect-parameterized constructor of Mat. – saurabheights Mar 12 '17 at 18:57
  • Not clear what your problem is. If you are unable to group the OCR results of your regions of interest to obtain the final result, consider grouping those regions by their bounding-box upper-left y-coordinate and height, because those features look very similar for those regions. One way of doing this is using k-means to cluster `(upper-left y, height)` pairs of the bounding rectangles that you have found. – dhanushka Mar 13 '17 at 16:35 (sketched after this list)
  • @bmatovu [Stroke Width Transform](https://sites.google.com/site/roboticssaurav/strokewidthnokia) could be considered instead or in combination. [Here](https://github.com/aperrau/DetectText) is one OpenCV implementation (not tested). – Catree Mar 14 '17 at 19:01
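
Following up on dhanushka's clustering comment above, here is a rough sketch of grouping the accepted bounding boxes into text lines with `cv::kmeans` over `(upper-left y, height)` pairs. The number of lines `k` is assumed to be known, and the `groupRectsIntoLines` name is hypothetical:

// Sketch: cluster bounding boxes by (upper-left y, height) so that
// boxes belonging to the same text line get the same label.
vector<int> groupRectsIntoLines(const vector<Rect>& txtRect, int k)
{
    // one row per box, two float features: upper-left y and height
    Mat samples((int)txtRect.size(), 2, CV_32F);
    for (int i = 0; i < (int)txtRect.size(); i++) {
        samples.at<float>(i, 0) = (float)txtRect[i].y;
        samples.at<float>(i, 1) = (float)txtRect[i].height;
    }

    Mat labels, centers;
    kmeans(samples, k, labels,
           TermCriteria(TermCriteria::EPS + TermCriteria::COUNT, 10, 1.0),
           3, KMEANS_PP_CENTERS, centers);

    // labels.at<int>(i) is the line index assigned to txtRect[i]
    vector<int> lineOf(txtRect.size());
    for (int i = 0; i < (int)txtRect.size(); i++)
        lineOf[i] = labels.at<int>(i);
    return lineOf;
}

Sorting the boxes of each cluster by their x-coordinate and concatenating the per-box OCR results should then reconstruct each line of the container code.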
