
I asked a similar question here, but that one is focused more on Tesseract.

I have a sample image as below. I would like to make the white square my Region of Interest and then crop out that part (square) and create a new image with it. I will be working with different images so the square won't always be at the same location in all images. So I will need to somehow detect the edges of the square.

(sample image)

What are some pre-processing methods I can perform to achieve the result?

birdy

3 Answers


Using your test image I was able to remove all the noise with a simple erosion operation.

After that, a simple iteration over the Mat to find the corner pixels is trivial, and I talked about that in this answer. For testing purposes, we can draw green lines between those points to display the area we are interested in on the original image:

At the end, I set the ROI in the original image and crop out that part.

The final result is displayed on the image below:

I wrote a sample code that performs this task using the C++ interface of OpenCV. I'm confident in your ability to translate this code to Python. If you can't do it, forget the code and stick with the roadmap I shared in this answer.

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main(int argc, char* argv[])
{
    cv::Mat img = cv::imread(argv[1]);
    std::cout << "Original image size: " << img.size() << std::endl;

    // Convert BGR Mat to grayscale
    cv::Mat gray;
    cv::cvtColor(img, gray, cv::COLOR_BGR2GRAY);
    std::cout << "Gray image size: " << gray.size() << std::endl;

    // Erode image to remove unwanted noises
    int erosion_size = 5;
    cv::Mat element = cv::getStructuringElement(cv::MORPH_CROSS,
                                       cv::Size(2 * erosion_size + 1, 2 * erosion_size + 1),
                                       cv::Point(erosion_size, erosion_size) );
    cv::erode(gray, gray, element);

    // Scan the image searching for points and store them in a vector
    std::vector<cv::Point> points;
    cv::Mat_<uchar>::iterator it = gray.begin<uchar>();
    cv::Mat_<uchar>::iterator end = gray.end<uchar>();
    for (; it != end; it++)
    {
        if (*it) 
            points.push_back(it.pos()); 
    }

    // From the points, figure out the size of the ROI
    int left, right, top, bottom;
    for (size_t i = 0; i < points.size(); i++)
    {
        if (i == 0) // initialize corner values
        {
            left = right = points[i].x;
            top = bottom = points[i].y;
        }

        if (points[i].x < left)
            left = points[i].x;

        if (points[i].x > right)
            right = points[i].x;

        if (points[i].y < top)
            top = points[i].y;

        if (points[i].y > bottom)
            bottom = points[i].y;
    }
    std::vector<cv::Point> box_points;
    box_points.push_back(cv::Point(left, top));
    box_points.push_back(cv::Point(left, bottom));
    box_points.push_back(cv::Point(right, bottom));
    box_points.push_back(cv::Point(right, top));

    // Compute minimal bounding box for the ROI
    // Note: depending on the angle computed by minAreaRect,
    // the width/height of the box may come back swapped.
    cv::RotatedRect box = cv::minAreaRect(cv::Mat(box_points));
    std::cout << "box w:" << box.size.width << " h:" << box.size.height << std::endl;

    // Draw bounding box in the original image (debugging purposes)
    //cv::Point2f vertices[4];
    //box.points(vertices);
    //for (int i = 0; i < 4; ++i)
    //{
    //    cv::line(img, vertices[i], vertices[(i + 1) % 4], cv::Scalar(0, 255, 0), 1, cv::LINE_AA);
    //}
    //cv::imshow("Original", img);
    //cv::waitKey(0);

    // Set the ROI to the area defined by the box
    // Note: because the width/height of the box are switched, 
    // they were switched manually in the code below:
    cv::Rect roi;
    roi.x = box.center.x - (box.size.height / 2);
    roi.y = box.center.y - (box.size.width / 2);
    roi.width = box.size.height;
    roi.height = box.size.width;
    std::cout << "roi @ " << roi.x << "," << roi.y << " " << roi.width << "x" << roi.height << std::endl;

    // Crop the original image to the defined ROI
    cv::Mat crop = img(roi);

    // Display cropped ROI
    cv::imshow("Cropped ROI", crop);
    cv::waitKey(0);

    return 0;
}
karlphillip
  • Thanks for the response. I am currently trying to convert the C code you provided to JavaCV :). Thanks again. – birdy Mar 29 '13 at 18:53
  • No need to thank me, just up vote the answer. Consider clicking on the checkbox near it to select it as the official answer. By doing these things you will be helping future visitors. – karlphillip Mar 29 '13 at 19:01
  • what does the line `if (*it)` do/stand for? – birdy Mar 31 '13 at 21:11
  • I am really confused. I put a if/else block on that line of code. i.e. `if (*it) { std::cout << "if" << std::endl; std::cout << it.pos() << std::endl; } else { std::cout << "else" << std::endl; std::cout << it.pos() << std::endl; }` but for example I get the following in the output `if [411, 280] else [412, 280]`. What is the difference in the two? points are not 0 in either of them.. – birdy Mar 31 '13 at 23:07
  • You are printing the pixel coordinates, and I'm talking about the color of the pixel. – karlphillip Apr 01 '13 at 00:36
  • Maybe I'm confused because the `Mat` class in the Java bindings has no such method that returns a byte representing a pixel's color – birdy Apr 01 '13 at 02:29
  • Perfect for some images. But some images give me an error like that: Assertion failed (0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows) in Mat – Can Ürek Aug 15 '14 at 22:43
  • That means that you are passing weird values for a ROI. This would happen if you are setting a ROI bigger than the image. – karlphillip Aug 15 '14 at 23:53

Seeing that the text is the only large blob, and everything else is barely larger than a pixel, a simple morphological opening should suffice.

You can do this in OpenCV or with ImageMagick.

Afterwards the white rectangle should be the only thing left in the image. You can find it with OpenCV's findContours, with the cvBlobs library for OpenCV, or with ImageMagick's -crop function.

Here is your image with 2 steps of erosion followed by 2 steps of dilation applied:

(processed image)

You can simply plug this image into OpenCV's findContours function, as in the Squares tutorial example, to get the position.

HugoRune
  • This will work for detecting the square, however, the resulting image you provided blurs the text. I would like the text inside the square to remain as is because finally i would like to feed that text to an OCR. Please let me know if there are better ways to achieve what I'm trying to do – birdy Mar 29 '13 at 00:28
  • When you have got rectangle (even not filled) it's quite easy - use findContours function(http://docs.opencv.org/doc/tutorials/imgproc/shapedescriptors/find_contours/find_contours.html) to find the contour of rectangle(probably you will find more than one contour - just take the biggest one) and then fill it with white color. You will have filled rectangle, so now just use bitwise and(http://docs.opencv.org/modules/core/doc/operations_on_arrays.html#bitwise-and) on this image and on original picture. – cyriel Mar 29 '13 at 02:11

(input image)

# Objectives:
# 1) compress large images to less than 1000x1000
# 2) identify regions of interest
# 3) save ROIs in top-to-bottom order
import cv2
import os

def get_contour_precedence(contour, cols):
    tolerance_factor = 10
    origin = cv2.boundingRect(contour)
    return ((origin[1] // tolerance_factor) * tolerance_factor) * cols + origin[0]

# Load image, grayscale, Gaussian blur, adaptive threshold
image = cv2.imread('./images/sample_0.jpg')

# Halve the image size until the width is at most 1000 pixels
height, width, channels = image.shape  # image.shape returns (height, width, channels)
while width > 1000:
    height = height / 2
    width = width / 2
height = int(height)
width = int(width)
print(height, width)
image = cv2.resize(image, (width, height))

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (9,9), 0)
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 30)
# Dilate to combine adjacent text contours
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9,9))
dilate = cv2.dilate(thresh, kernel, iterations=4)

# Find contours, highlight text areas, and extract ROIs
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

cnts = cnts[0] if len(cnts) == 2 else cnts[1]

#ORDER CONTOURS top to bottom
cnts.sort(key=lambda x:get_contour_precedence(x, image.shape[1]))

# Delete previous ROI images in the roi folder to avoid mixing in results from earlier runs
roi_dir = './roi/'
os.makedirs(roi_dir, exist_ok=True)
for f in os.listdir(roi_dir):
    os.remove(os.path.join(roi_dir, f))

ROI_number = 0
for c in cnts:
    area = cv2.contourArea(c)
    if area > 10000:
        x,y,w,h = cv2.boundingRect(c)
        #cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 3)
        cv2.rectangle(image, (x, y), (x + w, y + h), (100,100,100), 1)
        #use below code to write roi when results are good
        ROI = image[y:y+h, x:x+w]
        cv2.imwrite('roi/ROI_{}.jpg'.format(ROI_number), ROI)
        ROI_number += 1

cv2.imshow('thresh', thresh)
cv2.imshow('dilate', dilate)
cv2.imshow('image', image)
cv2.waitKey()

(ROI detection output)

HarshaBlaze