5

In using OpenCV for detection tasks I keep running into the problem of merging overlapping bounding boxes; that is, essentially finding the bounding box around the union of two overlapping bounding boxes. This comes up a lot in object detection when, for some reason, the object of interest gets fractured into a number of bounding boxes, rather than one all-encompassing one.

There are a few answers on StackOverflow for algorithmic solutions and helpful external libraries (e.g. this, this, this), and there's also the groupRectangles function provided by OpenCV (and the litany of associated questions/bugs: this, this, etc).

I seem to find that the aforementioned solutions are a bit overly complex for the task I'm trying to perform. A lot of the algorithmic solutions solve this problem from a mathematical perspective (more like a thought experiment), while doing operations like rect1 | rect2 can get really slow when the number of rectangles is high (it's O(N^2) to handle everything) and groupRectangles has some quirks that make it only partially effective. So I came up with a somewhat hackish solution that actually works pretty effectively. I thought I'd share it below for anyone else who needs a quick fix for this common problem.

Feel free to comment and critique it.

Community
  • 1
  • 1
marcman
  • 3,233
  • 4
  • 36
  • 71
  • test for intersection: cv::Rect intersect = rect1 && rect2; combine: if(intersection.width) combination = rect1 || rect2; – Micka Apr 08 '15 at 19:12
  • @Micka but that requires you to go through N rectangles N times each. For a small N that's fine. But empirically I've found that the quadratic running time ends up making that slower than my solution below when you get a larger N – marcman Apr 08 '15 at 19:16
  • thats obviously true! you can partition the space (similar to your answer) but that will be slow for smaller numbers and might use more memory. – Micka Apr 08 '15 at 19:29
  • 1
    @Micka Also feasible. Part of why I posted this though was because it's also fairly lightweight and simple to implement. Once you get into partitioning and whatnot things tend to snowball – marcman Apr 08 '15 at 19:31
  • @marcman I think that to make this post more valuable, you should specify better 1) What do you mean by merging 2 rectangles, especially in term of result: do you want only the intersection, or the sum of the areas, or the min x0 and y0 (top corne r coordinates) and max x1 and y1 (bottom corner coordinates)? 2) What is that you mean by "too complex"? Too complicated, too expensive in term of processing? 3) Why your special solution is more suitable? Does it depend on the kind of input data you have? If yes, please describe more in detail your input data. – Antonio Apr 08 '15 at 23:48
  • Regarding my comment of before, is one pixel overlap considered overlap? – Antonio Apr 08 '15 at 23:52
  • @Antonio one pixel is considered overlap, but you can account for that with that scale factor in the code. If you want to make 10 pixels the minimum overlaps just make the scaleFactor a Size(-5,-5) and both bounding boxes shrink by 5 pixels north/south and east/west (of course this is chessboard distance overlap, so diagonals are slightly different) – marcman Apr 09 '15 at 15:30
  • @marcman Thanks for the update! I was wondering about another case, by the way mentioned in your comment to your own answer: let's say at the beginning there's an isolated rectangle, no overlap with any of the other rectangles. Then, by merging 2 other rectangles, you obtain a rectangle that overlaps with the initially isolated rectangle. Should this rectangle merged with the other two? Because I think indeed that's a flaw with your solution, but reasonably you can do a second pass with another algorithm that doesn't draw the rectangles, it should be feasible at that point with less rectangles – Antonio Apr 09 '15 at 16:02
  • @Antonio you're right that is a flaw. I also see a second pass as the easiest fix, but as you pointed out, there are probably other, more efficient ways as well. – marcman Apr 09 '15 at 17:04

1 Answers1

2
void mergeOverlappingBoxes(std::vector<cv::Rect> &inputBoxes, cv::Mat &image, std::vector<cv::Rect> &outputBoxes)
{
    cv::Mat mask = cv::Mat::zeros(image.size(), CV_8UC1); // Mask of original image
    cv::Size scaleFactor(10,10); // To expand rectangles, i.e. increase sensitivity to nearby rectangles. Doesn't have to be (10,10)--can be anything
    for (int i = 0; i < inputBoxes.size(); i++)
    {
        cv::Rect box = inputBoxes.at(i) + scaleFactor;
        cv::rectangle(mask, box, cv::Scalar(255), CV_FILLED); // Draw filled bounding boxes on mask
    }

    std::vector<std::vector<cv::Point>> contours;
    // Find contours in mask
    // If bounding boxes overlap, they will be joined by this function call
    cv::findContours(mask, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);
    for (int j = 0; j < contours.size(); j++)
    {
        outputBoxes.push_back(cv::boundingRect(contours.at(j)));
    }
}
marcman
  • 3,233
  • 4
  • 36
  • 71
  • One bug is that you can still end up with a bounding box inside a bigger one if the internal rectangle is disjoint from two others being merged. For example, if the TR, BR, BL quadrants in a 2D XY plane are filled, and there's a small object in the TL quadrant that's not connected. The result will be a rectangle around the entire plane, as well as a distinct one inside the TL quadrant. One fix is to call this function again, but then it's easy to start spiraling... – marcman Apr 08 '15 at 19:14
  • I may suggest a much faster and lighter way. First, group all associated boxes using their pairwise overlap (operator &). For each group of boxes, find the min and max for axes x and y. The results are same with yours but the method is not cpu heavy. – LovaBill Apr 09 '15 at 15:14
  • 1
    @William how do you group associated boxes efficiently? Assuming you all you have is a vector of bounding boxes, how can you do that faster than O(N^2) time? I mentioned in a comment above, I haven't done analysis on my solution, but empirically it runs faster than comparing every `Rect` to every other – marcman Apr 09 '15 at 15:25
  • I like what [C4 detector](https://sites.google.com/site/wujx2001/home/c4) does. Check function `void PostProcess()` in Pedestrian_ICRA.cpp. Unfortunately, he uses his own class and needs some extra work to use cv::Rect instead. – LovaBill Apr 09 '15 at 15:49
  • I do not understand the use of scaleFactor. I would expect a scale factor to be a multiplication term... Here you are only translating rectangles... – Antonio Apr 09 '15 at 16:07
  • 1
    @Antonio The list of operations on `rect` can be found [here](http://docs.opencv.org/modules/core/doc/basic_structures.html#rect). +/- a `cv::Point()` does translations. +/- a cv::Size() will enlarge or shrink a `rect`. I had to test it to be sure too though – marcman Apr 09 '15 at 16:59