
I'm trying to come up with the brute-force (naive) solution to find the largest block of 1s or 0s in a rectangle of 1s and 0s. I know of optimal approaches that can do it in O(n) time, where n is the total size of the rectangle.

1 1 0 1 0 1
1 0 0 0 1 1
1 0 0 0 1 1
1 1 0 1 1 0

In the above rectangle, it is the 2 x 3 block of 0s starting at (Row 2, Col 2), with size 6. I was thinking this:

Go through each element and then find the size of the block it makes by iterating in all directions from it.

Is it brute force? What will be the complexity? I'm going through all elements, which is n, but then I'm iterating in all directions from each one; how much does that add?

I'm aware that this question has been asked many times, but those questions are about optimal solutions. What I'm looking for is a brute-force solution and its complexity.

questions
  • You don't need to go in all directions, only right and down (I believe that would still be considered brute force, just a bit faster). – Bernhard Barker Oct 26 '13 at 18:26
  • "Go through each element" - in which order? The performance depends on that. Choosing the elements in a sort of a sparse grid is better than raster ordering, while both may qualify for being "brute force". – anatolyg Oct 26 '13 at 19:20
  • @OP The algorithm you describe actually iterates over sub-rectangles multiple times. It's possibly even worse than brute force since it considers the same exact candidate multiple times. :) – Shashank Oct 26 '13 at 20:16
  • If iterating only right and down from each element in the rectangle and abandoning a sub-rectangle as soon as you find that it has a mixture of 0s and 1s, you are using backtracking which is probably much more efficient than brute force (though worst-case it could still be O(N^3)). – Shashank Oct 26 '13 at 21:26

3 Answers


The algorithm you described looks something like this C code:

//for each entry as a potential top-left corner
for(int x = 0; x < width; ++x)
    for(int y = 0; y < height; ++y)
    {
        char lookFor = entries[x][y];
        int area = 0;
        int maxCol = width; //rightmost extent seen so far
        for(int row = y; row < height; ++row)
        {
            if(entries[x][row] != lookFor)
                break;
            int col;
            for(col = x; col < maxCol; ++col)
            {
                if(entries[col][row] != lookFor)
                    break;
                int currentArea = (col - x + 1) * (row - y + 1);
                if(currentArea > area)
                {
                    area = currentArea; //save the current rect
                }
            }
            maxCol = col; //a wider rectangle would include a mismatch in this row
        }
    }

There are four nested loops. The outer loops will iterate exactly n times (with n being the number of entries). The inner loops will iterate width * f1 and height * f2 times on average (with f1 and f2 being constant fractions). The rest of the algorithm takes constant time and does not depend on the problem size.

Therefore, the complexity is O(n * f1 * width * f2 * height) = O(n^2), which essentially means "go to each entry and from there, visit each entry again". This holds regardless of whether all entries really need to be checked or just a constant fraction of them, as long as that fraction grows with the problem size.

Edit

The above explanation assumes that the entries are not distributed randomly and that for larger fields it is more likely to find larger homogeneous subregions. If this is not the case, and the average iteration count of the inner loops does not depend on the field size at all (e.g. for randomly distributed entries), then the resulting time complexity is O(n).
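For reference, a self-contained, runnable version of this approach is sketched below (the function name `largest_block` and the row-major indexing are my own choices, not from the question). It clamps the allowed width whenever a row's run of matching cells ends early, so only fully homogeneous rectangles are counted:

```c
/* Largest homogeneous block: anchor each cell as a top-left corner,
   grow downwards row by row, and shrink the allowed width whenever
   a row's run of matching cells ends early.
   Row-major indexing: grid[r * width + c]. */
static int largest_block(const char *grid, int height, int width)
{
    int best = 0;
    for (int y = 0; y < height; ++y)        /* top row of the candidate */
        for (int x = 0; x < width; ++x)     /* left column of the candidate */
        {
            char lookFor = grid[y * width + x];
            int maxCol = width;             /* rightmost extent allowed so far */
            for (int row = y; row < height; ++row)
            {
                int col = x;
                while (col < maxCol && grid[row * width + col] == lookFor)
                    ++col;
                if (col == x)               /* this row contributes nothing */
                    break;
                maxCol = col;               /* later rows cannot be wider */
                int area = (col - x) * (row - y + 1);
                if (area > best)
                    best = area;
            }
        }
    return best;
}
```

On the 4 x 6 grid from the question this finds the 2 x 3 block of 0s, i.e. an area of 6.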

Nico Schertler
  • This solution will take O(N^4) time, inner loops are not constant – hrv Oct 26 '13 at 18:34
  • @hrv: No, it's not `n^4`. `n` is the number of entries. Each for-loop iterates for approximately `factor * sqrt(n)` times, resulting in the overall complexity of `O(n^2)`. Although the inner loops are not constant, we can assume that in average their size is a fraction of the original problem size. Best case and worst case complexities are different from that. – Nico Schertler Oct 26 '13 at 21:47
  • Both of you are incorrect. First of all, that C code is not like the algorithm described in the OP's question. The algorithm described by the OP is closer to brute force flood fill. The C code you've written seems to be leaning towards backtracking from the top-left corner of each sub-rectangle (though the C code is a bit incomplete from what I see). Also hrv is wrong because he's forgetting what "N" is, the total size of the rectangle AKA height*width. :) That is why the C code described is actually O(N^2) as Nico says, because he means N in terms of the "area" of the rectangle. – Shashank Oct 26 '13 at 22:19

Brute force is generally split into two (sometimes sequential) parts. The first part is generating all possible candidates for solutions to the problem. The second part is testing them to see if they actually are solutions.

Brute force: Assume your rectangle is m x n. Generate all sub-rectangles of size a x b, where a is in {1..m} and b is in {1..n}. Set a maximum variable to 0. Test each sub-rectangle to see whether it is all 0s or all 1s. If it is, let maximum = max(maximum, size(sub-rectangle)). Alternatively, start by testing the larger sub-rectangles and move towards testing smaller ones; as soon as you find a sub-rectangle that is all 0s or all 1s, stop. The time complexity is the same either way, because in the worst case both methods still have to iterate through all sub-rectangles.

Time complexity:

Let's count the number of sub-rectangles generated at each step.

There are m*n subrectangles of size 1 x 1.

There are (m-1)*n subrectangles of size 2 x 1.

There are m*(n-1) subrectangles of size 1 x 2.

There are (m-1)*(n-1) subrectangles of size 2 x 2.

... < and so forth >

There are (m-(m-1))*(n-(n-1)) subrectangles of size m x n.

Thus the formula for counting the number of subrectangles of size a x b from a larger rectangle of size m x n is simply:

number_of_subrectangles_of_size_a_b = (m - a + 1) * (n - b + 1)

If we imagine these numbers laid out in an arithmetic series we get

1*1 + 1*2 + ... + 1*n + 2*1 + ... + m*n

This can be factored to:

(1 + 2 + ... + m) * (1 + 2 + ... + n).

These two arithmetic series sum to O(m^2) and O(n^2) respectively. Thus generating all sub-rectangles of an m x n rectangle is O(m^2 * n^2). Now we look at the testing phase.

After generating all sub-rectangles, testing whether each sub-rectangle of size a x b is all 0s or all 1s is O(a * b). Unlike the count of sub-rectangles of size a x b, which grows as a x b shrinks, this testing cost grows proportionally with a x b.

e.g.: There is 1 sub-rectangle of size m x n. But testing to see if that rectangle is all 0s or all 1s takes O(m*n) time. Conversely there are m*n sub-rectangles of size 1. But testing to see if those rectangles are all 0s or all 1s takes only O(1) time per rectangle.

What you finally end up with for the time complexity is a series whose terms are (number of sub-rectangles of size a x b) * (cost of testing one of them):

O( 1*1*(m*n) + 1*2*(m*(n-1)) + ... + (m*n)*(1*1) )

Note 2 things here.

  1. The largest term in the series is going to be somewhere close to (m/2)(n/2) * (m/2)(n/2), which is O(m^2 * n^2).

  2. There are m * n terms total in the series.

Thus the testing phase of the brute-force solution will be O(mn * m^2 * n^2) = O(m^3 * n^3).

Total time complexity is:

O(generating) + O(testing) = O(m^2 * n^2 + m^3 * n^3) = O(m^3 * n^3)

If the area of the given rectangle is, say, N, this gives O(N^3) time complexity.
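A minimal C sketch of this generate-and-test scheme (the name `brute_force_largest` is mine; it merges the generate and test phases into one pass rather than materializing all candidates first):

```c
/* Pure brute force: enumerate every sub-rectangle, scan it cell by
   cell to test whether it is all 0s or all 1s, and keep the largest
   homogeneous one. With N = m*n this is O(N^3) overall. */
static int brute_force_largest(const char *grid, int m, int n)
{
    int best = 0;
    for (int top = 0; top < m; ++top)
        for (int left = 0; left < n; ++left)
            for (int bottom = top; bottom < m; ++bottom)
                for (int right = left; right < n; ++right)
                {
                    /* testing phase: is this candidate homogeneous? */
                    char first = grid[top * n + left];
                    int uniform = 1;
                    for (int r = top; r <= bottom && uniform; ++r)
                        for (int c = left; c <= right; ++c)
                            if (grid[r * n + c] != first)
                            {
                                uniform = 0;
                                break;
                            }
                    int area = (bottom - top + 1) * (right - left + 1);
                    if (uniform && area > best)
                        best = area;
                }
    return best;
}
```

On the question's 4 x 6 grid this again reports the 2 x 3 block of 0s with area 6, only much more slowly than the backtracking variants discussed in the comments.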

Shashank

Look into "connected components" algorithms for additional ideas. What you've presented as a two-dimensional array of binary values looks just like a binary black & white image. An important exception is that in image processing we typically allow a connected component (a blob of 0s or 1s) to have non-rectangular shapes. Some tweaks to the existing multi-pass and single-pass algorithms should be easy to implement.

http://en.wikipedia.org/wiki/Connected-component_labeling

Although it's a more general solution than you need, you could also run a connected components algorithm to find all connected regions (0s or 1s, background or foreground) and then filter the resulting components (a.k.a. blobs). I'll also mention that for foreground components it's preferable to select for "4-connectivity" rather than "8-connectivity," where the former means connectivity is allowed only at pixels above, below, left, and right of a center pixel, and the latter means connectivity is allowed for any of the eight pixels surrounding a center pixel.
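As a concrete illustration, here is a small 4-connectivity flood fill in C (an assumed helper, not taken from any particular library) that labels one component and returns its size; running it from every still-unlabeled cell and then filtering the resulting components would give the labeling described above:

```c
/* 4-connectivity flood fill using an explicit stack. Labels the
   component containing (sr, sc) in labels[] (which must start out
   zeroed) and returns the component's size. Row-major indexing. */
static int flood4(const char *grid, int m, int n,
                  int sr, int sc, int *labels, int label)
{
    int stack[64 * 64][2];  /* worklist; large enough for small demo grids */
    int top = 0, size = 0;
    char v = grid[sr * n + sc];
    stack[top][0] = sr;
    stack[top][1] = sc;
    ++top;
    labels[sr * n + sc] = label;
    while (top > 0)
    {
        --top;
        int r = stack[top][0], c = stack[top][1];
        ++size;
        /* visit the four neighbors: up, down, left, right */
        const int dr[4] = {-1, 1, 0, 0}, dc[4] = {0, 0, -1, 1};
        for (int k = 0; k < 4; ++k)
        {
            int nr = r + dr[k], nc = c + dc[k];
            if (nr < 0 || nr >= m || nc < 0 || nc >= n)
                continue;
            if (labels[nr * n + nc] != 0 || grid[nr * n + nc] != v)
                continue;
            labels[nr * n + nc] = label;
            stack[top][0] = nr;
            stack[top][1] = nc;
            ++top;
        }
    }
    return size;
}
```

On the question's grid, the component of 0s reachable from (Row 2, Col 2) has 8 cells, which also shows why a component need not be rectangular.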

A bit farther afield, for very large 2D arrays it may (just may) help to first reduce the search space by creating what we'd call an "image pyramid," meaning a stack of arrays of progressively smaller size: 1/2 each dimension (filled, as needed), 1/4, 1/8, and so on. A rectangle detectable in a reduced resolution image is a good candidate for being the largest rectangle in a very large image (or 2D array of bits). Although that may not be the best solution for whatever cases you're considering, it's scalable. Some effort would be required, naturally, to determine the cost of scaling the array/image versus the cost of maintaining relatively larger lists of candidate rectangles in the original, large image.

Run-length encoding may help you, too, especially since you're dealing with rectangles instead of connected components of arbitrary shape. Run-length encoding would express each row as stretches or "run lengths" of 0s or 1s. This technique was used to speed up connected component algorithms a decade or two ago, and it's still a reasonable approach.
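A sketch of that per-row encoding (the helper name `rle_row` is my own):

```c
/* Run-length encode one row of 0s/1s into parallel (value, length)
   arrays. Returns the number of runs written; vals[] and lens[]
   must have room for up to n runs. */
static int rle_row(const char *row, int n, char vals[], int lens[])
{
    int runs = 0;
    int i = 0;
    while (i < n)
    {
        int j = i;
        while (j < n && row[j] == row[i])
            ++j;                 /* extend the current run */
        vals[runs] = row[i];
        lens[runs] = j - i;
        ++runs;
        i = j;
    }
    return runs;
}
```

The first row of the question's grid, 1 1 0 1 0 1, encodes as five runs: (1,2) (0,1) (1,1) (0,1) (1,1). Candidate rectangles can then be grown run by run instead of cell by cell.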

Anyway, that's not the most direct answer to your question, but I hope it helps somewhat.

Rethunk
  • @Rethunk- Thanks for such a detailed answer! – questions Oct 27 '13 at 03:14
  • If you can limit the size of the rectangles, there's a wicked cool way to render this O(small), I think, and/or a nice way to parallelize it. By "O(small)" I mean there may be some theoretically precise way to determine the time, but the method allows for some interestingly large arrays, and I haven't bothered to think through all the constraints. The method can be a little tricky to implement, but it's so brute-forcey it's hilarious. – Rethunk Oct 28 '13 at 04:03