Efficient way to get largest box sizes of similar elements from a 2D numpy array (or pandas dataframe)

Question

Basically I have a 2 dimensional numpy array filled with boolean values.

For example:

[[0,0,0,1],
 [0,0,0,1],
 [1,1,1,1],
 [1,1,1,1],
 [1,0,0,0],
 [1,0,1,1]]

I need to figure out where the largest rectangular boxes of zeroes exists. I say largest in the sense that I want all the boxes except the ones that are fully contained by another box.

As shown below, some information about each box is needed as well...

So the full output for the array above would have this data:

box #1: upper_left_vert = (0,0), x_size = 3, y_size = 2, sq_area = 6
box #2: upper_left_vert = (5,2), x_size = 3, y_size = 1, sq_area = 3
box #3: upper_left_vert = (5,2), x_size = 1, y_size = 2, sq_area = 2

The arrays I'm dealing with are big and there are a lot of them, so efficiency is important. If a pandas dataframe would be faster I'm open to that option.

I'm pretty sure I can figure out some really inefficient ways to do this, but it seems like a problem that would have some faster options.

Having never used it before (or even regular C/++), I would be hesitant unless it was very easy to integrate. That said if it's 2x faster than the next best option it might be a good opportunity for me to learn. — Andrew G, Dec 30 '13 at 04:46
Cython's about the same speed as C/++ when written well. // What should the output for `[[0 0 0] [0 0 1] [0 1 1] [1 1 1]]` be? — Veedrac, Dec 30 '13 at 04:54
The output for that would be: box #1: upper_left_vert = (0,0), x_size = 3, y_size = 1, sq_area = 3 box #2: upper_left_vert = (0,0), x_size = 1, y_size = 3, sq_area = 3 box #3: upper_left_vert = (0,0), x_size = 2, y_size = 2, sq_area = 4 — Andrew G, Dec 30 '13 at 05:14
You guys are really jumping the gun here, before talking about python vs cython vs C++, you really need to find the right algorithm to do this. The right algorithm written in python will likely be much faster than the wrong algorithm written in whatever. [The algorithm described here](http://www.drdobbs.com/database/the-maximal-rectangle-problem/184410529) might be what you're looking for, it's O(N*M) for an N by M array. I would implement this in python and see if it's fast enough for your needs before doing anything else. — Bi Rico, Dec 30 '13 at 05:50
@BiRico I asked because if Andrew G was not willing to use Cython I'd have to look at pure Numpy solutions for speed, which makes the task quite a lot harder. — Veedrac, Dec 30 '13 at 06:33
I don't agree that this is a duplicate as *all* rectangles that aren't contained in others are wanted; the linked question only wants the *one* of maximum *area*. // Nonetheless there exists an `O(nm)` algorithm that is too large to fit in the margins of this comment. — Veedrac, Dec 30 '13 at 08:27
@Veedrac apply answer from question then "1" the found rectangle and recurse. — Andy Hayden, Dec 30 '13 at 20:15
@AndyHayden: That doesn't work with the example in the OPs post, since rectanges _are_ allowed to overlap — Eric, Sep 26 '16 at 12:21

Efficient way to get largest box sizes of similar elements from a 2D numpy array (or pandas dataframe)

0 Answers0