6

I need an algorithm which can parse a 2D array and return the largest continuous rectangle. For reference, look at the image I made demonstrating my question.

enter image description here

Asha
Johnathan

4 Answers

11

Generally you solve these sorts of problems using what are called scan line algorithms. They examine the data one row (or scan line) at a time, building up the answer you are looking for: in your case, candidate rectangles.

Here's a rough outline of how it would work.

Number the rows in your image 0 through 6; I'll work from the bottom up.

Examining row 0 you have the beginnings of two rectangles (I am assuming you are only interested in the black squares). I'll refer to rectangles using (x, y, width, height). The two active rectangles are (1,0,2,1) and (4,0,6,1). You add these to a list of active rectangles. This list is sorted by increasing x coordinate.
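As a rough illustration of that first step, here is a minimal Python sketch that pulls the starting rectangles out of a single row. The row representation (a list where a truthy value marks a cell you care about) and the name `row_runs` are assumptions made for this example, not part of the answer above.

```python
def row_runs(row, y):
    """Return (x, y, width, 1) for each maximal run of open cells in one row."""
    runs = []
    start = None  # x where the current run began, or None when outside a run
    for x, cell in enumerate(row):
        if cell and start is None:
            start = x  # a new run begins here
        elif not cell and start is not None:
            runs.append((start, y, x - start, 1))  # an obstacle ends the run
            start = None
    if start is not None:  # the last run reaches the right edge
        runs.append((start, y, len(row) - start, 1))
    return runs


# Hypothetical 10-cell row with open runs at x=1..2 and x=4..9
print(row_runs([0, 1, 1, 0, 1, 1, 1, 1, 1, 1], 0))
```

Because the row is walked left to right, the list comes out already sorted by increasing x.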

You are now done with scan line 0, so you increment your scan line.

Examining row 1 you work along the row seeing if you have any of the following:

  • new active rectangles
  • space for existing rectangles to grow
  • obstacles which split existing rectangles
  • obstacles which require you to remove a rectangle from the active list

As you work along the row you will see that you have a new active rectangle (0,1,8,1), that one of the existing active ones can grow to (1,0,2,2), and that an obstacle splits the active (4,0,6,1). Before removing that one, remember it: it is the largest we have seen so far. It is replaced in the active list by two narrower ones: (4,0,4,2) and (9,0,1,2).

So at the end of scan line 1 we have:

  • Active List: (0,1,8,1), (1,0,2,2), (4,0,4,2), (9, 0, 1, 2)
  • Biggest so far: (4,0,6,1)

You continue in this manner until you run out of scan lines.

The tricky part is coding up the routine that runs along the scan line updating the active list. If you do it correctly you will consider each pixel only once.
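Since the active-list bookkeeping is the hard part, here is a minimal Python sketch of a simpler row-at-a-time variant. To be clear, this is not the active-list routine described above: it keeps, for every column, how many open cells are stacked up ending at the current row, and uses a stack to find the best rectangle whose last row is the current one. The grid layout (a list of rows, truthy meaning a cell you care about), the function name and the returned tuple are all assumptions for illustration. Rows are scanned in list order, so `y` is simply the list index.

```python
def largest_rectangle(grid):
    """Return (area, x, y, width, height) of the largest all-open rectangle.

    y is the index of the rectangle's first row in scan order.
    Returns (0, 0, 0, 0, 0) when there is no open cell.
    """
    if not grid or not grid[0]:
        return (0, 0, 0, 0, 0)
    cols = len(grid[0])
    heights = [0] * cols  # open cells stacked in each column, ending at this row
    best = (0, 0, 0, 0, 0)
    for y, row in enumerate(grid):
        for x in range(cols):
            heights[x] = heights[x] + 1 if row[x] else 0
        stack = []  # (start_x, height) pairs, heights increasing
        for x in range(cols + 1):
            h = heights[x] if x < cols else 0  # sentinel column closes everything
            start = x
            while stack and stack[-1][1] >= h:
                start, sh = stack.pop()
                area = sh * (x - start)  # rectangle spans start..x-1 at height sh
                if area > best[0]:
                    best = (area, start, y - sh + 1, x - start, sh)
            stack.append((start, h))
    return best
```

Each column is pushed and popped at most once per row, so the work per scan line is proportional to its width.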

Hope this helps. It is a little tricky to describe.

idz
  • This is the only suggested answer with a correct algorithm. All the others take shortcuts that may eliminate the optimal result. – Ben Voigt May 09 '11 at 03:01
  • @Ben thanks. It is also very efficient O(n log n) where n is the number of pixels. – idz May 09 '11 at 03:13
  • I have to agree with Ben. idz's answer is preferred. – Perry Horwich May 09 '11 at 03:40
  • @Perry, thank you. As you were working through your ideas though you were getting close. Your LINES array concept is close. Please don't take this as being condescending; I did not come up with this answer on the spot. I have done extensive professional work that involves algorithms just like this. So it's pretty impressive to see someone working through the problem! – idz May 09 '11 at 03:46
  • Awesome, this is exactly what I was looking for. Now I will be able to make my game more efficient. Thanks a bunch guys. – Johnathan May 09 '11 at 04:02
  • @user422382 You still have some coding to do. It's quite tricky to make sure you get it all right. Hopefully the fact that I gave you the 4 cases you will run into will help you out. These sorts of problems are a lot of fun... Good luck! – idz May 09 '11 at 04:06
  • Thx idz. np. I also wondered about projecting successively decreasing squares over each point in ARRAY to generate a kind of contour map the contents of which reflect which points are located in the largest spaces, then using these local 'peaks' as selected starting seed points for region growing. But... I suspect this could miss long skinny rectangles in a map with lots of 'diagonal' spaces. It also would be 'slow' in a map with one big space. – Perry Horwich May 09 '11 at 04:07
  • Or... what about changing the 2d array into one long line. Any rectangle should be evident as a localized repeating pattern over distance. Would a Fourier transform find it? – Perry Horwich May 09 '11 at 04:24
  • @Perry spectral algorithms (FFT, cepstrum and the like) do seem like they might work on the face of it, but it turns out that they don't help much. To see why, think about what the FFT of a square wave looks like (loads of harmonics, right?), now imagine a sum of a bunch of them. Bit tricky to work from there back to rectangles. (Also your FFT has cost you O(n log n) so you are not winning much). – idz May 09 '11 at 04:32
  • This is the most clear, simple and efficient answer for 'largest rectangle' problems I've ever seen. Thank you idz – Mickey Shine Jul 22 '13 at 09:14
5

I like a region growing approach for this.

  • For each open point in ARRAY
  • grow EAST as far as possible
  • grow WEST as far as possible
  • grow NORTH as far as possible by adding rows
  • grow SOUTH as far as possible by adding rows
  • save the resulting area for the seed pixel used
  • After looping through each point in ARRAY, pick the seed pixel with the largest area result

...would be a thorough, but maybe not-the-most-efficient way to go about it.

I suppose you need to answer the philosophical question "Is a line of points a skinny rectangle?" If a line == a thin rectangle, you could optimize further by:

  • Create a second array of integers called LINES that has the same dimensions as ARRAY
  • Loop through each point in ARRAY
  • Determine the longest valid line to the EAST that begins at each point and save its length in the corresponding cell of LINES.
  • After doing this for each point in ARRAY, loop through LINES
  • For each point in LINES, determine how many neighbors SOUTH have the same length value or less.
  • Accept a SOUTHERN neighbor with a smaller length if doing so will increase the area of the rectangle.
  • The largest rectangle using that seed point is (Number_of_acceptable_southern_neighbors*the_length_of_longest_accepted_line)
  • As the largest rectangular area for each seed is calculated, check to see if you have a new max value and save the result if you do.
  • And... you could do this without allocating an array LINES, but I thought using it in my explanation made the description simpler.
  • And... I think you need to do this same sort of thing with VERTICAL_LINES and EASTERN_NEIGHBORS, or some cases might miss big rectangles that are tall and skinny. So maybe this second algorithm isn't so optimized after all.
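Here is a minimal Python sketch of one reading of the LINES idea, using the `<=` variant discussed in the comments: while walking SOUTH from a seed point it tracks the shortest line seen so far and checks the area at every depth. The grid layout (a list of rows, truthy meaning open), the assumption that SOUTH means increasing row index, and the names used are all mine, for illustration only.

```python
def largest_rectangle_by_lines(grid):
    """Return (area, x, y, width, height), treating (x, y) as the top-left seed."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    # lines[y][x] = number of consecutive open cells starting at (x, y), going EAST
    lines = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        run = 0
        for x in range(cols - 1, -1, -1):
            run = run + 1 if grid[y][x] else 0
            lines[y][x] = run
    best = (0, 0, 0, 0, 0)
    for y in range(rows):
        for x in range(cols):
            width = lines[y][x]
            height = 0
            yy = y
            # Walk SOUTH while the cell below is still open
            while yy < rows and lines[yy][x] > 0:
                width = min(width, lines[yy][x])  # accept shorter lines too
                height += 1
                if width * height > best[0]:
                    best = (width * height, x, y, width, height)
                yy += 1
    return best
```

Since every rectangle has a top-left cell and the loop tries every depth below each seed with the widest width that fits, this version should not need a second pass with VERTICAL_LINES. The cost is roughly the number of cells times the number of rows.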

Use the first method to check your work. I think Knuth said "...premature optimization is the root of all evil."

HTH,

Perry


ADDENDUM: Several edits later, I think this answer deserves a group upvote.

Perry Horwich
  • Why not just grow in two directions rather than all four? e.g., east and south, or north and west. – Maxpm May 09 '11 at 02:47
  • My pleasure. I love this kind of thing. I've edited my answer to include what I think may be a more efficient "second step." – Perry Horwich May 09 '11 at 02:48
  • @Maxpm, I think growing squares (both directions at once) may not always find the largest rectangle. – Perry Horwich May 09 '11 at 02:55
  • 1
    The problem with your second algorithm is that you need a `<=` check, not an `==` check. Otherwise you will miss a lot of potential rectangles, including possibly the biggest one. Also, you'd still duplicate work going up/down. You might avoid this if you remove ones you combine (though with a `<=` check, this would break some cases). This problem can get remarkably complicated once you start to try to optimize it :) – Merlyn Morgan-Graham May 09 '11 at 02:57
  • 1
    This algorithm is suboptimal. Example: In the sample image given, take the square closest to the lower-right-hand corner and move it all the way to the left. Now the optimal rectangle does not extend as far E-W as the largest line in any member row. – Ben Voigt May 09 '11 at 03:00
  • @Merlyn, thanks for your comment. Hmmm.... The only > or = checking I am requiring in algo #2 is a comparison of area. – Perry Horwich May 09 '11 at 03:03
  • @Ben. Thx for your comment. I think determining the number of identical neighbors SOUTH for each point in LINES and saving the area for that seed point will check every possible rectangle in ARRAY. – Perry Horwich May 09 '11 at 03:05
  • 1
    @Perry: [Here is the image I mean](http://dl.dropbox.com/u/6919979/largest%20subrectangle.png). The optimal rectangle is 7x3. I believe your algorithm can only find a 4x5. – Ben Voigt May 09 '11 at 03:13
  • @Ben - Oh! Yes, you are correct. The search for neighbors SOUTH of a point in LINES would have to accept neighbors <= the length value. I suppose this would also require some clever checking to determine how far SOUTH to keep looking for each point, but I think the checking can be done in a streamlined way. I will edit my answer accordingly. – Perry Horwich May 09 '11 at 03:21
  • @Merlyn - I think I see your point now, given Ben's comment and picture. Many thanks. – Perry Horwich May 09 '11 at 03:26
2

A straightforward approach would be to loop through all the potential rectangles in the grid, figure out their area, and if it is greater than the current highest area, select it as the highest:

var biggestFound
for each potential rectangle:
    if area(this potential rectangle) > area(biggestFound)
        biggestFound = this potential rectangle

Then you simply need to find the potential rectangles.

for each square in grid:
    recursive loop 1:
        if not occupied:
            grow right until occupied, and return a rectangle
            grow down one and recurse (call loop 1)

This will duplicate a lot of work (for example you will re-evaluate a lot of sub-rectangles), but it should give you an answer.
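For reference, here is a minimal Python sketch of the brute-force idea: try every top-left corner and every width and height, and keep the biggest rectangle that contains no occupied cell. It enumerates rectangles directly rather than recursing as in the outline above, and the grid layout (a list of rows, truthy meaning unoccupied) and the function names are assumptions for illustration.

```python
from itertools import product


def is_open(grid, x, y, w, h):
    """True if every cell of the w-by-h rectangle at (x, y) is unoccupied."""
    return all(grid[r][c] for r in range(y, y + h) for c in range(x, x + w))


def brute_force_largest(grid):
    """Return (area, x, y, width, height) by checking every candidate rectangle."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    best = (0, 0, 0, 0, 0)
    for y, x in product(range(rows), range(cols)):
        for h in range(1, rows - y + 1):
            for w in range(1, cols - x + 1):
                # Only pay for the cell-by-cell check if it could beat the best
                if w * h > best[0] and is_open(grid, x, y, w, h):
                    best = (w * h, x, y, w, h)
    return best
```

It is slow on anything but small grids, but it is easy to convince yourself it is right, which makes it handy for checking a cleverer version.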

Edit

An alternate approach might be to start with a single square the size of the grid, and "subtract" occupied squares to end up with a final set of potential rectangles. There might be optimization opportunities here using quadtrees, and in ensuring that you keep split rectangles "in order", top to bottom, left to right, in case you need to re-combine rectangles farther down in the algorithm.

If you are actually starting out with rectangular data (for your "populated grid" set), instead of a loose pixel grid, then you could easily get better perf out of a rectangle/region subtracting algorithm.

I'm not going to post pseudo-code for this because the idea is completely experimental, and I have no idea if the perf will be any better for a loose pixel grid ;)

Windows system "regions" and "dirty rectangles", as well as general "temporal caching" might be good inspiration here for more efficiency. There are also a lot of z-buffer tricks if this is for a graphics algorithm...

Merlyn Morgan-Graham
  • Would there be more efficient methods of computing this rather than looping through each possibility, or is that the only way? – Johnathan May 09 '11 at 02:10
  • I am sure there are more efficient ways of calculating the potential rectangles. I was just trying to give you a solution that would work, since you didn't stipulate any requirements on performance or what you've already done. Plus, it is often useful to code up an obviously correct solution before you worry about bug-finding in more clever solutions. – Merlyn Morgan-Graham May 09 '11 at 02:14
  • Thanks for your help, I appreciate it. Sorry I was not clear. – Johnathan May 09 '11 at 02:20
  • @user422382: No problem. It's just there are a lot of students on right now posting homework problems with no work, so basically asking people to do their homework for them :) I figured a high level generic answer was enough of a hint, if this happened to be the case. – Merlyn Morgan-Graham May 09 '11 at 02:26
  • Fortunately, this is only for a hobby project I am working on. Thanks again for letting me know, I will be extra careful knowing that ;-) – Johnathan May 09 '11 at 02:28
  • Oh, and ultimately I'd just code up something that works, and leave it until I find it is sucking up too much of my CPU time/memory :) Lets you get onto other parts of the program faster, and make it more interesting. – Merlyn Morgan-Graham May 09 '11 at 03:00
1

Use a dynamic programming approach. Consider a function S(x,y) such that S(x,y) holds the area of the largest rectangle for which (x,y) is the bottom-right corner cell; x is the row co-ordinate and y is the column co-ordinate of that cell.

For example, in your figure, S(1,1) = 1, S(1,2)=2, S(2,1)=2, and S(2,2) = 4. But, S(3,1)=0, because this cell is filled. S(8,5)=40, which says that the largest rectangle for which the lowest-right cell is (8,5) has the area 40, which happens to be the optimum solution in this example.

You can easily write a dynamic programming equation of S(x,y) from the value of S(x-1,y), S(x,y-1) and S(x-1,y-1). Using that you can obtain the values of all S(x,y) in O(mn) time, where m and n are the row and column dimensions of the given table. Once S(x,y) is known for all 1 <= x <= m and all 1 <= y <= n, we simply need to find the x and y for which S(x,y) is the largest; this step also takes O(mn) time. By keeping additional data, you can also find the side lengths of the largest rectangle.

The overall complexity is O(mn). To understand more on this, read Chapter 15 of Cormen's algorithms book, specifically Section 15.4.
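A caveat and a sketch: as far as I can tell, the areas of the three neighbouring cells alone do not pin down S(x,y) for rectangles (that style of recurrence is the standard one for the largest square). A dynamic programming formulation that does run in O(mn) keeps three values per column instead, carried from one row to the next: the height of the open column ending at the current row, and the left and right limits within which that height holds. The Python below is a swapped-in technique along those lines, not the S(x,y) recurrence itself; the grid layout (a list of rows, truthy meaning open) and the names are assumptions.

```python
def max_rectangle_area(grid):
    """Return the area of the largest all-open rectangle in O(m*n) time."""
    if not grid or not grid[0]:
        return 0
    n = len(grid[0])
    height = [0] * n  # open cells stacked in column j, ending at the current row
    left = [0] * n    # leftmost column the height[j]-tall strip can reach
    right = [n] * n   # one past the rightmost column that strip can reach
    best = 0
    for row in grid:
        cur_left = 0  # first column after the most recent obstacle in this row
        for j in range(n):
            if row[j]:
                height[j] += 1
                left[j] = max(left[j], cur_left)
            else:
                height[j] = 0
                left[j] = 0  # reset so the next open run is unconstrained
                cur_left = j + 1
        cur_right = n  # nearest obstacle to the right in this row
        for j in range(n - 1, -1, -1):
            if row[j]:
                right[j] = min(right[j], cur_right)
            else:
                right[j] = n
                cur_right = j
        for j in range(n):
            best = max(best, height[j] * (right[j] - left[j]))
    return best
```

Recording `j`, the row index and the three values whenever `best` improves would also give the position and side lengths.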

Hasan