Most efficient algorithm to find the biggest square in a two dimension map

Question

I would like to know the different algorithms to find the biggest square in a two dimensions map dotted with obstacles.

An example, where o would be obstacles:

...........................
....o......................
............o..............
...........................
....o......................
...............o...........
...........................
......o..............o.....
..o.......o................

The biggest square would be (if we choose the first one):

.....xxxxxxx...............
....oxxxxxxx...............
.....xxxxxxxo..............
.....xxxxxxx...............
....oxxxxxxx...............
.....xxxxxxx...o...........
.....xxxxxxx...............
......o..............o.....
..o.......o................

What would be the fastest algorithm to find it? The one with the smallest complexity?

EDIT: I know that people are interested on the algorithm explained in the accepted answer, so I made a document that explains it a bit more, you can find it here:

https://docs.google.com/document/d/19pHCD433tYsvAor0WObxa2qusAjKdx96kaf3z5I8XT8/edit?usp=sharing

What algorithms have you tried already? Are you running into performance problems? What research have you done? There may already be solutions to similar problems on stackoverflow that may help you solve this problem. — IceArdor, Dec 02 '13 at 18:47
@Dukeling I want the biggest one, but if there's several of the same biggest size, I want the first one of them. :) — Julien Fouilhé, Dec 02 '13 at 18:54
@IceArdor I have tried the basic one which is to save the position of the square if its size is bigger than the previous saved one, but, it's really slow and I wanted to know if there's a well-known algorithm which would help me. I didn't find any on the Internet so I'm asking the SO community :) — Julien Fouilhé, Dec 02 '13 at 18:57
@JulienFouilhé How did you loop through the rectangles? Did you just go through all of them, or was there any optimizations at all? If not, I can think of at least one thing you can try. — Dennis Meng, Dec 02 '13 at 19:01
@DennisMeng Well, if I have a square of size `x`, I stop looking for new rectangles `x` from the right and `x` from the bottom, it improves performances but I just wanted to know the best way to do this. I don't want to use threads though. — Julien Fouilhé, Dec 02 '13 at 19:06
Just curious. What I was thinking was pointing out that you can "bucket" rectangles based on their top-left corners. Each bucket would then really only have one rectangle worth considering. — Dennis Meng, Dec 02 '13 at 19:08
@DennisMeng For what I understand, it would not really improve the performance, would it ? — Julien Fouilhé, Dec 02 '13 at 19:15
Not really, hence why I left it in the comments. Whatever good answer that comes from this will probably use some similar trick though. — Dennis Meng, Dec 02 '13 at 19:19
Top-right corner has area 77 (11x7), whereas the rectangle you gave has area 49 — Dennis Meng, Dec 02 '13 at 19:41
@DennisMeng Yes, because I am looking for the biggest **square** , not rectangle. — Julien Fouilhé, Dec 02 '13 at 19:47
But even then there are two equally sized solutions, aren't they? Or is it required to use the full available space of 11x7 instead of only the 7x7 part needed for the square? — JensG, Dec 02 '13 at 19:57
@JensG Yes, but I would take the first one of them (from top-left to right down) :) — Julien Fouilhé, Dec 03 '13 at 00:05

Rob Neuhaus · Accepted Answer · 2013-12-02T20:59:10.157

Here is how to do this in the optimal amount of time, O(nm). This is built on top of @dukeling's insight that you never need to check a solution of size less than your current known best solution.

The key is to be able to build a data structure that can answer this query in O(1) time.

Is there an obstacle in the square whose top left corner is at r, c and has size k?

To solve that problem, we'll support answering a slightly harder question, also in O(1).

What is the count of items in the rectangle from r1, c1 to r2, c2?

It's easy to answer the square existence question with an answer from the rectangle count question.

To answer the rectangle count question, note that if you had pre-computed the answer for every rectangle that starts in the top left, then you could answer the general question for from r1, c1 to r2, c2 by a kind of clever/inclusion exclusion tactic using only rectangles that start in the top left

              c1   c2  
-----------------------
|             |    |  |
|   A         | B  |  |
|_____________|____|  |  r1
|             |    |  |
|    C        |  D |  |
|_____________|____|  |  r2
|_____________________|

We want the count of stuff inside D. In terms of our pre-computed counts from the top left.

Count(D) = Count(A ∪ B ∪ C ∪ D) - Count(A ∪ C) - Count(A ∪ B) + Count(A)

You can pre-compute all the top left rectangles in O(nm) by doing some clever row/column partial sums, but I'll leave that to you.

Then to answer the to the problem you want just involves checking possible solutions, starting with solutions that are at least as good as your known best. Your known best will only get better up to min(n, m) times total, so the best_possible increment will happen very rarely and almost all squares will be rejected in O(1) time.

best_possible = 0
for r in range(n):
 for c in range(m):
   while True:                      
     # this looks O(min(n, m)), but it's amortized O(1) since best_possible
     # rarely increased.      
     if possible(r, c, best_possible+1):
       best_possible += 1
     else:
       break

brilliant. a minor thing: one should start with best_possible = 0 (the map could be full of obstacles) and ask for `if possible(r, c, best_possible+1)`; then the final `best_possible` is indeed the best possible size — coproc, Dec 02 '13 at 21:01
@cproc fixed, thanks. Maybe I should just never write pseudo code, that way I can't get it wrong ;P. — Rob Neuhaus, Dec 02 '13 at 21:04
@rrenaud Thank you, I finally had time to try it. Not really harder to do than the naive one, even in C, but extremely faster ! for a map of 10000x10000, I reduced the time from ~55s to ~1.5s, and there's a lot more to optimize ! :) — Julien Fouilhé, Dec 08 '13 at 02:58

Bernhard Barker · Answer 2 · 2013-12-02T19:54:40.757

One idea, making use of binary search.

The basic idea:

Start off in the top-left corner. See if a 1x1 square would work.

If it will work, increase the sides lengths of the square by 1 and repeat.
If it won't work, move right and repeat. If you've reached the right-most position, move to the next line.

The native approach:

We can simply check every possible cell of every square at each step, but this is fairly inefficient.

The optimized approach:

When increasing the square size, we can just do a binary search over the next row and column to see if that row / column contains an obstacle at any of those positions.

When moving to the right, we can do a binary search for each next column to determine if that column contains an obstacle at any of those positions.

When moving down, we can do a similar binary on each of the columns in the target position.

Implementation note:

To start off, we'd need to go through all the rows and columns and set up arrays containing the positions of the obstacles for each of them, which we can use for the binary searches.

Running time:

We do 2 binary searches to increase the square size, and the square size is maximum the size of the grid, so that is fairly small (O(min(m,n) log max(m,n))) and gets dominated by the below.

Beyond that, for each position, we do a single binary search on a column.

So, for a grid with m columns and n rows, the overall complexity is O(mn log m).

But note how little we're actually searching below when the grid is sparse.

Example:

For your example:

 012345678901234567890123456
0...........................
1....o......................
2............o..............
3...........................
4....o......................
5...............o...........
6...........................
7......o..............o.....
8..o.......o................

We'd first try a 1x1 square in the top-left corner, which works.

Then a 2x2 square. For this, we do a binary search for the range [0,1] on the row 1, which can be represented simply by {4} - an array of a single position corresponding to where the obstacle is. And we also do a binary search for the range [0,1] on the column 1, which contains no obstacles, thus an empty array - {}.

Then a 3x3 square. For this, we do a binary search for [0,2] on the row 2, which contains 1 obstacles at position 12, thus {12}. And we also do a binary search for [0,2] on the column 2, which contains an obstacle at position 8, thus {8}.

Then a 4x4 square. For this, we do a binary search for [0,3] on the row 3 - {}. And for [0,3] on column 3 - {}.

Then a 5x5 square. For this, we do a binary search for [0,4] on the row 4 - {4}. And for [0,4] column 4 - {1,4}.

Here is the first one we actually find. In the range [0,4], we find 4 in both the row and the column (we only really need to find one of them). So this indicates a fail.

From here we do a binary search on column 4 (again - not really necessary) for [0,4]. Then binary search columns 5-8 for [0,4], none of them found, so a square starting at position 5,0 is the next possible candidate.

So from here we try to increase the square size to 5x5, which works, then 6x6 and 7x7, which works.

Then we try 8x8, which doesn't work.

And so on.

I know binary search, but how does yours work?

So we're basically doing a range search within a set of values. This is fairly easy to do. First search for the starting value of the range, then the end value. If we get to the same point, there are no values in the range.

We don't really care what values exist in the range, just whether or not there are any.

"we can just do a binary search over the next row and column to see if that row / column contains an obstacle at any of those positions" - can you explain how that's quicker than just checking each one? — andrew cooke, Dec 02 '13 at 19:30
I am confused. Don't you have to do the "try expand by 1" step up to n times for each corner? — Rob Neuhaus, Dec 02 '13 at 19:52
@andrewcooke I added a partial example, perhaps that helps to explain it better. — Bernhard Barker, Dec 02 '13 at 19:55
If you have your map only as 2D-array then you have to check each cell of the square and then only "try expand by 1" works. A binary search only makes sense if you have the obstacles of each row and column in a separate sorted array or map. Since this is non-trivial you have to elaborate that in more details. — coproc, Dec 02 '13 at 20:03
@coproc I briefly mentioned that under "Implementation note". I consider that to be fairly trivial to do by simply looping through the entire grid (possibly twice - once for rows and once for columns) to set up the arrays (by enqueue to the applicable array). — Bernhard Barker, Dec 02 '13 at 20:05
@andrewcooke Consider expanding from 3x3 to 4x4 in the example. Checking each cell would require checking `0,3`, `1,3`, `2,3`, `3,3`, `3,2`, `3,1`, `3,0`. Instead we just have an array of where the obstacles are in row 3 and column 3 and do a binary searches on them. For small squares, it would possibly be less efficient, but when we get larger squares, checking each cell could involve O(m+n) work, where-as the binary search is maximum O(log m + log n). (I hope you understand as I don't think I can explain it more than than). — Bernhard Barker, Dec 02 '13 at 20:11
@rrenaud You start at the left-top corner, moving right and down. Once you've found a larger square, you never check a smaller one again, since there's no point. You basically just look for a new position for this larger square - once you've found one, you try to increase its size again. — Bernhard Barker, Dec 02 '13 at 20:16
@Dukeling nice, i see it. Using your idea, I posted an O(n * m) solution. — Rob Neuhaus, Dec 02 '13 at 20:46

rohit89 · Answer 3 · 2013-12-02T19:50:40.627

2

So here's one rough approach.

Store the x-y positions of all the obstacles.
For each obstacle O
   find obstacle C that is nearest to it column-wise.
   find obstacle R-top that is nearest to it row-wise from the top.
   find obstacle R-bottom that is nearest to it row-wise from the bottom.
   if (|R-top.y - R-bottom.y| != |O.x - C.x|) continue
   Size of the square = Abs((R-top.y - R-bottom.y) * (O.x - C.x))
Keep track of the sizes and positions to find the largest square

Complexity is roughly O(k^2) where k is the number of obstacles. You could reduce it to O(k * log k) if you use binary search.

edited Dec 02 '13 at 19:50

answered Dec 02 '13 at 19:26

rohit89

5,745
2
25
42

1

The approach of whether you calculate using the obstacles or the free-space depends on whether the obstacle map has a higher density of obstacles or free space. "for each obstacle O, find obstacle C near obstacle O" is an O(n^2) operation if implemented naively. – IceArdor Dec 02 '13 at 19:43
Right. I'm assuming the number of obstacles is much lesser than free space. – rohit89 Dec 02 '13 at 19:47
What if square is not bounded by obstables? Your algorithm wont evaluate them – Vikram Bhat Dec 02 '13 at 20:16
@VikramBhat What if we consider map edges as obstacles ? :) – Julien Fouilhé Dec 03 '13 at 00:10
@JulienFouilhé The unbounded largest square not be bounded on all four sides even when you consider the map edges as obstacles. – Vikram Bhat Dec 03 '13 at 02:59

score 2 · Answer 4 · edited May 23 '17 at 12:16

The following SO articles are identical/similar to the problem you're trying to solve. You may want to look over those answers as well as the responses to your question.

Here's the baseline case I'd use, written in simplified Python/pseudocode.

# obstacleMap is a list of list of MapElements, stored in row-major order
max([find_largest_rect(obstacleMap, element) for row in obstacleMap for element in row])    

def find_largest_rect(obstacleMap, upper_left_elem):    
    size = 0    
    while not has_obstacles(obstacleMap, upper_left_elem, size+1):    
        size += 1    
    return size    

def has_obstacles(obstacleMap, upper_left_elem, size):    
    #determines if there are obstacles on the on outside square layer    
    #for example, if U is the upper left element and size=3, then has_obstacles checks the elements marked p.    
    # .....    
    # ..U.p    
    # ....p    
    # ..ppp    
    periphery_row = obstacleMap[upper_left_elem.row][upper_left_elem.col:upper_left_elem.col+size]    
    periphery_col = [row[upper_left_elem.col+size] for row in obstacleMap[upper_left_elem.row:upper_left_elem.row+size]    
    return any(is_obstacle(elem) for elem in periphery_row + periphery_col)

def is_obstacle(elem):    
    return elem.value == 'o'    

class MapElement(object):    
    def __init__(self, row, col, value):    
        self.row = row    
        self.col = col    
        self.value = value

Solve a problem, find it's already been solved better :(. I think we should report dup/close. — Rob Neuhaus, Dec 02 '13 at 21:09

Vikram Bhat · Answer 5 · 2013-12-02T20:30:24.730

here is an approach using recurrence relation :-

isSquare(R,C1,C2) = noObstacle(R,C1,R,C2) && noObstacle(R,C2,R-(C2-C1),C2) && isSquare(R-1,C1,C2-1)

isSquare(R,C1,C2) = square that has bottom side (R,C1) to (R,C2) 

noObstacle(R1,C1,R2,C2) = checks whether there is no obstacle in line segment (R1,C1) to (R2,C2)

Find Max (C2-C1+1) which where isSquare(R,C1,C2) = true

You can use dynamic programming to solve this problem in polynomial time. Use suitable data structure for searching obstacle.

Most efficient algorithm to find the biggest square in a two dimension map

5 Answers5

Linked