42

I need to find the largest square of 1's in a giant file full of 1's and 0's. I know i have to use dynamic programming. I am storing it in a 2D array. Any help with the algorithm to find the largest square would be great, thanks!

example input:

1 0 1 0 1 0
1 0 1 1 1 1
0 1 1 1 1 1
0 0 1 1 1 1
1 1 1 1 1 1

answer:

1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1

My code so far:

int Square (Sq[int x][int y]) {
   if (Sq[x][y]) == 0) {
       return 0;
   }
   else {
       return 1+MIN( Sq(X-1,Y), Sq(X,Y-1), Sq(X-1,Y-1) );
   }
}

(assuming values already entered into the array)

int main() {
    int Sq[5][6]; //5,6 = bottom right conner
    int X = Square(Sq[5][6]);
}

How do I go on from there?

Gilles 'SO- stop being evil'
  • 104,111
  • 38
  • 209
  • 254
batt
  • 421
  • 1
  • 5
  • 4
  • 2
    Please provide a summary of your findings so far. – LeopardSkinPillBoxHat Nov 13 '09 at 01:48
  • Are the inputs always square as well? – i_am_jorf Nov 13 '09 at 02:00
  • 1
    @jeffamaphone: the example input isn't. Draw what conclusions you like from that ;-) – Steve Jessop Nov 13 '09 at 02:03
  • Well i know it has something to do with the longest common substring and 1-0 napsack problem, but thats all i got. No the inputs could be a square or a rectangle. – batt Nov 13 '09 at 02:07
  • Thanks so much for this question – I will use this as an exercise in my tutorial next semester. – Konrad Rudolph Sep 19 '10 at 12:08
  • related: http://stackoverflow.com/questions/2478447/find-largest-rectangle-containing-only-zeros-in-an-nn-binary-matrix – jfs Jan 12 '11 at 17:00
  • 2
    Anyone looking to work this problem as an exercise should check out Problem E: Maximum Square from the South East Regional ACM ICPC 2010 Problem Set: http://ser.cs.fit.edu/ser2010/problems/ Implement the algorithm, and test it against the Judges input/output files. – Dream Lane Feb 16 '12 at 02:23

7 Answers7

83

Here is a sketch of the solution:

For each of the cells we will keep a counter of how big a square can be made using that cell as top left. Clearly all cells with 0 will have 0 as the count.

Start iterating from bottom right cell and go to bottom left, then go to one row up and repeat.

At each scan do this:

  1. If the cell has 0 assign count=0
  2. If the cell has 1 and is an edge cell (bottom or right edge only), assign count=1
  3. For all other cells, check the count of the cell on its right, right-below, and below. Take the min of them and add 1 and assign that to the count. Keep a global max_count variable to keep track of the max count so far.

At the end of traversing the matrix, max_count will have the desired value.

Complexity is no more that the cost of traversal of the matrix.

This is how the matrix will look like after the traversal. Values in parentheses are the counts, i.e. biggest square that can be made using the cell as top left.

1(1) 0(0) 1(1) 0(0) 1(1) 0(0)
1(1) 0(0) 1(4) 1(3) 1(2) 1(1)
0(0) 1(1) 1(3) 1(3) 1(2) 1(1)
0(0) 0(0) 1(2) 1(2) 1(2) 1(1)
1(1) 1(1) 1(1) 1(1) 1(1) 1(1)

Implementation in Python

def max_size(mat, ZERO=0):
    """Find the largest square of ZERO's in the matrix `mat`."""
    nrows, ncols = len(mat), (len(mat[0]) if mat else 0)
    if not (nrows and ncols): return 0 # empty matrix or rows
    counts = [[0]*ncols for _ in xrange(nrows)]
    for i in reversed(xrange(nrows)):     # for each row
        assert len(mat[i]) == ncols # matrix must be rectangular
        for j in reversed(xrange(ncols)): # for each element in the row
            if mat[i][j] != ZERO:
                counts[i][j] = (1 + min(
                    counts[i][j+1],  # east
                    counts[i+1][j],  # south
                    counts[i+1][j+1] # south-east
                    )) if i < (nrows - 1) and j < (ncols - 1) else 1 # edges
    return max(c for rows in counts for c in rows)
dawg
  • 98,345
  • 23
  • 131
  • 206
Joy Dutta
  • 3,416
  • 1
  • 19
  • 19
  • 8
    + Even though its a competing answer, yours is clearly optimal in terms of complexity, pretty ingeneous! – DeusAduro Nov 13 '09 at 02:24
  • 1
    Perhaps one thing, point 2 says that if it is an edge cell simply assign 1, this is only correct for bottom/right edge cells as left/top edge cells may be the top left of a larger square? – DeusAduro Nov 13 '09 at 02:26
  • My bad, we have to do a little check for edge cells for the left and top edges, let me edit my solution. Thanks a lot ! – Joy Dutta Nov 13 '09 at 02:32
  • Very simple and completely spot on. I wish I could upvote more than once. – Matthieu M. Nov 13 '09 at 10:31
  • Any way to extend the method to rectangular blocks? – yassin Sep 18 '10 at 22:20
  • 3
    Why do you start from the bottom right instead of (as would be usual) from the top left? The result is the same, just the recurrence will look more natural (because it will use incrementing indices and the base case is at 0 rather than at n). – Apart from that, perfect answer. – Konrad Rudolph Sep 19 '10 at 10:31
  • We have to traverse array in Spiral shape, to set proper count . right ? – siva Dec 09 '10 at 09:37
  • I've added implementation in Python of the described algorithm. – jfs Jan 12 '11 at 16:44
  • Here's solution for rectangular blocks (related problem) http://stackoverflow.com/questions/2478447/find-largest-rectangle-containing-only-zeros-in-an-nn-binary-matrix/4671342#4671342 – jfs Jan 12 '11 at 16:59
  • Your code has an error. if mat[i][j] == ZERO should be if mat[i][j] != ZERO – Sajad Rastegar Dec 30 '12 at 19:03
  • @j-f-sebastian Where is your Python implementation? – tommy.carstensen May 23 '15 at 13:37
  • I'm not familiar with Python, does anyone have a Java implementation of this algorithm? – Talen Kylon Oct 29 '15 at 01:23
  • what is the name of this algorithm ? – ilumin Mar 09 '16 at 00:30
8

LSBRA(X,Y) means "Largest Square with Bottom-Right At X,Y"

Pseudocode:

LSBRA(X,Y):
   if (x,y) == 0:
       0
   else:
       1+MIN( LSBRA(X-1,Y), LSBRA(X,Y-1), LSBRA(X-1,Y-1) )

(For edge cells, you can skip the MIN part and just return 1 if (x,y) is not 0.)

Work diagonally through the grid in "waves", like the following:

    0 1 2 3 4
  +----------
0 | 1 2 3 4 5
1 | 2 3 4 5 6
2 | 3 4 5 6 7
3 | 4 5 6 7 8

or alternatively, work through left-to-right, top-to-bottom, as long as you fill in edge cells.

    0 1 2 3 4
  +----------
0 | 1 2 3 4 5
1 | 6 7 8 9 .
2 | . . . . .
3 | . . . . .

That way you'll never run into a computation where you haven't previously computed the necessary data - so all of the LSBRA() "calls" are actually just table lookups of your previous computation results (hence the dynamic programming aspect).

Why it works

In order to have a square with a bottom-right at X,Y - it must contain the overlapping squares of one less dimension that touch each of the other 3 corners. In other words, to have

XXXX
XXXX
XXXX
XXXX

you must also have...

XXX.    .XXX    ....    ....
XXX.    .XXX    XXX.    ....
XXX.    .XXX    XXX.    ....
....    ....    XXX.    ...X

As long as you have those 3 (each of the LSBRA checks) N-size squares plus the current square is also "occupied", you will have an (N+1)-size square.

Amber
  • 507,862
  • 82
  • 626
  • 550
  • sorry could you please explain more the pseudo code? LSBRA is a function that returns an integer (the largest value?) and min returns the smallest value of the 3 LSBRA passed in? – batt Nov 13 '09 at 02:32
  • LSBRA is just a placeholder for "compute this value". For a dynamic programming implementation, it basically means "what is stored in our array of results at X,Y". For a recursive implementation, it would be a function. And yes, MIN() means take the smallest of the arguments. – Amber Nov 13 '09 at 02:44
  • i edited my original post with your solution, but it seems wrong. could you take a look at it please? =] – batt Nov 13 '09 at 02:59
3

The first algorithm that comes to my mind is:

  1. '&&' column/row 1 with column/row 2 if, this is to say do an '&&' operation between each entry and its corresponding entry in the other column/row.
  2. Check the resulting column, if there are any length 2 1's that means we hit a 2x2 square.
  3. And the next column with the result of the first two. If there are any length 3 1's we have hit a 3x3 square.
  4. Repeat until all columns have been used.
  5. Repeat 1-4 starting at column 2.

I won't show you the implementation as its quite straightforward and your problem sounds like homework. Additionally there are likely much more efficient ways to do this, as this will become slow if the input was very large.

DeusAduro
  • 5,971
  • 5
  • 29
  • 36
2

Let input matrix is M: n x m

T[i][j] is DP matrix which contains largest square side with squares bottom right angle (i,j).

General rule to fill the table:

if (M[i][j] == 1) {
  int v = min(T[i][j-1], T[i-1][j]);
  v = min(v, T[i-1][j-1]);
  T[i][j] = v + 1;
}
else 
  T[i][j] = 0;

The result square size is max value in T.

Filling T[i][0] and T[0][j] is trivial.

I am not sure if this algo can be used for your huge file, but you don't need to store entire matrix T but only current and previous lines only.

Following notes can help to undestand general idea:

  • all squares with right bottom angles (i-1, j), (i, j-1), (i-1, j-1) with size s are inside square of with right bottom angle (i, j) with size s+1.
  • if there is square of size s+1 with right bottom corner at (i, j), then size of maximal square with right bottom angles (i-1, j), (i, j-1), (i-1, j-1) is at least s.
  • Opposite is also true. If size of at least one square with bottom right angles at (i-1, j), (i, j-1), (i-1, j-1) is less then s, then size of square with right bottom corner at (i, j) can not be larger then s+1.
sergtk
  • 10,714
  • 15
  • 75
  • 130
1

OK, the most inefficient way but simple would be:

  1. select first item. check if 1, if so you have a 1x1 square.

  2. check one below and one to right, if 1, then check row 2 col 2, if 1, 2x2 square.

  3. check row 3 col 1, col 2 and col 3, plus row 1 col 3, row 2 col 3, if 1, 3x3.

  4. So basically you keep expanding the row and col together and check all the cells inside their boundaries. As soon as you hit a 0, it's broken, so you move along 1 point in a row, and start again.

  5. At end of row, move to next row.

  6. until the end.

You can probably see how those fit into while loops etc, and how &&s can be used to check for the 0s, and as you look at it, you'll perhaps also notice how it can be sped up. But as the other answer just mentioned, it does sound a little like homework so we'll leave the actual code up to you.

Good luck!

Mark Mayo
  • 12,230
  • 12
  • 54
  • 85
1

The key here is that you can keep track of the root of the area instead of the actual area, using dynamic programming.

The algorithm is as follow:

Store an 2D array of ints called max-square, where an element at index i,j represents the size of the square it's in with i,j being the bottom right corner. (if max[i,j] = 2, it means that index i,j is the bottom right corner of a square of size 2^2 = 4)

For each index i,j:

if at i,j the element is 0, then set max-square i,j to 0.

else:

Find the minimum of max-square[i - 1, j] and max-square[i, j - 1] and max-square[i - 1][j -1]. set max-square[i, j] to 1 + the minimum of the 3. Inductively, you'll end up filling in the max-square array. Find/or keep track of the maximum value in the process, return that value^2.

Take a look at these solutions people have proposed: https://leetcode.com/discuss/questions/oj/maximal-square?sort=votes

Han Sheng Huang
  • 119
  • 1
  • 6
0

Let N be the amount of cells in the 2D array. There exists a very efficient algorithm to list all the maximum empty rectangles. The largest empty square is inside one of these empty rectangles, and founding it is trivial once the list of the maximum empty rectangles has been computed. A paper presenting a O(N) algorithm to create such a list can be found at www.ulg.ac.be/telecom/rectangles as well as source code (not optimized). Note that a proof exists (see the paper) that the number of largest empty rectangles is bounded by N. Therefore, selecting the largest empty square can be done in O(N), and the overall method is also O(N). In practice, this method is very fast. The implementation is very easy to do, since the whole code should not be more than 40 lines of C (the algorithm to list all the maximum empty rectangles takes about 30 lines of C).