Largest rectangular sub matrix with the same number

Question

I am trying to come up with a dynamic programming algorithm that finds the largest sub matrix within a matrix that consists of the same number:

example:

{5 5 8}
{5 5 7}
{3 4 1}

Answer : 4 elements due to the matrix

   5 5 
   5 5

So 'largest' means 'most number of elements'? And you only need one solution? Does it matter which one? — Jeffrey Sax, Oct 14 '11 at 18:37
yes, largest means most number of elements. and i need only one solution. — TimeToCodeTheRoad, Oct 14 '11 at 18:38
what do you mean by re ordering allowed RBarry Young? We cannot change the original structure of the matrix — TimeToCodeTheRoad, Oct 14 '11 at 18:39
@PengOne: I tried dynamic programming, but it did not work. Maybe I defined the subproblem wrong. I defined an called array S[i][j] which stores the length of the maximum submatrix that has all elements equal to M[i][j], however, after hours of thought, I realized it violated the optimal substructure property — TimeToCodeTheRoad, Oct 14 '11 at 19:02
@TimeToCodeTheRoad: Then do not use the term "sub-matrix" because that implies that the order/contiguousnes of the rows & columns does not matter. (see here http://en.wikipedia.org/wiki/Submatrix) — RBarryYoung, Oct 14 '11 at 20:27
Hi! I don't think dynamic programming is a proper tool here (unless it's a homework). I posted a solution that works in `O(n)`, which is obviously the best possible. I think the dynamic programming would be clumsy here (just because of the decomposition as you mentioned). — Tomas, Oct 14 '11 at 21:43

score 25 · Accepted Answer · edited May 23 '17 at 12:07

This is a question I already answered here (and here, modified version). In both cases the algorithm was applied to binary case (zeros and ones), but the modification for arbitrary numbers is quite easy (but sorry, I keep the images for the binary version of the problem). You can do this very efficiently by two pass linear O(n) time algorithm - n being number of elements. However, this is not a dynamic programming - I think using dynamic programming here would be clumsy and inefficient in the end, because of the difficulties with problem decomposition, as the OP mentioned - unless its a homework - but in that case you can try to impress by this algorithm :-) as there's obviously no faster solution than O(n).

Algorithm (pictures depict binary case):

Say you want to find largest rectangle of free (white) elements.

enter image description here

Here follows the two pass linear O(n) time algorithm (n being number of elemets):

1) in a first pass, go by columns, from bottom to top, and for each element, denote the number of consecutive elements available up to this one:

enter image description here

repeat, until:

enter image description here

Pictures depict the binary case. In case of arbitrary numbers you hold 2 matrices - first with the original numbers and second with the auxiliary numbers that are filled in the image above. You have to check the original matrix and if you find a number different from the previous one, you just start the numbering (in the auxiliary matrix) again from 1.

2) in a second pass you go by rows, holding data structure of potential rectangles, i.e. the rectangles containing current position somewhere at the top edge. See the following picture (current position is red, 3 potential rectangles - purple - height 1, green - height 2 and yellow - height 3):

enter image description here

For each rectangle we keep its height k and its left edge. In other words we keep track of the sums of consecutive numbers that were >= k (i.e. potential rectangles of height k). This data structure can be represented by an array with double linked list linking occupied items, and the array size would be limited by the matrix height.

Pseudocode of 2nd pass (non-binary version with arbitrary numbers):

var m[] // original matrix
var aux[] // auxiliary matrix filled in the 1st pass
var rect[] // array of potential rectangles, indexed by their height
           // the occupied items are also linked in double linked list, 
           // ordered by height

foreach row = 1..N // go by rows
    foreach col = 1..M
        if (col > 1 AND m[row, col] != m[row, col - 1]) // new number
            close_potential_rectangles_higher_than(0);  // close all rectangles

        height = aux[row, col] // maximal height possible at current position

        if (!rect[height]) { // rectangle with height does not exist
            create rect[height]    // open new rectangle
            if (rect[height].next) // rectangle with nearest higher height
                                   // if it exists, start from its left edge
                rect[height].left_col = rect[height].next.left_col
            else
                rect[height].left_col = col; 
        }

        close_potential_rectangles_higher_than(height)
    end for // end row
    close_potential_rectangles_higher_than(0);
        // end of row -> close all rect., supposing col is M+1 now!

end for // end matrix

The function for closing rectangles:

function close_potential_rectangles_higher_than(height)
    close_r = rectangle with highest height (last item in dll)
    while (close_r.height > height) { // higher? close it
        area = close_r.height * (col - close_r.left_col)
        if (area > max_area) { // we have maximal rectangle!
            max_area = area
            max_topleft = [row, close_r.left_col]
            max_bottomright = [row + height - 1, col - 1]
        }
        close_r = close_r.prev
        // remove the rectangle close_r from the double linked list
    }
end function

This way you can also get all maximum rectangles. So in the end you get:

enter image description here

And what the complexity will be? You see that the function close_potential_rectangles_higher_than is O(1) per closed rectangle. Because for each field we create 1 potential rectangle at the maximum, the total number of potential rectangles ever present in particular row is never higher than the length of the row. Therefore, complexity of this function is O(1) amortized!

So the whole complexity is O(n) where n is number of matrix elements.

I think the algorithm is O(n^3) where n is the dimenion of the matrix, and not O(n^2) as said. The reason being you are doing the above for all k, that is for all heights, and for each height you visit all the sqaures? Is this a right interpretation. Thus, the code willl look like something as follows: for(height =1 to height<=n){visit all sqarues} — TimeToCodeTheRoad, Oct 15 '11 at 06:35
@Time, nope. You probably speak about the 2.2 step of the algorithm, but this is amortized `O(1)`. I specified the algorithm in more detail, see update. Thanks for interesting question! — Tomas, Oct 15 '11 at 07:55
T: thanks for the update! i am still trying to understand the time complexity part. I wanted to request you if you could please provide some code, as it not provided even in the previous posts. Thus, it will be really helpful. I am only asking for the code for the second pass. As you probably must have realized, your solution is the most efficient one, thus I am really interested in the code for the second part. — TimeToCodeTheRoad, Oct 15 '11 at 09:00
I'm sorry, I need a little help understanding part 2. What do you mean, in 2.1, with nearest higher rectangle? The first one in the structure with height > k? By potential rectangles you mean open rectangles? If you could provide some rough pseudocode I'd be grateful.. Does this algorithm have a name I might search it by? I'm trying to solve a similar problem. — salvador p, Mar 18 '13 at 16:40
Dear @TimeToCodeTheRoad and user451963, please see my updated post - I have added pseudocode, complexity explanation and a scheme explaing what I mean with potential rectangles! — Tomas, Jun 27 '13 at 14:46
@user451963, thanks for your comment, yes, by potential rectangles I mean open rectangles! There's no name to the algorithm, I just remember it as a beautiful exercise we had in programming at 1st year at the university :-) Our teacher was enlightened :-) — Tomas, Jun 27 '13 at 14:47
@TMS does [this](http://stackoverflow.com/questions/25965492/largest-rectangular-sub-matrix-with-the-number-difference-k) question is also similar two this? If so can you tell me how to draw second matrix? — Khamidulla, Sep 22 '14 at 04:35

score 3 · Answer 2 · answered Oct 14 '11 at 22:04

3

A dynamic solution:

Define a new matrix A wich will store in A[i,j] two values: the width and the height of the largest submatrix with the left upper corner at i,j, fill this matrix starting from the bottom right corner, by rows bottom to top. You'll find four cases:

case 1: none of the right or bottom neighbour elements in the original matrix are equal to the current one, i.e: M[i,j] != M[i+1,j] and M[i,j] != M[i,j+1] being M the original matrix, in this case, the value of A[i,j] is 1x1

case 2: the neighbour element to the right is equal to the current one but the bottom one is different, the value of A[i,j].width is A[i+1,j].width+1 and A[i,j].height=1

case 3: the neighbour element to the bottom is equal but the right one is different, A[i,j].width=1, A[i,j].height=A[i,j+1].height+1

case 4: both neighbours are equal: A[i,j].width = min(A[i+1,j].width+1,A[i,j+1].width) and A[i,j].height = min(A[i,j+1]+1,A[i+1,j])

the size of the largest matrix that has the upper left corner at i,j is A[i,j].width*A[i,j].height so you can update the max value found while calculating the A[i,j]

the bottom row and the rightmost column elements are treated as if their neighbours to the bottom and to the right respectively are different

in your example, the resulting matrix A would be:

{2:2 1:2 1:1}
{2:1 1:1 1:1}
{1:1 1:1 1:1}

being w:h width:height

answered Oct 14 '11 at 22:04

Orlando Osorio

3,116
7
29
52

Orlando, what is the time complexity of this algorithm? – Tomas Oct 14 '11 at 22:12
a few `O(1)` operations for each cell in the matrix so `O(n)` being `n` the size of the matrix – Orlando Osorio Oct 14 '11 at 22:15
is "few" a constant for sure? – Tomas Oct 14 '11 at 22:18
@OrlandoWilliam, can you please explain the logic behind the case 4 update. I mean why are you taking the minimum? – TimeToCodeTheRoad Oct 15 '11 at 05:23
1

@OrlandoWilliam: I think the algorithm wont work for the followign: {A 0 0 }, {A A A }, {A A A}, {A B B }, as For the topmost A, the algorithm will see that the bottom number is same, but the right one is not, so height will be 3 and widht=1, but the the optimal woudl be height =4 and width=1? What do you say? – TimeToCodeTheRoad Oct 15 '11 at 05:58
1

@OrlandoWilliam: hey, so u agree that the algorithm is a bit off? The reason being that if people see this post later on, they will be able to make it out? – TimeToCodeTheRoad Oct 16 '11 at 05:34
@TimeToCodeTheRoad: yes, you are right, i didn't notice that case – Orlando Osorio Oct 17 '11 at 15:33

score 0 · Answer 3 · answered Jul 15 '14 at 05:01

Modification to the above answer:

Define a new matrix A wich will store in A[i,j] two values: the width and the height of the largest submatrix with the left upper corner at i,j, fill this matrix starting from the bottom right corner, by rows bottom to top. You'll find four cases:

case 1: none of the right or bottom neighbour elements in the original matrix are equal to the current one, i.e: M[i,j] != M[i+1,j] and M[i,j] != M[i,j+1] being M the original matrix, in this case, the value of A[i,j] is 1x1

case 2: the neighbour element to the right is equal to the current one but the bottom one is different, the value of A[i,j].width is A[i+1,j].width+1 and A[i,j].height=1

case 3: the neighbour element to the bottom is equal but the right one is different, A[i,j].width=1, A[i,j].height=A[i,j+1].height+1

case 4: both neighbours are equal: Three rectangles are considered: 1. A[i,j].width=A[i,j+1].width+1; A[i,j].height=1;

A[i,j].height=A[i+1,j].height+1; a[i,j].width=1;
A[i,j].width = min(A[i+1,j].width+1,A[i,j+1].width) and A[i,j].height = min(A[i,j+1]+1,A[i+1,j])

The one with the max area in the above three cases will be considered to represent the rectangle at this position.

The size of the largest matrix that has the upper left corner at i,j is A[i,j].width*A[i,j].height so you can update the max value found while calculating the A[i,j]

the bottom row and the rightmost column elements are treated as if their neighbours to the bottom and to the right respectively are different.

score 0 · Answer 4 · edited May 23 '17 at 12:07

This question is a duplicate. I have tried to flag it as a duplicate. Here is a Python solution, which also returns the position and shape of the largest rectangular submatrix:

#!/usr/bin/env python3

import numpy

s = '''5 5 8
5 5 7
3 4 1'''

nrows = 3
ncols = 3
skip_not = 5
area_max = (0, [])

a = numpy.fromstring(s, dtype=int, sep=' ').reshape(nrows, ncols)
w = numpy.zeros(dtype=int, shape=a.shape)
h = numpy.zeros(dtype=int, shape=a.shape)
for r in range(nrows):
    for c in range(ncols):
        if not a[r][c] == skip_not:
            continue
        if r == 0:
            h[r][c] = 1
        else:
            h[r][c] = h[r-1][c]+1
        if c == 0:
            w[r][c] = 1
        else:
            w[r][c] = w[r][c-1]+1
        minw = w[r][c]
        for dh in range(h[r][c]):
            minw = min(minw, w[r-dh][c])
            area = (dh+1)*minw
            if area > area_max[0]:
                area_max = (area, [(r, c, dh+1, minw)])

print('area', area_max[0])
for t in area_max[1]:
    print('coord and shape', t)

Output:

area 4
coord and shape (1, 1, 2, 2)

Largest rectangular sub matrix with the same number

4 Answers4

Linked