5

I have encountered an inoridinary problem that given a NxM 0-1 matrix and a number K(<=NxM) and I have to find a minimal subrectangle area of that 0-1 matrix with at least K 1's in inside that subrectangle. Furthermore it's area(the product of both dimensions) should be minimized.

For example:
00000
01010
00100
01010
00000
K = 3

So I can find a subrectangle with minimal area 6 that contains 3 1's inside.
10
01
10

NOTE that the target subrectangle that I mean should contains consecutive numbers of rows and columns from the original 0-1 matrix.

inker
  • 117
  • 8
  • What have you tried so far? Do you have anything against a bruteforce solution? – hugomg Jul 24 '12 at 12:33
  • Actually there must have an approximately O(NxM) runtime solution. Somebody did that so the bruteforce must not work. – inker Jul 24 '12 at 12:48
  • I can't phrase it completely, but off the top of my head, I would start with the biggest possible rectangle (NxM) and then reduce the size of the rectangle until not possible anymore. At each reduction step there are 4 possibilities (reduce a row from top or bottom and reduce a column from left or right). Maybe some efficient pruning or heuristic (like A*) would give a good effective running time. – Christian Jul 24 '12 at 13:01

3 Answers3

3
Compute cumulative sum of rows R[i,j] and columns C[i,j].
For top-left corner (i,j) of each possible sub-rectangle:
   Starting from a single-row sub-rectangle (n=i),
   Search the last possible column for this sub-rectangle (m).
   While m>=j:
     While there are more than 'k' "ones" in this sub-rectangle:
       If this is the smallest sub-rectangle so far, remember it.
       Remove column (--m).
       This decreases the number of "ones" by C[m+1,n]-C[m+1,j-1].
     Add next row (++n).
     This increases the number of "ones" by R[m,n]-R[i-1,n].

Time complexity is O(NM(N+M)).

Two nested loops may be optimized by changing linear search to binary search (to process skinny sub-rectangles faster).

Also it is possible (after adding a row/a column to the sub-rectangle) to decrease in O(1) time the number of columns/rows in such a way that the area of this sub-rectangle is not larger than the area of the best-so-far sub-rectangle.

Both these optimizations require calculation of any sub-rectangle weight in O(1). To make it possible, pre-calculate cumulative sum of all elements for sub-rectangles [1..i,1..j] (X[i,j]). Then the weight of any sub-rectangle [i..m,j..n] is computed as X[m,n]-X[i-1,n]-X[m,j-1]+X[i-1,j-1].


Compute cumulative sum of columns C[i,j].
For any starting row (k) of possible sub-rectangle:
  For any ending row (l) of possible sub-rectangle:
    Starting column (m = 1).
    Ending column (n = 1).
    While n is not out-of-bounds
      While there are less than 'k' "ones" in sub-rectangle [k..l,m..n]:
        Add column (++n).
        This increases the number of "ones" by C[l,n]-C[k-1,n].
      If this is the smallest sub-rectangle so far, remember it.
      Remove column (++m).
      This decreases the number of "ones" by C[l,m-1]-C[k-1,m-1].

Time complexity is O(N2M).

Loop by 'l' may be terminated when all sub-rectangles, processed inside it, are single-column sub-rectangles (too many rows) or when no sub-rectangles, processed inside it, contain enough "ones" (not enough rows).

Evgeny Kluev
  • 24,287
  • 7
  • 55
  • 98
  • +1, nice. We never have to extend out to the right again (say, to column p) after extending down a row, because we know that we must have already found an enough-ones-containing rectangle with width p and fewer rows (and thus smaller area) if we ever contracted to a rectangle of width p-1. – j_random_hacker Jul 24 '12 at 16:00
0

The problem is NP-hard because the clique decision problem can be reduced to it. So there is no algorithm that is more efficient than the brute-force approach of trying all the possible submatrices (unless P=NP).

The clique decision problem can be reduced to your problem in the following way:

  • Let the matrix be the adjacency matrix of the graph.
  • Set K=L^2, where L is the size of the clique we are looking for.
  • Solve your problem on this input. The graph contains an L-clique iff the solution to your problem is an LxL submatrix containing only ones (which can be checked in polynomial time).
Martin B
  • 23,670
  • 6
  • 53
  • 72
  • Thanks for your reply, but I certainly know it must has a polynomial solution. And actually I'm trying to solve a harder vision of this problem. The problem that I posted is just a special case of the original problem. – inker Jul 24 '12 at 12:41
  • The submatrix that I mention here must be occupied consecutive numbers of rows and columns of the original matrix. Sorry I'm not good at math,I might mistook some math defination. – inker Jul 24 '12 at 12:45
  • 2
    I think he only cares about contiguous submatices. The clique problem reduction assumes you can skip rows and columns (for example, getting a submatrix using only the first, third and fifth rows and columns). Also, as inker mentioned, there is a trivial brute force solution in O(N^4) time (or something like that) – hugomg Jul 24 '12 at 12:46
  • 2
    @inker: That changes the problem quite a bit. The standard definition of a submatrix (http://en.wikipedia.org/wiki/Submatrix) does not require that the rows and columns are contiguous. If you have this constraint, I suspect that the problem has a relatively efficient solution that uses dynamic programming. – Martin B Jul 24 '12 at 12:50
0

Off the top of my head, you can make a list of the coordinate pairs(?) of all ones in the matrix, find the (smallest) containing sub-rectangles for each K-combination among them*, then pick the smallest of those.

* which is defined by the smallest and largest row and column indices in the K-combination.

alcedine
  • 909
  • 5
  • 18