Detecting rectangles (sub-arrays of same element value) in 2-d list

Question

A rectangle is defined as any rectangular-shaped section of zeros within a 2-d array of 1s and 0s. Typical example:

[
  [1, 1, 1, 1, 1, 1, 1, 1, 1],
  [1, 1, 1, 1, 1, 1, 1, 1, 0],
  [1, 1, 1, 0, 0, 0, 1, 0, 0],
  [1, 0, 1, 0, 0, 0, 1, 0, 0],
  [1, 0, 1, 1, 1, 1, 1, 1, 1],
  [1, 0, 1, 0, 0, 1, 1, 1, 1],
  [1, 1, 1, 0, 0, 1, 1, 1, 1],
  [1, 1, 1, 1, 1, 1, 1, 1, 1],
]

In this example, there are three such arrays:

My goal is to determine the coordinates (outer 3 extremeties) of each array.

I start by converting the 2-d list into a numpy array:

image_as_np_array = np.array(two_d_list)

I can then get the coordinates of all the zeros thus:

np.argwhere(image_as_np_array == 0)

But this merely provides a shortcut to getting the indices by iterating over each row and calling .index(), then combining with the index of that row within the 2-d list.

I envisage now doing something like removing any element of np.argwhere() (or np.where()) where there is only a single occurrence of a 0 (effectively disregarding any row that cannot form part of a rectangle), and then trying to align contiguous coordinates, but I'm stuck with figuring out how to handle cases where any row may contain part of more than just one single rectangle (as is the case in the 3rd and 4th rows above). Is there a numpy function or functions I can leverage?

For a specific rectangle shape with zeros only search pattern, you can use something like [`How can I check if one two-dimensional NumPy array contains a specific pattern of values inside it?`](http://stackoverflow.com/questions/32531377/how-can-i-check-if-one-two-dimensional-numpy-array-contains-a-specific-pattern-o). To generalize it, you might have to iterate through search patterns of varying shapes. — Divakar, Apr 16 '16 at 19:44
Say I stick a 2x2 square to the side of a 3x3 square getting a shape that looks like a 3x5 rectangle with a two corner 0s cut out, do I count that as the two squares mentioned earlier or do I count the bigger 2x5 rectangle it contains and 'waste' the other 3 0s? — Alex Hall, Apr 16 '16 at 21:38
`1 x n` or `n x 1` is a rectangle or no? From the example I assume no.. — Tony Babarino, Apr 16 '16 at 21:40
@AlexHall re: your second comment: no, the resultant shape, not having a single width and single height, would be disregarded, even though it contains several rectangles. — Pyderman, Apr 16 '16 at 22:23
So a rectangle of 0s must be completely surrounded by 1s? In that case the rightmost rectangle in your example is not valid. — Alex Hall, Apr 16 '16 at 22:25
I wonder how we can take advantage of the fact that what uniquely distinguishes a corner point is that is has exactly 3 neighbouring zeros, one in each of (vertical, horizontal, any diagonal) directions. — Pyderman, Apr 16 '16 at 22:25
@AlexHall no it does not need to be completely surround by 1s, it just needs to have a single width and single height (i.e. be rectangular), so the right-most *would* count. — Pyderman, Apr 16 '16 at 22:27
But there's an extra 0 on the top-right. The rectangle is contained within a contiguous non-rectangular shape. The top-right corner of the rectangle has four 0 neighbours. — Alex Hall, Apr 16 '16 at 22:30
@AlexHall yes my apologies you're right. So the only thing I think can help us out is what I've said above about what distinguishes a corner point. (getting *all* the corner points will suffice, combining them to define each rectangle would be a bonus). — Pyderman, Apr 16 '16 at 22:32
Well no, it's also necessary for the side 0s to have exactly 5 neighbouring 0s. It's almost what I said about being completely surrounded by 1s, except being against the edge of the array is also OK. So let's say that there must be no 0s outside the rectangle that are horizontally or vertically adjacent to the rectangle. — Alex Hall, Apr 16 '16 at 22:38
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/109353/discussion-between-alex-hall-and-pyderman). — Alex Hall, Apr 16 '16 at 22:40

Tony Babarino · Answer 1 · 2016-04-17T00:31:43.740

I have written a simple algorithms using the Sweep line method. The idea is that You go through the columns of You array column by column, and detect the series of zeros as potentially new rectangles. In each column You have to check if the rectangles detected earlier have ended, and if yes add them to the results.

import numpy as np
from sets import Set
from collections import namedtuple

example = np.array([
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
])

Rectangle = namedtuple("Rectangle", "left top bottom right")

def sweep(A):
    height = A.shape[0]
    length = A.shape[1]
    rectangles = dict()  # detected rectangles {(rowstart, rowend): col}
    result = []

    # sweep the matrix column by column
    for i in xrange(length):
        column = A[:, i]

        # for currently detected rectangles check if we should extend them or end
        for r in rectangles.keys():
            # detect non rectangles shapes like requesten in question edit and del those rectangles
            if all([x == 0 for x in column[r[0]:r[1]+1]]) and ((r[0]-1>0 and column[r[0]-1]==0) or (r[1]+1<height and column[r[1]+1]==0)):
                del rectangles[r]
            elif any([x == 0 for x in column[r[0]:r[1]+1]]) and not all([x == 0 for x in column[r[0]:r[1]+1]]):
                del rectangles[r]
            # special case in the last column - add detected rectangles
            elif i == length - 1 and all([x == 0 for x in column[r[0]:r[1]+1]]):
               result.append(Rectangle(rectangles[r], r[0], r[1], i))
            # if detected rectangle is not extended - add to result and del from list
            elif all([x == 1 for x in column[r[0]:r[1]+1]]):
                result.append(Rectangle(rectangles[r], r[0], r[1], i-1))
                del rectangles[r]

        newRectangle = False
        start = 0
        # go through the column and check if any new rectangles appear
        for j in xrange(height):
            # new rectangle in column detected
            if column[j] == 0 and not newRectangle and j+1 < height and column[j+1] == 0:
                start = j
                newRectangle = True
            # new rectangle in column ends
            elif column[j] == 1 and newRectangle:
                # check if new detected rectangle is already on the list
                if not (start, j-1) in rectangles:
                    rectangles[(start, j-1)] = i
                newRectangle = False

    # delete single column rectangles
    resultWithout1ColumnRectangles = []
    for r in result:
        if r[0] != r[3]:
            resultWithout1ColumnRectangles.append(r)
    return resultWithout1ColumnRectangles

print example
print sweep(example)

returns:

[[1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 0]
 [1 1 1 0 0 0 1 0 0]
 [1 0 1 0 0 0 1 0 0]
 [1 0 1 1 1 1 1 1 1]
 [1 0 1 0 0 1 1 1 1]
 [1 1 1 0 0 1 1 1 1]
 [1 1 1 1 1 1 1 1 1]]
[Rectangle(left=3, top=5, bottom=6, right=4), 
 Rectangle(left=3, top=2, bottom=3, right=5)]

What do the six numbers in each returned list represent? I suggest using `namedtuple` as I have. — Alex Hall, Apr 16 '16 at 23:04
Also you should read the comments, we have concluded that one of the three rectangles should not be counted. — Alex Hall, Apr 16 '16 at 23:05
It's not that the rectangle can't end in the last column, it's that it mustn't have any parts sticking out. Read the comments again more carefully. — Alex Hall, Apr 16 '16 at 23:22
@Tony Thanks. As per Alex's correct observation, I've updated the question to show a new image. Just the two rectangles (single width, single height) are valid. — Pyderman, Apr 16 '16 at 23:26

Alex Hall · Accepted Answer · 2016-04-16T23:36:02.507

I don't know numpy, so here's a plain Python solution:

from collections import namedtuple

Rectangle = namedtuple("Rectangle", "top bottom left right")

def find_rectangles(arr):

    # Deeply copy the array so that it can be modified safely
    arr = [row[:] for row in arr]

    rectangles = []

    for top, row in enumerate(arr):
        start = 0

        # Look for rectangles whose top row is here
        while True:
            try:
                left = row.index(0, start)
            except ValueError:
                break

            # Set start to one past the last 0 in the contiguous line of 0s
            try:
                start = row.index(1, left)
            except ValueError:
                start = len(row)

            right = start - 1

            if (  # Width == 1
                  left == right or
                  # There are 0s above
                  top > 0 and not all(arr[top-1][left:right + 1])):
                continue

            bottom = top + 1
            while (bottom < len(arr) and
                   # No extra zeroes on the sides
                   (left == 0 or arr[bottom][left-1]) and
                   (right == len(row) - 1 or arr[bottom][right + 1]) and
                   # All zeroes in the row
                   not any(arr[bottom][left:right + 1])):
                bottom += 1

            # The loop ends when bottom has gone too far, so backtrack
            bottom -= 1

            if (  # Height == 1
                  bottom == top or
                  # There are 0s beneath
                  (bottom < len(arr) - 1 and
                   not all(arr[bottom + 1][left:right+1]))):
                continue

            rectangles.append(Rectangle(top, bottom, left, right))

            # Remove the rectangle so that it doesn't affect future searches
            for i in range(top, bottom+1):
                arr[i][left:right+1] = [1] * (right + 1 - left)

    return rectangles

For the given input, the output is:

[Rectangle(top=2, bottom=3, left=3, right=5),
 Rectangle(top=5, bottom=6, left=3, right=4)]

This is correct because the comments indicate that the 'rectangle' on the right is not to be counted since there is an extra 0 sticking out. I suggest you add more test cases though.

I expect it to be reasonably fast since much of the low-level iteration is done with calls to index and any, so there's decent usage of C code even without the help of numpy.

It's a bit hard to calculate the exact complexity but I think it's pretty good. Probably `O(m*n)`, at least approximately. I've removed the glaring weakness which was repeatedly checking for containment in previously found rectangles. — Alex Hall, Apr 16 '16 at 23:37
From what I understand: when You find `0` You try to spread it as far as possible to the rectangle and then black it out with `1s` - its `O(n * m)`. Linear memory complexity could be an issue with bigger arrays though. — Tony Babarino, Apr 16 '16 at 23:47
If you can hold the entire array in memory, you can probably hold it twice. If it's really an issue you could avoid copying the array and just revert the changes at the end: it's easy to do since all the rectangles are recorded. Or just leave the array modified, depending on the use case it may not matter. — Alex Hall, Apr 16 '16 at 23:56
try this arr = [ [1,1,1,1,1,1], [0,0,1,0,1,1], [0,0,1,0,1,0], [1,1,1,0,1,0], [1,0,0,1,1,1] ] You code doesn't work correctly. — Vision, Sep 15 '18 at 21:53
@Wedoso that's correct for this question, rectangles with width/length 1 are ignored - see the comments. — Alex Hall, Sep 15 '18 at 22:02

Detecting rectangles (sub-arrays of same element value) in 2-d list

2 Answers2

Linked