Extract sub-matrices from binary matrix in R

Question

Say binary matrix m:

      # [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
 # [1,]    0    0    0    0    0    0    0    0    0
 # [2,]    0    0    0    0    0    0    0    0    0
 # [3,]    0    0    0    1    1    1    1    0    0
 # [4,]    0    0    0    1    1    1    1    0    0
 # [5,]    0    0    0    1    1    1    1    0    0
 # [6,]    0    0    0    0    0    0    0    0    0
 # [7,]    0    1    1    0    0    0    0    1    1
 # [8,]    0    1    1    0    1    1    0    1    1
 # [9,]    0    0    0    0    1    1    0    1    1
# [10,]    0    0    0    0    1    1    0    0    0

m <- structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 
1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 
0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 
1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 
0, 0, 0, 0, 0, 0, 1, 1, 1, 0), .Dim = c(10L, 9L))

How we can extract those 1-valued sub-matrices? e.g.

m[7:9,8:9]

#     [,1] [,2]
#[1,]    1    1
#[2,]    1    1
#[3,]    1    1

The point is that I want to extract them algorithmtically not indexing them explicitly like m[7:9,8:9].

The input is a binary matrix
List of sub-matrices as output (so list of four matrices of dim 3*4, 2*2, 3*2 and 3*2)
Sub-matrices are 1-valued rectangular
The border of the sub-matrices are secured with zeros.

You seem to have found the answer yourself … What is your desired input and output like? — Konrad Rudolph, Oct 26 '16 at 13:18
@KonradRudolph Input a binary matrix, output a list of `1`-valued sub-matrices. I do not want to index them explicitly but rather extract them algorithmically. — 989, Oct 26 '16 at 13:19
So in other words for your example the output would be a list of four matrices? — Konrad Rudolph, Oct 26 '16 at 13:21
@KonradRudolph yes, four matrices of dim `3*4`, `2*2`, `3*2` and `3*2` — 989, Oct 26 '16 at 13:22
Are the submatrices guaranteed to be rectangular? Or would you have to account for something like `m2 <- m; m2[9,3] <- 1; m2`? — nrussell, Oct 26 '16 at 13:25
But if all matrices are guaranteed to be reactangular, then `m[7:8, 2:3]` would not be recognized as `ncol(m[7:8, 2:3]) == nrow(m[7:8, 2:3])`, see: http://mathworld.wolfram.com/RectangularMatrix.html — , Oct 26 '16 at 13:35
Actually, extending @nrussell's example, what is more ambiguous is what happens when `m2 = m; m2[2, 6] = m2[2, 7] = 1`. Which submatrix should be identified -- the tall one (2, 6) -- (5, 7), the wide one (3, 4) -- (5, 7), or both? — tchakravarty, Oct 26 '16 at 13:51
@tchakravarty the border of the sub-matrices are secured with zeros. So `m2` cannot be the case. — 989, Oct 26 '16 at 13:58
To avoid confusion, I would say that the *internal* borders of the sub-matrices are secured by zeros. It isn't the case that these sub-matrices are completely surrounded by zeros. — John Coleman, Oct 26 '16 at 14:11
A row-by-row approach could work. With each new row, some previous rectangles are permanently secured, some grow, some are eliminated (by violation of the secured-by-zeros condition) and some new ones spawn. By the time you process the final row -- you are done. Start by writing a function which does what you want in the 1-D case. — John Coleman, Oct 26 '16 at 14:20

score 8 · Accepted Answer · answered Oct 26 '16 at 15:04

I'd treat it as a spatial problem where you have a raster and want to detect regions of connected cells.

library(raster)
r <- raster(m)

library(igraph)
rc <- clump(r)

plot(rc, col = rainbow(rc@data@max))

m1 <- as.matrix(rc)

lapply(seq_len(rc@data@max), function(x) {
  inds <- which(m1 == x, arr.ind = TRUE)
  nrow <- diff(range(inds[, "row"])) + 1
  ncol <- diff(range(inds[, "col"])) + 1
  matrix(1, ncol = ncol, nrow = nrow)
})
#[[1]]
#     [,1] [,2] [,3] [,4]
#[1,]    1    1    1    1
#[2,]    1    1    1    1
#[3,]    1    1    1    1
#
#[[2]]
#     [,1] [,2]
#[1,]    1    1
#[2,]    1    1
#
#[[3]]
#     [,1] [,2]
#[1,]    1    1
#[2,]    1    1
#[3,]    1    1
#
#[[4]]
#     [,1] [,2]
#[1,]    1    1
#[2,]    1    1
#[3,]    1    1

G. Grothendieck · Answer 2 · 2016-10-26T21:34:21.410

Use focal in the raster package with an appropriate weighting matrix w. It. convolves w with m giving a matrix the same dimensions as m with the value of big at each upper left corner and other values elsewhere so comparing it to big gives a logical matrix which is TRUE at upper left corners of rectangles. Using which we get rc which has one row per rectange and two columns representing the i and j coordinates of the upper left of that rectangle. The Map call iterates over the upper left coordinates invoking genmap on each. genmap uses rle (as defined in the rl function) to find the length of the run of ones in each coordinate direction and returns a matrix of ones having those dimensions.

library(raster)

big <- 100
r <- raster(m)
w <- matrix(0, 3, 3); w[1:2, 1:2] <- 1; w[2, 2] <- big
rc <- which(as.matrix(focal(r, w, pad = TRUE, padValue = 0)) == big, arr = TRUE)

rl <- function(x) rle(x)$lengths[1]
genmat <- function(i, j) matrix(1, rl(m[i:nrow(m), j]), rl(m[i, j:ncol(m)]))
Map(genmat, rc[, 1], rc[, 2])

giving:

[[1]]
     [,1] [,2]
[1,]    1    1
[2,]    1    1

[[2]]
     [,1] [,2] [,3] [,4]
[1,]    1    1    1    1
[2,]    1    1    1    1
[3,]    1    1    1    1

[[3]]
     [,1] [,2]
[1,]    1    1
[2,]    1    1
[3,]    1    1

[[4]]
     [,1] [,2]
[1,]    1    1
[2,]    1    1
[3,]    1    1

Updates Simplified code.

score 3 · Answer 3 · edited May 23 '17 at 11:45

A rather long-winded answer, but you can do this via image labeling as I did in this SO answer. This will extend nicely to non-rectangular blobs of 1's.

find.contiguous <- function(img, x, bg) {
  ## we need to deal with a single (row,col) matrix index
  ## versus a collection of them in a two column matrix separately.
  if (length(x) > 2) {
    lbl <- img[x][1]
    img[x] <- bg
    xc <- x[,1]
    yc <- x[,2]
  } else {
    lbl <- img[x[1],x[2]]
    img[x[1],x[2]] <- bg
    xc <- x[1]
    yc <- x[2]
  }    
  ## find all neighbors of x
  xmin <- ifelse((xc-1) < 1, 1, (xc-1))
  xmax <- ifelse((xc+1) > nrow(img), nrow(img), (xc+1))
  ymin <- ifelse((yc-1) < 1, 1, (yc-1))
  ymax <- ifelse((yc+1) > ncol(img), ncol(img), (yc+1))
  ## find all neighbors of x
  x <- rbind(cbind(xmin, ymin),
             cbind(xc  , ymin),
             cbind(xmax, ymin),
             cbind(xmin, yc),
             cbind(xmax, yc),
             cbind(xmin, ymax),
             cbind(xc  , ymax),
             cbind(xmax, ymax))
  ## that have the same label as the original x
  x <- x[img[x] == lbl,]
  ## if there is none, we stop and return the updated image
  if (length(x)==0) return(img);
  ## otherwise, we call this function recursively
  find.contiguous(img,x,bg)
}

find.contiguous is a recursive function in which for each call it receives:

A working copy of the image img.
A collection of pixel (matrix) indices x (row,col) that belong to an object in the image img.
The background value bg

find.contiguous then proceeds to:

Set all pixels at x in img to the bg color. This marks that we have visited the pixels.
Find all neighboring pixels of x that have the same label (value) as that in x. This grows the region of the same object. Note that since x is not necessarily a single pixel, x grows geometrically so that, in fact, this function is no slouch.
If there are no more neighbors belonging to the same object, we return the updated image; otherwise, we make the recursive call.

Starting from a single pixel that correspond to an object, a call to find.contiguous will grow the region to include all the object's pixels and return an updated image where the object is replaced by the background. This process can then be repeated in a loop until there are no more objects in the image, hence the ability to extract all sub-matrices of 1's.

With your data:

m <- structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 
                 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 
                 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 
                 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 
                 0, 0, 0, 0, 0, 0, 1, 1, 1, 0), .Dim = c(10L, 9L))
## make a copy to img which will be converted to all-zeros in the process 
## as matrices of 1's are extracted by the process
img <- m
## get all pixel coordinates that are objects
x <- which(img==1, arr.ind=TRUE)
## loop until there are no more pixels that are objects
##the output is in the list out
count <- 0
out <- list()
while (length(x) > 0) {
  ## choose a single (e.g., first) pixel location. This belongs to the current
  ## object that we will grow and remove from the image using find.contiguous
  if (length(x) > 2) {
    x1 <- x[1,]
  }
  ## make the call to remove the object from img
  img <- find.contiguous(img, x1, 0)
  ## find the remaining pixel locations belonging to objects
  xnew <- which(img==1, arr.ind=TRUE)
  count <- count + 1
  ## extract the indices for the 1's found by diffing new with x 
  out.ind <- x[!(x[,1] %in% xnew[,1] & x[,2] %in% xnew[,2]),]
  ## set it as a matrix in the output
  out[[count]] <- matrix(m[out.ind],nrow=length(unique(out.ind[,1])),ncol=length(unique(out.ind[,2])))
  x <- xnew
}

Your output is the list out:

print(out)
##[[1]]
##     [,1] [,2]
##[1,]    1    1
##[2,]    1    1
##
##[[2]]
##     [,1] [,2] [,3] [,4]
##[1,]    1    1    1    1
##[2,]    1    1    1    1
##[3,]    1    1    1    1
##
##[[3]]
##     [,1] [,2]
##[1,]    1    1
##[2,]    1    1
##[3,]    1    1
##
##[[4]]
##     [,1] [,2]
##[1,]    1    1
##[2,]    1    1
##[3,]    1    1

Note that you can just as easily output the locations of the extracted 1's from out.ind:

Extract sub-matrices from binary matrix in R

3 Answers3

Linked