4

I have an X by Y grid with cells containing 1 if a certain criteria is met or 0 if it is not. Now I want to identify features in the grid where there are at least N contiguous cells containing a 1. Contiguous cells can be adjacent side by side, or adjacent diagonally. I made a picture to illustrate the problem (see link), with N = 5. For clarity I omitted marking the 0s, and they are in the unmarked cells. Red 1s belong to features I want to identify, and black 1s do not. The desired result would be as shown in the picture, but with all the black 1s changed to 0s. I use R, so solutions using that language would be thoroughly appreciated, but I'll happily settle for others. I couldn't find anything in the R libraries (such as rgeos) specifically, but maybe I'm missing something. Any help appreciated, thanks!

Illustration of feature identification problem with N = 5

Here is a small reproducible example created

input.mat <- structure(c(1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 
                         0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
                         1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 
                         0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 
                         1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 
                         0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 
                         0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
                         0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
                         0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 
                         0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 
                         0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 
                         1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 
                         1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 
                         0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
                         0L, 1L, 1L, 1L), .Dim = c(15L, 15L), .Dimnames = list(NULL, NULL))

input.mat
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
 [1,]    1    1    0    0    0    0    0    0    0     0     0     0     1     0     0
 [2,]    1    1    0    0    1    1    1    0    0     1     0     0     0     1     0
 [3,]    0    0    1    0    0    0    0    0    0     1     1     0     1     0     1
 [4,]    0    0    0    1    0    0    0    0    0     0     0     0     0     1     0
 [5,]    0    0    0    0    0    0    0    0    0     0     0     1     0     0     0
 [6,]    1    0    0    0    0    0    0    0    0     0     1     0     1     1     0
 [7,]    1    1    0    0    0    0    0    0    0     0     0     1     0     0     0
 [8,]    1    1    0    0    0    0    0    0    0     0     0     0     0     0     0
 [9,]    1    0    0    0    0    1    0    1    0     0     0     1     1     1     0
[10,]    0    0    0    0    0    0    0    0    0     0     0     1     1     1     0
[11,]    0    0    1    0    1    0    0    0    0     0     0     0     0     0     1
[12,]    0    0    0    1    0    0    0    0    0     1     0     0     0     0     0
[13,]    0    0    1    0    1    0    0    0    1     0     0     0     0     0     1
[14,]    0    0    0    0    0    0    0    0    1     0     0     0     0     0     1
[15,]    1    1    1    1    1    0    0    0    1     1     0     0     0     0     1
output.mat <- structure(c(1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 
                          0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
                          1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 
                          0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 
                          0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 
                          0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
                          0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
                          0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
                          0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
                          0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 
                          0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 
                          1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 
                          1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 
                          0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
                          0L, 0L, 0L, 0L), .Dim = c(15L, 15L), .Dimnames = list(NULL, NULL))

output.mat
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
 [1,]    1    1    0    0    0    0    0    0    0     0     0     0     1     0     0
 [2,]    1    1    0    0    0    0    0    0    0     0     0     0     0     1     0
 [3,]    0    0    1    0    0    0    0    0    0     0     0     0     1     0     1
 [4,]    0    0    0    1    0    0    0    0    0     0     0     0     0     1     0
 [5,]    0    0    0    0    0    0    0    0    0     0     0     1     0     0     0
 [6,]    1    0    0    0    0    0    0    0    0     0     1     0     1     1     0
 [7,]    1    1    0    0    0    0    0    0    0     0     0     1     0     0     0
 [8,]    1    1    0    0    0    0    0    0    0     0     0     0     0     0     0
 [9,]    1    0    0    0    0    0    0    0    0     0     0     1     1     1     0
[10,]    0    0    0    0    0    0    0    0    0     0     0     1     1     1     0
[11,]    0    0    1    0    1    0    0    0    0     0     0     0     0     0     1
[12,]    0    0    0    1    0    0    0    0    0     1     0     0     0     0     0
[13,]    0    0    1    0    1    0    0    0    1     0     0     0     0     0     0
[14,]    0    0    0    0    0    0    0    0    1     0     0     0     0     0     0
[15,]    1    1    1    1    1    0    0    0    1     1     0     0     0     0     0

Created on 2021-05-27 by the reprex package (v2.0.0)

Henrik
  • 65,555
  • 14
  • 143
  • 159
ZippyP
  • 53
  • 4

3 Answers3

8

Using terra functions:

Convert matrix to raster (rast). Identify patches of 1s, surrounded by zeros (zeroAsNA = TRUE). Consider also diagonal neighbors when defining contiguity (directions = 8). Count number of cells in each patch (freq). Check which patches have a count of < 5. At these indices, set cells to NA. Coerce raster to matrix and check which values are NA. At these indices, set original matrix values to 0.

library(terra)

m = input.mat
p = patches(rast(input.mat), directions = 8, zeroAsNA = TRUE)
p[p %in% which(freq(p)[ , "count"] < 5)] = NA
m[is.na(as.matrix(p, wide = TRUE))] = 0

all.equal(m, output.mat)
# [1] TRUE

Patches in original input.mat (plot(p)):

enter image description here

After removal of patches with < 5 cells:

enter image description here


Related posts: Combining polygons and calculating their area (i.e. number of cells) in R; Obtaining connected components in R

Henrik
  • 65,555
  • 14
  • 143
  • 159
3

With data.table non equi-join to find neighbouring points and igraph:

library(igraph)
library(data.table)

# index of pixels fulfilling criteria
idx <- which(input.mat==1)

# Coordinates of pixels
coord <- data.table(arrayInd(idx,dim(input.mat)))
setnames(coord,c("x","y"))
coord[,c('xmin','xmax','ymin','ymax'):=.(x-1,x+1,y-1,y+1)]

# Find neighbours indices
neighbours <- coord[coord,.(x.x,x.y,i.x,i.y),on=.(x>=xmin,x<=xmax,y>=ymin,y<=ymax)][!(i.x==x.x&i.y==x.y)][
  ,.(start = nrow(input.mat)*(x.y-1)+x.x,
     end   = nrow(input.mat)*(i.y-1)+i.x)]

g <- graph_from_data_frame(neighbours)
g
#> IGRAPH 503ba64 DN-- 53 120 -- 
#> + attr: name (v/c)
#> + edges from 503ba64 (vertex names):
#>  [1] 2  ->1   16 ->1   17 ->1   1  ->2   16 ->2   17 ->2   7  ->6   22 ->6  
#>  [9] 6  ->7   8  ->7   22 ->7   23 ->7   7  ->8   9  ->8   22 ->8   23 ->8  
#> [17] 8  ->9   23 ->9   30 ->15  1  ->16  2  ->16  17 ->16  1  ->17  2  ->17 
#> [25] 16 ->17  33 ->17  6  ->22  7  ->22  8  ->22  23 ->22  7  ->23  8  ->23 
#> [33] 9  ->23  22 ->23  15 ->30  45 ->30  17 ->33  49 ->33  57 ->41  57 ->43 
#> [41] 30 ->45  60 ->45  33 ->49  41 ->57  43 ->57  71 ->57  73 ->57  45 ->60 
#> [49] 75 ->60  77 ->62  57 ->71  57 ->73  60 ->75  62 ->77  92 ->77  77 ->92 
#> [57] 134->133 147->133 133->134 135->134 150->134 134->135 150->135 138->137
#> + ... omitted several edges

# Find clusters
clust <- clusters(g)

# Minimum size
kept <- clust$membership[clust$membership %in% which(clust$csize >= 5)]

idx_kept <- as.numeric(names(kept)) 

M <- input.mat*0
M[idx_kept]<-1
M
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
#>  [1,]    1    1    0    0    0    0    0    0    0     0     0     0     1
#>  [2,]    1    1    0    0    0    0    0    0    0     0     0     0     0
#>  [3,]    0    0    1    0    0    0    0    0    0     0     0     0     1
#>  [4,]    0    0    0    1    0    0    0    0    0     0     0     0     0
#>  [5,]    0    0    0    0    0    0    0    0    0     0     0     1     0
#>  [6,]    1    0    0    0    0    0    0    0    0     0     1     0     1
#>  [7,]    1    1    0    0    0    0    0    0    0     0     0     1     0
#>  [8,]    1    1    0    0    0    0    0    0    0     0     0     0     0
#>  [9,]    1    0    0    0    0    0    0    0    0     0     0     1     1
#> [10,]    0    0    0    0    0    0    0    0    0     0     0     1     1
#> [11,]    0    0    1    0    1    0    0    0    0     0     0     0     0
#> [12,]    0    0    0    1    0    0    0    0    0     1     0     0     0
#> [13,]    0    0    1    0    1    0    0    0    1     0     0     0     0
#> [14,]    0    0    0    0    0    0    0    0    1     0     0     0     0
#> [15,]    1    1    1    1    1    0    0    0    1     1     0     0     0
#>       [,14] [,15]
#>  [1,]     0     0
#>  [2,]     1     0
#>  [3,]     0     1
#>  [4,]     1     0
#>  [5,]     0     0
#>  [6,]     1     0
#>  [7,]     0     0
#>  [8,]     0     0
#>  [9,]     1     0
#> [10,]     1     0
#> [11,]     0     1
#> [12,]     0     0
#> [13,]     0     0
#> [14,]     0     0
#> [15,]     0     0

all.equal(output.mat,M)
#[1] TRUE
Waldi
  • 39,242
  • 6
  • 30
  • 78
  • Seems a promising solution, +1! I guess the only concern might be the memory efficiency due to the non-equi join process, but still good to see the use of igraph – ThomasIsCoding Jun 24 '22 at 20:09
2

Here is a base R code for 2D points clustering

# compute distance from point `x` to point set `S`
fdist <- function(x, S) {
  if (length(S) == 0) {
    return(0)
  }
  v <- x - S
  pmax(abs(Re(v)), abs(Im(v)))
}

# assign groups based on distance
fgrp <- function(x, clst) {
  for (k in seq_along(clst)) {
    if (any(fdist(x, clst[[k]]) < 2)) {
      clst[[k]] <- c(clst[[k]], x)
      return(clst)
    }
  }
}

# use complex number represent 2D points
p <- c(which(input.mat == 1, arr.ind = TRUE) %*% c(1, 1i))
# initialize cluster list
clst <- list()
while (length(p) > 0) {
  idxrm <- c()
  for (k in seq_along(p)) {
    clst_new <- fgrp(p[k], clst)
    if (sum(lengths(clst_new)) > sum(lengths(clst))) {
      idxrm <- c(idxrm, k)
      clst <- clst_new
    }
  }
  if (length(idxrm) == 0) {
    clst <- c(clst, list(p[1]))
  } else {
    p <- p[-idxrm]
  }
}

# keep points that follows the contiguous pattern 
N <- 5
Z <- do.call(
  c,
  Filter(
    function(x) length(x) >= N,
    Map(
      unique,
      clst
    )
  )
)

# produce output matrix
output.mat <- input.mat * 0
output.mat[cbind(Re(Z), Im(Z))] <- 1

and you will obtain

> output.mat
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
 [1,]    1    1    0    0    0    0    0    0    0     0     0     0     1
 [2,]    1    1    0    0    0    0    0    0    0     0     0     0     0
 [3,]    0    0    1    0    0    0    0    0    0     0     0     0     1
 [4,]    0    0    0    1    0    0    0    0    0     0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    0     0     0     1     0
 [6,]    1    0    0    0    0    0    0    0    0     0     1     0     1
 [7,]    1    1    0    0    0    0    0    0    0     0     0     1     0
 [8,]    1    1    0    0    0    0    0    0    0     0     0     0     0
 [9,]    1    0    0    0    0    0    0    0    0     0     0     1     1
[10,]    0    0    0    0    0    0    0    0    0     0     0     1     1
[11,]    0    0    1    0    1    0    0    0    0     0     0     0     0
[12,]    0    0    0    1    0    0    0    0    0     1     0     0     0
[13,]    0    0    1    0    1    0    0    0    1     0     0     0     0
[14,]    0    0    0    0    0    0    0    0    1     0     0     0     0
[15,]    1    1    1    1    1    0    0    0    1     1     0     0     0
      [,14] [,15]
 [1,]     0     0
 [2,]     1     0
 [3,]     0     1
 [4,]     1     0
 [5,]     0     0
 [6,]     1     0
 [7,]     0     0
 [8,]     0     0
 [9,]     1     0
[10,]     1     0
[11,]     0     1
[12,]     0     0
[13,]     0     0
[14,]     0     0
[15,]     0     0

Ideas

  • Find the positions of 1s, i.e., row-column indices
  • For each point position, we check if it falls within any existing cluster. If yes, the point is assigned to that cluster. Otherwise, a new cluster is created with this point
  • The process is terminated when all points are checked.
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
  • Thanks so much for this TIC and AnilGoya, and sorry for my delay in replying to your suggestions/requests. I will test this out over the coming days and get back with any issues. For completeness, I should state that in my real-world problem N=225, and the grid is 396*480 cells (though the vast majority contain 0s). And I have to repeat the search for probably tens if not hundreds of these size grids, because each grid corresponds to a single hour of data, and I'm searching over long timespans. So it will be interesting to see how fast this code can churn through all that. – ZippyP May 30 '21 at 00:34
  • Hi again TIC. Just an update that the code seems to be working fine (took time to do my analysis, hence the delay in replying, sorry). For grids with mainly zeros it's very quick (<0.05s), for grids with more complicated patterns of 1s it takes a few 10s of seconds. One question though: if some grid cells contain NA/NaN instead of zero, does the code still work? I actually tested it, and it does still seem to still work -- but is this by accident or design? I tired to understand the code line by line, but it's difficult for a non computer scientist like me! – ZippyP Jun 28 '21 at 03:31
  • @ZippyP I think it should fit for matrix with `NA/NAN`, since we use `which` to find the positions of `1`s. I added some explanations and hope it could help you understand the idea easily. – ThomasIsCoding Jun 28 '21 at 08:44
  • See `raster::clump`, e.g. [Combining polygons and calculating their area (i.e. number of cells) in R](https://stackoverflow.com/questions/20659186/combining-polygons-and-calculating-their-area-i-e-number-of-cells-in-r/20663885#20663885), [et al.](https://stackoverflow.com/search?tab=votes&q=%5br%5d%20raster%20clump) – Henrik Jun 24 '22 at 06:15
  • Or possibly faster with `terra::patches` ("`terra` replaces the `raster` package [...], but `terra` is simpler, faster and can do more.") – Henrik Jun 24 '22 at 06:34
  • @Henrik Nice proposal! If possible, could you post an answer? Probably more audiences are also interested in the implementation. – ThomasIsCoding Jun 24 '22 at 20:10