
I have a matrix of 1's and 0's. I would like to group the cells that contain 1's into clusters, then count the clusters in the matrix and report the size of each one.

Cells with the value 1 belong to the same cluster when they are adjacent to each other (immediately up, down, left or right). A group only counts as a cluster if it contains at least n such cells (in this case n = 4). The output should give the number of clusters and their sizes.

For example the matrix looks like this:

> m 

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    0    0    0    0    0    0    0    0     0
[2,]    1    1    1    0    0    0    0    0    0     0
[3,]    0    0    1    0    0    0    0    0    0     0
[4,]    0    0    1    0    0    0    0    0    0     0
[5,]    0    0    1    0    0    0    0    1    1     0
[6,]    0    0    0    0    0    0    0    0    1     1
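To make the example reproducible, the matrix printed above could be rebuilt along these lines. This is only a sketch: the helper name `ones` is an assumption introduced here for illustration, and the coordinates are read off the printout.

```r
# Start from an all-zero 6 x 10 matrix
m <- matrix(0, nrow = 6, ncol = 10)

# (row, column) positions of the 1's, read off the printed matrix
ones <- cbind(
  row = c(1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6),
  col = c(1, 1, 2, 3, 3, 3, 3, 8, 9, 9, 10)
)

# Indexing with a two-column matrix sets each listed cell individually
m[ones] <- 1
m
```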

This matrix has 2 clusters: one cluster of 7 1's and another cluster of 4 1's. I have had quite a bit of trouble getting this to work and can't figure it out.

The output can be something simple like this:

> output
cluster  size
     1      7
     2      4
user3141121

1 Answer


You could use the function ConnCompLabel from the package SDMTools to label the connected components in the binary matrix:

R> ConnCompLabel(m)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    0    0    0    0    0    0    0    0     0
[2,]    1    1    1    0    0    0    0    0    0     0
[3,]    0    0    1    0    0    0    0    0    0     0
[4,]    0    0    1    0    0    0    0    0    0     0
[5,]    0    0    1    0    0    0    0    2    2     0
[6,]    0    0    0    0    0    0    0    0    2     2

R> tab <- table(ConnCompLabel(m))[-1]
R> tab[tab >= 4]

1 2 
7 4 
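
If SDMTools is not available, or if you want the up/down/left/right rule spelled out explicitly, a flood fill in base R is one possible alternative. The following is a minimal sketch, not the method ConnCompLabel uses internally. The names `label_clusters` and `min_size` are assumptions introduced here for illustration.

```r
# Label 4-connected clusters of 1's in a binary matrix (base R only).
# label_clusters() is a hypothetical helper, not part of SDMTools.
label_clusters <- function(m) {
  labels <- matrix(0L, nrow(m), ncol(m))
  current <- 0L
  # Row/column offsets for the up, down, left and right neighbours
  steps <- rbind(c(-1L, 0L), c(1L, 0L), c(0L, -1L), c(0L, 1L))
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      # Skip 0 cells and cells that already carry a label
      if (m[i, j] != 1 || labels[i, j] != 0L) next
      current <- current + 1L
      labels[i, j] <- current
      stack <- list(c(i, j))
      # Depth-first flood fill from the newly found seed cell
      while (length(stack) > 0) {
        cell <- stack[[length(stack)]]
        stack[[length(stack)]] <- NULL
        for (k in seq_len(nrow(steps))) {
          r <- cell[1] + steps[k, 1]
          cc <- cell[2] + steps[k, 2]
          if (r < 1 || r > nrow(m) || cc < 1 || cc > ncol(m)) next
          if (m[r, cc] == 1 && labels[r, cc] == 0L) {
            labels[r, cc] <- current
            stack[[length(stack) + 1L]] <- c(r, cc)
          }
        }
      }
    }
  }
  labels
}

# Rebuild the example matrix from the question
m <- matrix(0, nrow = 6, ncol = 10)
m[cbind(c(1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6),
        c(1, 1, 2, 3, 3, 3, 3, 8, 9, 9, 10))] <- 1

# Count cells per label, drop the background (label 0), apply the threshold
sizes <- table(label_clusters(m))
sizes <- sizes[names(sizes) != "0"]
min_size <- 4  # assumed minimum cluster size, taken from the question
keep <- sizes[sizes >= min_size]
output <- data.frame(cluster = as.integer(names(keep)),
                     size = as.integer(keep))
print(output)
```

The minimum cluster size is applied only at the end, so changing `min_size` does not change how cells are labelled, only which clusters are reported.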
rcs
  • I can't seem to find a way to get ConnCompLabel to let me define the cluster size I want. What I mean is: what if I want clusters of a certain size, for example size 3, instead of it selecting the largest cluster path every time? Any ideas? – user3141121 Sep 09 '14 at 16:42