4

I have an 8x8 matrix:

1 1 0 0 0 0 0 0
1 0 0 0 0 0 0 0
0 0 1 1 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 1

Code to create it:

examplemat <- matrix(c(1, 1, rep(0, 6), 1, rep(0, 9), 1, 1, rep(0, 17), 1, rep(0, 7), 1, rep(0, 5), 1, rep(0, 11), 1), 8, 8, byrow=T)

The ones have been extracted as coordinates:

onecoords <- which(examplemat == 1, arr.ind=T)

      row col
 [1,]   1   1
 [2,]   2   1
 [3,]   1   2
 [4,]   3   3
 [5,]   3   4
 [6,]   7   4
 [7,]   5   6
 [8,]   6   6
 [9,]   8   8

I need a simple, vectorized approach to clustering the ones into groups of adjacent coordinates. (For the purposes of this task, I would like 4-way up/down/left/right adjacency, but you may also want the option of 8-way adjacency including the diagonals.)

In the example, we would end up with 5 clusters of cells:

      row col clus
 [1,]   1   1    A
 [2,]   2   1    A
 [3,]   1   2    A
 [4,]   3   3    B
 [5,]   3   4    B
 [6,]   7   4    C
 [7,]   5   6    D
 [8,]   6   6    D
 [9,]   8   8    E

The 4-way adjacency check is pretty simple: sum(abs(onecoords[1,] - onecoords[2,])) == 1 But I'm struggling to work out how to vectorize this efficiently.

ajrwhite
  • 7,728
  • 1
  • 11
  • 24
  • 2
    You might find something [in here](http://stackoverflow.com/questions/6518133/clustering-list-for-hclust-function) useful and, hopefully, general enough; e.g. here `cbind(onecoords, clus = cutree(hclust(dist(onecoords, "maximum")), h = 1))` – alexis_laz Jun 21 '16 at 13:48
  • I agree that some use of `dist` or a more general Mahalanobis distance function will help. – Carl Witthoft Jun 21 '16 at 14:06

1 Answers1

3

With thanks to alexis_laz and Carl Witthoft for very helpful comments that enable me to answer this myself.

Four-way adjacency

To obtain clusters based on 4-way adjacency, use hierarchical clustering with single linkage on Manhattan distance:

cbind(onecoords, clus = cutree(hclust(dist(onecoords, "manhattan"), "single"), h = 1))
      row col clus
 [1,]   1   1    1
 [2,]   2   1    1
 [3,]   1   2    1
 [4,]   3   3    2
 [5,]   3   4    2
 [6,]   7   4    3
 [7,]   5   6    4
 [8,]   6   6    4
 [9,]   8   8    5

Eight-way adjacency

To obtain clusters based on 8-way adjacency (including diagonals), use hierarchical clustering with single linkage on maximum distance:

cbind(onecoords, clus = cutree(hclust(dist(onecoords, "maximum"), "single"), h = 1))

Difference between Manhattan and maximum distances

In the example case, these both give the same results, but if you remove the first row, for example, you will see that Manhattan distance produces 6 clusters, while maximum distance produces 5 clusters.

cbind(onecoords[2:9,], clus = cutree(hclust(dist(onecoords[2:9,], "manhattan"), "single"), h = 1))
     row col clus
[1,]   2   1    1
[2,]   1   2    2
[3,]   3   3    3
[4,]   3   4    3
[5,]   7   4    4
[6,]   5   6    5
[7,]   6   6    5
[8,]   8   8    6
cbind(onecoords[2:9,], clus = cutree(hclust(dist(onecoords[2:9,], "maximum"), "single"), h = 1))
     row col clus
[1,]   2   1    1
[2,]   1   2    1
[3,]   3   3    2
[4,]   3   4    2
[5,]   7   4    3
[6,]   5   6    4
[7,]   6   6    4
[8,]   8   8    5
ajrwhite
  • 7,728
  • 1
  • 11
  • 24