
I have a (symmetric) matrix M that represents the distance between each pair of nodes. For example,

    A   B   C   D   E   F   G   H   I   J   K   L
A   0  20  20  20  40  60  60  60 100 120 120 120
B  20   0  20  20  60  80  80  80 120 140 140 140
C  20  20   0  20  60  80  80  80 120 140 140 140
D  20  20  20   0  60  80  80  80 120 140 140 140
E  40  60  60  60   0  20  20  20  60  80  80  80
F  60  80  80  80  20   0  20  20  40  60  60  60
G  60  80  80  80  20  20   0  20  60  80  80  80
H  60  80  80  80  20  20  20   0  60  80  80  80
I 100 120 120 120  60  40  60  60   0  20  20  20
J 120 140 140 140  80  60  80  80  20   0  20  20
K 120 140 140 140  80  60  80  80  20  20   0  20
L 120 140 140 140  80  60  80  80  20  20  20   0

Is there any method to extract clusters from M (the number of clusters can be fixed if needed), such that each cluster contains nodes with small distances between them? In the example, the clusters would be (A, B, C, D), (E, F, G, H) and (I, J, K, L).

Thanks a lot :)

yassin

3 Answers


Hierarchical clustering works directly with the distance matrix instead of the actual observations. If you know the number of clusters, you will already know your stopping criterion (stop when there are k clusters). The main trick here will be to choose an appropriate linkage method. Also, this paper (PDF) gives an excellent overview of all kinds of clustering methods.
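
For illustration, here is a minimal sketch in R using the matrix from the question (hclust and cutree are in base R; the name M is my own, and both complete and single linkage recover the three groups on this matrix):

    # The asker's 12x12 distance matrix, with nodes A..L
    M <- matrix(c(
        0,  20,  20,  20,  40,  60,  60,  60, 100, 120, 120, 120,
       20,   0,  20,  20,  60,  80,  80,  80, 120, 140, 140, 140,
       20,  20,   0,  20,  60,  80,  80,  80, 120, 140, 140, 140,
       20,  20,  20,   0,  60,  80,  80,  80, 120, 140, 140, 140,
       40,  60,  60,  60,   0,  20,  20,  20,  60,  80,  80,  80,
       60,  80,  80,  80,  20,   0,  20,  20,  40,  60,  60,  60,
       60,  80,  80,  80,  20,  20,   0,  20,  60,  80,  80,  80,
       60,  80,  80,  80,  20,  20,  20,   0,  60,  80,  80,  80,
      100, 120, 120, 120,  60,  40,  60,  60,   0,  20,  20,  20,
      120, 140, 140, 140,  80,  60,  80,  80,  20,   0,  20,  20,
      120, 140, 140, 140,  80,  60,  80,  80,  20,  20,   0,  20,
      120, 140, 140, 140,  80,  60,  80,  80,  20,  20,  20,   0),
      nrow = 12, byrow = TRUE,
      dimnames = list(LETTERS[1:12], LETTERS[1:12]))

    hc <- hclust(as.dist(M), method = "complete")  # complete linkage
    cutree(hc, k = 3)  # k = 3 recovers (A-D), (E-H), (I-L)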

Björn Pollex
  • I've already tried UPGMA but the resulting clusters are very bad. Any ideas? – yassin Sep 16 '10 at 09:14
  • 1
    If I interpret your distance matrix correctly, your clusters are very well separated. In that case, single and complete linkage should work well. You may want to try and post this on http://stats.stackexchange.com, there are people that are more specialized on such topics. – Björn Pollex Sep 16 '10 at 09:20

One more possibility is Partitioning Around Medoids, often called k-medoids. If you look at the R cluster package, you will see the pam function, which accepts a distance matrix as input.
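
As a minimal sketch, assuming M is the asker's 12x12 distance matrix (e.g. built as in the example under the first answer):

    library(cluster)               # provides pam()
    fit <- pam(as.dist(M), k = 3)  # k-medoids directly on the distances
    fit$clustering                 # cluster label for each node
    fit$medoids                    # the representative node of each cluster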

NtsDK

Well, it is possible to perform k-means clustering on a given distance matrix. First you centre the matrix and take its eigendecomposition. The final and most important step is to multiply the first two eigenvectors by the square roots of the corresponding eigenvalues to obtain coordinate vectors, and then run k-means on those. The code below shows how to do it; fpdist is the distance matrix (swap in your own).

    mds.tau <- function(H)
    {
      # Double-centering step of classical MDS (strictly, classical MDS
      # double-centers the *squared* distances).
      n <- nrow(H)
      P <- diag(n) - 1/n
      return(-0.5 * P %*% H %*% P)
    }

    B   <- mds.tau(fpdist)
    eig <- eigen(B, symmetric = TRUE)
    v   <- eig$values[1:2]
    v[v < 0] <- 0                      # convert negative eigenvalues to 0
    # Coordinates: first two eigenvectors scaled by sqrt(eigenvalues)
    X  <- eig$vectors[, 1:2] %*% diag(sqrt(v))
    km <- kmeans(X, centers = 5, iter.max = 1000, nstart = 10000)
    # Embedding using MDS, for comparison (cmdscale is in base R)
    cmd <- cmdscale(fpdist)
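
km$cluster then holds the cluster assignment for each row of X; plotting cmd colored by km$cluster is a quick sanity check of the result.
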
user4959