I have the following simple problem. I have a distance matrix for a number of nodes, and I would like to get a list of subsets of this nodes such that within each subset, every two nodes are at a minimum distance dmin. That is, initially each two nodes are connected by an edge that has an associated value. I want to remove every edge for which the value is smaller than dmin, and list all the resulting disconnected graphs.
Essentially I want to get clusters of data points that are really close to each other, not with a clustering algorithm, but instead by using a threshold value for the distance.
My question is, naturally, how can I accomplish it in R. Consider the following matrix m:
a b c d
a 1.0 0.9 0.2 0.3
b 0.9 1.0 0.4 0.1
c 0.2 0.4 1.0 0.7
d 0.3 0.1 0.7 1.0
There are four nodes (a, b, c, d). I search a function or package that given that matrix (which is in fact 1 - distance matrix) and a threshold dmin, for example dmin <- 0.5
, would produce two sets: {a,b}
and {c,d}
. A very inefficient way of accomplishing it is as follows:
clusters <- list()
nodes <- colnames( m )
dmin <- 0.5
# loop over nodes
for( n in nodes ) {
found <- FALSE
# check whether a node can be associated to one of the existing
# clusters
for( c in names( clusters ) ) {
if( any( m[ n, clusters[[c]] ] > 0.5 ) ) {
clusters[[c]] <- c( clusters[[c]], n )
found <- TRUE
next
}
}
# no luck? create a new cluster for that node
if( ! found )
clusters[[n]] <- c( n )
}
The result would be
> clusters
$a
[1] "a" "b"
$c
[1] "c" "d"