I have 4 data frames for four different cities, all with the same variables. I want to build a hierarchical clustering across the datasets of the four cities. I have tried this code to make a hierarchical cluster in R:

hc <- hclust(dist(df))
hcd <- as.dendrogram(hc)

But this code makes a dendrogram of a single data frame. What I want is a dendrogram across different cities, so I want to cluster rows from different cities with each other. I have searched the internet a lot but couldn't find anything about it. Does anyone know how to solve this problem?

I have also tried combining the datasets, but then the clusters also group rows from the same city. I want clusters between different cities.

An example of my dataset is as follows:

      colname_city   col_1   col_2
[1,]  Amsterdam      0.2     0.3
[2,]  Rotterdam      0.3     0.5
[3,]  Den Haag       0.4     0.2
[4,]  Utrecht        0.2     0.1
[5,]  Amsterdam      0.1     0.5
[6,]  Rotterdam      0.2     0.5
[7,]  Rotterdam      0.4     0.4
[8,]  Utrecht        0.5     0.3
[9,]  Utrecht        0.5     0.5
[10,] Den Haag       0.6     0.3
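
For anyone who wants to reproduce this, the table above could be rebuilt roughly as follows. This is only a sketch: the object name `data` and the choice to store `colname_city` as a factor are assumptions, not something stated in the question.

```r
# Assumed reconstruction of the example table shown above
data <- data.frame(
  colname_city = factor(c("Amsterdam", "Rotterdam", "Den Haag", "Utrecht",
                          "Amsterdam", "Rotterdam", "Rotterdam", "Utrecht",
                          "Utrecht", "Den Haag")),
  col_1 = c(0.2, 0.3, 0.4, 0.2, 0.1, 0.2, 0.4, 0.5, 0.5, 0.6),
  col_2 = c(0.3, 0.5, 0.2, 0.1, 0.5, 0.5, 0.4, 0.3, 0.5, 0.3)
)

# Number of rows per city
table(data$colname_city)
```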
Brock Adams
  • When asking a question, you should provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data so we can get an idea of what's going on. Why can't you just combine your different data.frames into a single data.frame to do the clustering? – MrFlick May 23 '16 at 20:35
  • You need to merge dataframes first and then apply hclust – Pankaj Sharma May 23 '16 at 20:36
  • I have also combined the datasets. But when I cluster the result, it also clusters rows from the same city, whereas I want to cluster different cities. – Hakan Yılmaz May 23 '16 at 20:36
  • @PankajSharma I have also tried that, but I don't want clusters between rows with the same city name. Therefore I want to try clustering the 4 data frames? – Hakan Yılmaz May 23 '16 at 20:39
  • What would be the desired outcome?!? – Has QUIT--Anony-Mousse May 23 '16 at 21:47

1 Answer

To cluster each city separately, select the subset of rows belonging to the given city and apply the hierarchical clustering only to that subset. Here, data is your example table.

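# Build a dendrogram from the rows belonging to one city
city_hc <- function(city){
    # keep only the rows of the requested city
    temp <- data[data$colname_city == city, ]
    # temp still includes colname_city; dist() coerces it to NA (with a warning)
    hcd <- as.dendrogram(hclust(dist(temp)))
    return(hcd)
}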

To obtain dendrograms for all cities, we cycle through all levels of the city column (this assumes colname_city is a factor).

hcds <- lapply(levels(data$colname_city), city_hc)
names(hcds) <- levels(data$colname_city)
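
If `colname_city` is a character column rather than a factor (the default for `data.frame()` since R 4.0), `levels()` returns `NULL` and the `lapply()` call above yields an empty list. One possible alternative, sketched here, is to let `split()` do the grouping. The helper name `hc_for_rows` and the restriction to `col_1` and `col_2` are assumptions for illustration, not part of the original answer.

```r
# Example table with the city stored as plain character strings
data <- data.frame(
  colname_city = c("Amsterdam", "Rotterdam", "Den Haag", "Utrecht", "Amsterdam",
                   "Rotterdam", "Rotterdam", "Utrecht", "Utrecht", "Den Haag"),
  col_1 = c(0.2, 0.3, 0.4, 0.2, 0.1, 0.2, 0.4, 0.5, 0.5, 0.6),
  col_2 = c(0.3, 0.5, 0.2, 0.1, 0.5, 0.5, 0.4, 0.3, 0.5, 0.3),
  stringsAsFactors = FALSE
)

# levels() is NULL for a character vector, so looping over it does nothing
levels(data$colname_city)

# Hypothetical helper: dendrogram from the numeric columns of one group of rows
hc_for_rows <- function(rows) {
  as.dendrogram(hclust(dist(rows[, c("col_1", "col_2")])))
}

# split() groups the rows by city and names the list elements after the cities
hcds <- lapply(split(data, data$colname_city), hc_for_rows)
names(hcds)
```

Each city needs at least two rows for `hclust()` to work, which holds for the example table.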

The result is a list containing one dendrogram per city.

str(hcds)
# List of 4
#  $ Amsterdam:  ..--[dendrogram w/ 2 branches and 2 members at h = 0.274, midpoint = 0.5]
#   ..  |--leaf "1" 
#   ..  `--leaf "5" 
#  $ Den Haag :  ..--[dendrogram w/ 2 branches and 2 members at h = 0.274, midpoint = 0.5]
#   ..  |--leaf "3" 
#   ..  `--leaf "10" 
#  $ Rotterdam:  ..--[dendrogram w/ 2 branches and 3 members at h = 0.274, midpoint = 0.75]
#   ..  |--leaf "7" 
#   ..  `--[dendrogram w/ 2 branches and 2 members at h = 0.122, midpoint = 0.5]
#   ..     |--leaf "2" 
#   ..     `--leaf "6" 
#  $ Utrecht  :  ..--[dendrogram w/ 2 branches and 3 members at h = 0.612, midpoint = 0.75]
#   ..  |--leaf "4" 
#   ..  `--[dendrogram w/ 2 branches and 2 members at h = 0.245, midpoint = 0.5]
#   ..     |--leaf "8" 
#   ..     `--leaf "9" 

# plot a dendrogram
plot(hcds[[3]])
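
A note on the heights printed above: because city_hc passes the whole subset to `dist()`, the non-numeric city column is turned into `NA`, and `dist()` scales the remaining squared differences up in proportion to the number of columns actually used. The sketch below uses rows 1 and 5 of the example table and assumes the city column is a factor. It compares the two variants; restricting the distance to `col_1` and `col_2` is a suggestion here, not part of the original answer.

```r
# Rows 1 and 5 (Amsterdam) from the example table
ams <- data.frame(
  colname_city = factor(c("Amsterdam", "Amsterdam")),
  col_1 = c(0.2, 0.1),
  col_2 = c(0.3, 0.5)
)

# City column included: it is coerced to NA (normally with a warning)
# and dist() rescales the distance for the missing column
d_all <- suppressWarnings(dist(ams))

# Numeric columns only: plain Euclidean distance
d_num <- dist(ams[, c("col_1", "col_2")])

# Compare the two heights and their ratio
round(c(with_city = as.numeric(d_all),
        numeric_only = as.numeric(d_num),
        ratio = as.numeric(d_all) / as.numeric(d_num)), 3)
```

With three columns of which two are usable, the scaling factor is sqrt(3/2), which is why Amsterdam shows h = 0.274 above rather than roughly 0.224. The tree shapes are unaffected, since every distance within a city is scaled by the same factor.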

I hope this is what you needed to do.

nya