Find the ideal cluster

Question

My colleagues and I developed a hierarchical clustering algorithm to find the main clusters of agricultural industries in a particular city (e.g. London). We implemented the algorithm in R, and it works as intended. Depending on the filters we apply in the algorithm, we obtained 6 clustering scenarios for London: for example, the first scenario produces 2 clusters, the second 5 clusters, and so on. How can I choose the most appropriate scenario? I saw that some packages help with this, such as pvclust, but I couldn't get it to work for my case. Below is a brief reproducible example that shows the essence of what I want.

Any help is welcome! If you know how to do this with another package, feel free to describe it.

Best Regards.

```r
library(rdist)
library(geosphere)
library(fpc)

df <- structure(list(Industries = c(1, 2, 3, 4, 5, 6),
                     Latitude   = c(-23.8, -23.8, -23.9, -23.7, -23.7, -23.7),
                     Longitude  = c(-49.5, -49.6, -49.7, -49.8, -49.6, -49.9),
                     Waste      = c(526, 350, 526, 469, 534, 346)),
                class = "data.frame", row.names = c(NA, -6L))

df1 <- df

# Clusters: geodesic distance matrix (distm() expects longitude first, then latitude)
coordinates <- df[c("Latitude", "Longitude")]
d <- as.dist(distm(coordinates[, 2:1]))
fit.average <- hclust(d, method = "average")

# Scenario 1: cut the tree into 2 clusters
clusters <- cutree(fit.average, k = 2)
df$cluster <- clusters
df
#   Industries Latitude Longitude Waste cluster
# 1          1    -23.8     -49.5   526       1
# 2          2    -23.8     -49.6   350       1
# 3          3    -23.9     -49.7   526       1
# 4          4    -23.7     -49.8   469       2
# 5          5    -23.7     -49.6   534       1
# 6          6    -23.7     -49.9   346       2

# Scenario 2: cut the tree into 5 clusters
clusters1 <- cutree(fit.average, k = 5)
df1$cluster <- clusters1
df1
#   Industries Latitude Longitude Waste cluster
# 1          1    -23.8     -49.5   526       1
# 2          2    -23.8     -49.6   350       1
# 3          3    -23.9     -49.7   526       2
# 4          4    -23.7     -49.8   469       3
# 5          5    -23.7     -49.6   534       4
# 6          6    -23.7     -49.9   346       5
```
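One common way to compare cuts like these is the average silhouette width of each partition (values near 1 mean compact, well-separated clusters). A minimal sketch, assuming the `cluster` package (not used in the original code) and the same toy data and geodesic distances as above; it treats every scenario as a cut of the same average-linkage tree, which may not hold for scenarios produced by different filters:

```r
library(geosphere)
library(cluster)  # assumption: silhouette() from the recommended 'cluster' package

# Same toy data and geodesic distance matrix as in the question
df <- data.frame(Industries = 1:6,
                 Latitude  = c(-23.8, -23.8, -23.9, -23.7, -23.7, -23.7),
                 Longitude = c(-49.5, -49.6, -49.7, -49.8, -49.6, -49.9),
                 Waste     = c(526, 350, 526, 469, 534, 346))
d <- as.dist(distm(df[, c("Longitude", "Latitude")]))
fit.average <- hclust(d, method = "average")

# Average silhouette width for each candidate number of clusters
# (silhouette() is only defined for 2 <= k <= n - 1)
ks <- 2:(nrow(df) - 1)
avg_sil <- sapply(ks, function(k) {
  sil <- silhouette(cutree(fit.average, k = k), d)
  mean(sil[, "sil_width"])
})
names(avg_sil) <- ks
print(avg_sil)

# The k with the highest average silhouette width is one candidate scenario
best_k <- ks[which.max(avg_sil)]
print(best_k)
```

With only six points the numbers are not very meaningful; on the real data the same loop can be run over the number of clusters in each of the six scenarios.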

Look at the [Cluster Analysis Task View](https://cran.r-project.org/web/views/Cluster.html), particularly the Additional Functionality section. The package `clValid` may have what you want. — dcarlson, Dec 12 '20 at 22:41
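As an illustration of the internal validity measures `clValid` offers, a minimal sketch computing the Dunn index (smallest between-cluster distance divided by largest cluster diameter; higher is better) for several cuts. It assumes `clValid::dunn()` accepts a precomputed distance matrix via its `distance` argument, and reuses the question's coordinates:

```r
library(geosphere)
library(clValid)  # assumption: dunn() from the clValid package suggested above

df <- data.frame(Latitude  = c(-23.8, -23.8, -23.9, -23.7, -23.7, -23.7),
                 Longitude = c(-49.5, -49.6, -49.7, -49.8, -49.6, -49.9))

# Geodesic distances in metres, kept as a plain matrix for dunn()
dmat <- distm(df[, c("Longitude", "Latitude")])
fit.average <- hclust(as.dist(dmat), method = "average")

# Dunn index for each candidate number of clusters
ks <- 2:5
dunn_idx <- sapply(ks, function(k) {
  dunn(distance = dmat, clusters = cutree(fit.average, k = k))
})
names(dunn_idx) <- ks
print(dunn_idx)
```

Silhouette width and the Dunn index can disagree, so it may help to look at both before settling on a scenario.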

score 1 · Answer 1 · answered Feb 15 '21 at 06:12


Maybe try something like this (note: I'm not sure about this approach's mathematical rigour):

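```r
library(tidyverse)
library(geosphere)

# df is the data frame from the question
clustered_df <-
  df %>%
  arrange(Latitude, Longitude) %>%
  mutate(
    # Distance (m) from each industry to the previous one in sorted order;
    # geosphere expects (longitude, latitude) column order
    dist_diff = c(0, distVincentyEllipsoid(cbind(Longitude, Latitude))),
    # Start a new cluster wherever the gap is larger than the median gap
    separate_clust = dist_diff > median(dist_diff[-1]),
    cluster_no = 1 + cumsum(separate_clust)
  ) %>%
  select(Industries, Longitude, Latitude, Waste, cluster_no)

library(leaflet)

# Map the industries, labelled by cluster number
leaflet(clustered_df) %>%
  addTiles() %>%
  addAwesomeMarkers(lat = ~Latitude, lng = ~Longitude, label = ~as.character(cluster_no))
```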
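The splitting rule above is easiest to see on a small vector. A minimal sketch with made-up gap values (hypothetical, not derived from the question's data): a new cluster starts wherever the gap to the previous industry exceeds the median of the real gaps.

```r
# Hypothetical consecutive gaps (km) between sorted industries;
# the first entry is 0 because the first industry has no predecessor
dist_diff <- c(0, 5, 20, 4, 30, 6)

# median(c(5, 20, 4, 30, 6)) is 6, so only the gaps of 20 and 30 start new clusters
separate_clust <- dist_diff > median(dist_diff[-1])
cluster_no <- 1 + cumsum(separate_clust)
print(cluster_no)
# [1] 1 1 2 2 3 3
```

Because the threshold is the median, roughly half of the gaps (rounded down) open a new cluster, so the number of clusters grows with the number of industries rather than being chosen from the data.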

hello_friend

Hi @hello_friend, any tip for this question: https://stackoverflow.com/questions/68854121/insert-color-into-google-maps-in-r – Antonio Aug 20 '21 at 01:34
@Jose no, sorry. – hello_friend Aug 20 '21 at 01:58
