
I have about 50 locations and I want to cluster them spatially with a fixed cluster size. For example, here are a few of the locations:

lat<-c(17.48693,17.49222,17.51965,17.49359,17.49284,17.47077)
long<-c(78.38945,78.39643,78.37835,78.40079,78.40686,78.35874)

Say I want to cluster them with a cluster size of about 3.

Can you please help?

areddy
  • You could start with a standard algorithm like K-means or hierarchical clustering and then add some postprocessing to tune the size of clusters. There are some discussions about that [here](http://stats.stackexchange.com/questions/74495/use-hierarchical-clustering-in-r-to-cluster-items-into-fixed-size-clusters) and [here](http://stackoverflow.com/questions/5452576/k-means-algorithm-variation-with-equal-cluster-size). – Duf59 Oct 27 '15 at 07:33
  • If spatial coordinates are the only features, can't you define the clusters manually? 50 locations / 3 locs/cluster = 17 clusters, or groups. Easy to do, much faster than coding but the simplest program. – knb Oct 27 '15 at 12:03

2 Answers


You could try `kmeans`, which ships with base R (in the `stats` package). Here is a simple example that targets 3 centers:

> df <- data.frame(lat = lat, lng = long)  # lat/long vectors from the question
> result <- kmeans(df, 3)
> result
K-means clustering with 3 clusters of sizes 4, 1, 1

Cluster means:
       lat      lng
1 17.49140 78.39838
2 17.47077 78.35874
3 17.51965 78.37835

Clustering vector:
[1] 1 1 3 1 1 2


Keep in mind that there is no guarantee that k-means with 3 centers will fit your data well. In this run, 4 observations ended up in one cluster, while the other 2 clusters got only 1 observation each. Because `kmeans` starts from randomly chosen centers, results can differ between runs; if you are unhappy with this one, you can rerun it until you get something that fits better.
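As a rough sketch of what "playing around" could look like, the snippet below reruns `kmeans` from several random starts and prints the resulting cluster sizes. The seed and the `nstart` value are arbitrary choices for illustration, and nothing here forces the clusters to be equal in size.

```r
# Sample coordinates from the question
lat  <- c(17.48693, 17.49222, 17.51965, 17.49359, 17.49284, 17.47077)
long <- c(78.38945, 78.39643, 78.37835, 78.40079, 78.40686, 78.35874)
df   <- data.frame(lat = lat, lng = long)

set.seed(42)  # arbitrary seed, only so that the run is reproducible

# nstart = 25 tries 25 random initialisations and keeps the best one
# (lowest total within-cluster sum of squares)
result <- kmeans(df, centers = 3, nstart = 25)

result$size     # number of points in each cluster
result$cluster  # cluster label for each point
```

If `result$size` is far from balanced, that suggests plain k-means is not enough for a fixed-size requirement.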

Here is a link to a tutorial which might help.

Tim Biegeleisen
  • Thank you for the help. The issue in my problem is that I want clusters of fixed size, i.e. every cluster should contain approximately the same number of points. – areddy Oct 27 '15 at 06:41
  • I don't know if `kmeans` allows for this. Why do you have this requirement? – Tim Biegeleisen Oct 27 '15 at 06:42
  • Normally I use the package `leaderCluster` to cluster the locations, but it gave clusters of different sizes, same as the above. – areddy Oct 27 '15 at 06:43
  • Enforcing the same number of observations per cluster may result in a bad fit. Why do you need this? – Tim Biegeleisen Oct 27 '15 at 06:44
  • I need this because an agent needs to cover some area to visit plots for sale. He can visit only, say, 10 houses per day, so in a week he can cover 50 houses. For every agent I need to define those 50 different places to visit, hence I need clusters of size 50 in a city. – areddy Oct 27 '15 at 06:47
  • Try to come up with an algorithm for how to dissociate a pair from its center. This could get complicated. – Tim Biegeleisen Oct 27 '15 at 06:52
  • kmeans cannot use geographic distance. It minimizes the sum of squares, so you will have *distortion*. – Has QUIT--Anony-Mousse Oct 27 '15 at 15:04
  • @Anony-Mousse My feeling is that kmeans would not be suitable for his use case. So I agree with you. – Tim Biegeleisen Oct 27 '15 at 15:08
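
To illustrate the point about geographic distance, here is a minimal base R sketch that builds a great-circle (haversine) distance matrix for the sample points. The helper name `haversine_km` and the Earth radius of 6371 km are assumptions made for this example, not something from the thread.

```r
lat  <- c(17.48693, 17.49222, 17.51965, 17.49359, 17.49284, 17.47077)
long <- c(78.38945, 78.39643, 78.37835, 78.40079, 78.40686, 78.35874)

# Hypothetical helper: great-circle distance in km between two lat/long points
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# Full pairwise distance matrix (km) between all locations
n <- length(lat)
d_km <- outer(seq_len(n), seq_len(n),
              function(i, j) haversine_km(lat[i], long[i], lat[j], long[j]))
round(d_km, 2)
```

A matrix like this can be passed via `as.dist(d_km)` to distance-based methods such as `hclust`, although that alone still does not enforce equal cluster sizes.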

For tiny data like this,

  • enumerate all admissible options (e.g. all that have 3+3 objects)
  • choose the best

You have to define what the "best" solution is.
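
The enumeration idea can be sketched in base R as follows for the six sample points. "Best" is assumed here to mean the smallest total of within-group pairwise distances, and distances are plain Euclidean in degrees; both are choices made for this illustration, so swap in whatever criterion suits your problem.

```r
lat  <- c(17.48693, 17.49222, 17.51965, 17.49359, 17.49284, 17.47077)
long <- c(78.38945, 78.39643, 78.37835, 78.40079, 78.40686, 78.35874)
pts  <- cbind(long, lat)

# Pairwise Euclidean distances in degrees (a rough approximation for a small area)
d <- as.matrix(dist(pts))

# Assumed cost of a group: sum of pairwise distances among its members
group_cost <- function(idx) sum(d[idx, idx]) / 2

n <- nrow(pts)

# Keep point 1 in the first group so every 3+3 split is listed exactly once
splits <- combn(2:n, 2, simplify = FALSE)

costs <- vapply(splits, function(s) {
  g1 <- c(1, s)
  g2 <- setdiff(seq_len(n), g1)
  group_cost(g1) + group_cost(g2)
}, numeric(1))

# Pick the split with the lowest total cost and turn it into cluster labels
best    <- splits[[which.min(costs)]]
cluster <- ifelse(seq_len(n) %in% c(1, best), 1L, 2L)
cluster
```

This only works for tiny inputs: six points give 10 possible splits, but the number of balanced partitions grows very quickly, so for 50 locations a heuristic would be needed instead.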

Has QUIT--Anony-Mousse