I have a data frame, df, containing the x and y coordinates of a bunch of points. Here's an excerpt:
> tail(df)
x y
1495 0.627174 0.120215
1496 0.616036 0.123623
1497 0.620269 0.122713
1498 0.630231 0.110670
1499 0.611844 0.111593
1500 0.412236 0.933250
I am trying to find the most appropriate number of clusters. Ultimately the goal is to do this for tens of thousands of these data frames, so the method must be fast and can't rely on visual inspection. Based on those requirements, it seems like the RWeka package is the way to go.
I managed to load the RWeka package (I had to install the Java SE Runtime on my computer first), install the XMeans Weka package through it, and run the algorithm:
library("RWeka") # requires Java SE Runtime
WPM("refresh-cache") # Build Weka package metadata cache
WPM("install-package", "XMeans") # Install XMeans package if not previously installed
weka_ctrl <- Weka_control( # Create a Weka control object to specify our parameters
  I = 100, # max no. of iterations overall
  M = 100, # max no. of iterations in the kmeans loop
  L = 2, # min no. of clusters
  H = 5, # max no. of clusters
  D = "weka.core.EuclideanDistance", # distance metric
  C = 0.4, # cutoff factor
  S = 1) # random seed
x_means <- XMeans(df, control = weka_ctrl) # run algorithm on data
This produces exactly the result I want:
XMeans
======
Requested iterations : 100
Iterations performed : 1
Splits prepared : 2
Splits performed : 0
Cutoff factor : 0.4
Percentage of splits accepted
by cutoff factor : 0 %
------
Cutoff factor : 0.4
------
Cluster centers : 2 centers
Cluster 0
0.4197712002617799 0.9346986806282739
Cluster 1
0.616697959239131 0.11564350951086963
Distortion: 30.580934
BIC-Value : 2670.359509
I can assign each point in my data frame to a cluster by running x_means$class_ids.
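To make that concrete, here is a minimal sketch of how I use those ids. It runs on a toy copy of the six rows from the excerpt above, so it does not need RWeka or Java. The names df_toy and ids_toy are made up for illustration, and the ids themselves are an assumption: I'm taking them to be 0-based integers, one per row in row order, with values guessed from which printed centre each point is closest to.

```r
# Toy stand-in for df: the six rows shown in the tail(df) excerpt
df_toy <- data.frame(
  x = c(0.627174, 0.616036, 0.620269, 0.630231, 0.611844, 0.412236),
  y = c(0.120215, 0.123623, 0.122713, 0.110670, 0.111593, 0.933250),
  row.names = 1495:1500
)

# Hypothetical stand-in for x_means$class_ids (assumed: 0-based ids,
# one per row of the data frame, in the same order as the rows)
ids_toy <- c(1L, 1L, 1L, 1L, 1L, 0L)

# Attach the cluster label to each point
df_toy$cluster <- ids_toy

# Count how many points fall in each cluster
cluster_sizes <- table(df_toy$cluster)
print(cluster_sizes)

# Pull out the rows that belong to one cluster
cluster0 <- df_toy[df_toy$cluster == 0, ]
print(cluster0)
```

With the real objects, the same pattern would be df$cluster <- x_means$class_ids, provided the ids really do line up with the rows of df.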
However, I would like a way to retrieve the coordinates of the cluster centres programmatically. I can see them in the printed output and copy them by hand, but if I am to run tens of thousands of these, I need code that saves them into a variable. I can't seem to subset x_means using square brackets, so I don't know what else to try.
Thank you so much in advance for your help!