We have built clustering models in R. We now want to deploy the model equation to score the new customers whom we want to cluster. In SAS, the Cluster node used to provide clustering SAS code into which we only had to plug the new input variables. Is there a way to do that in R? How can we export the cluster equation?

An example using the standard iris dataset is below.

# kmeans() needs numeric input, so drop the categorical Species column
irisnew <- iris[, 1:4]
# kmeans() is in the stats package, which is loaded by default
(kc <- kmeans(irisnew, 3))

K-means clustering with 3 clusters of sizes 62, 38, 50

Cluster means:
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     5.901613    2.748387     4.393548    1.433871
2     6.850000    3.073684     5.742105    2.071053
3     5.006000    3.428000     1.462000    0.246000

Clustering vector:
  [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
 [39] 3 3 3 3 3 3 3 3 3 3 3 3 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [77] 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 2 2 1 2 2 2 2 2 2 1
[115] 1 2 2 2 2 1 2 1 2 1 2 2 1 1 2 2 2 2 2 1 2 2 2 2 1 2 2 2 1 2 2 2 1 2 2 1

Within cluster sum of squares by cluster:
[1] 39.82097 23.87947 15.15100
 (between_SS / total_SS =  88.4 %)

Now that the clusters are defined, I have a new dataset for petals that I need to classify according to the above clustering rules. My question is: how do I export the rules to do that? Typically the rules are defined as

x = a1 * Sepal.Length + a2 * Sepal.Width +a3 * Petal.Length + a4 * Petal.Width + b
Then if x between z1 and z2 then Cluster1
else if x between z3 and z4 then Cluster2
else if x between z5 and z6 then Cluster3
else Cluster4

Thanks, Manish

myloginid
  • What functions have you used to cluster data in R? A [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) would be nice. – MrFlick May 04 '15 at 05:24
  • Your updated example doesn't work. You define `irisnew` but then run kmeans on `newiris`. I assume you intended to filter out the categorical variable. Plus, for reproducibility, it doesn't help to specify a `lib.loc=` since that may differ by OS and R version. I did vote to re-open assuming you can get the example to work (but it does take more votes than just one to re-open). – MrFlick May 06 '15 at 04:32
  • Corrected the `irisnew` and `newiris` typo. I had made a couple of datasets from the base data. My question is simple: I have, say, X sets of flowers clustered on the basis of some algorithm. I want to apply the same clustering rules to a new flower that comes in (in real life they are existing customers and new customers). I can implement that using the clustering code in SAS, as I described above, with coefficients and if/then/else. I simply want to do that in R. In SAS, "scoring" means that we apply a built model to new raw data. I want to get and execute the score code in R. – myloginid May 06 '15 at 07:12
  • Does that sample actually work for you? I get an error: "Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)". If you are doing kmeans clustering, you just need to extract the cluster centers (`kc$centers`) and then compare the distance for each new point to each of these centers and choose the closest cluster center. – MrFlick May 06 '15 at 13:51
  • The scoring part, `x = a1 * Sepal.Length......`, works in SAS, whether it's clustering, regression, logistic regression, etc. That's the way we get the existing model to run on a new dataset and get the cluster name, regression probability, and so on. – myloginid May 07 '15 at 03:44
  • Got the answer for regression-type models: use the `predict()` function. For clustering, I found a similar question: http://stackoverflow.com/questions/8112169/predict-in-clustering – myloginid May 07 '15 at 04:37

1 Answer

For generic models such as regression, use `predict()`, e.g. `predict(glm.model, newdata = newdf)`.
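As a rough illustration of that scoring step, the sketch below fits a logistic regression and applies it to new rows. The `mtcars` data, the `am ~ wt` formula, and the train/new split are assumptions picked only to keep the example self-contained; `glm.model` and `newdf` follow the names used above. It also shows that `coef()` exposes the coefficients, so the SAS-style equation can be written out by hand if that is needed.

```r
# Hypothetical split: odd rows for fitting, even rows as "new" data
train <- mtcars[seq(1, nrow(mtcars), by = 2), ]
newdf <- mtcars[seq(2, nrow(mtcars), by = 2), ]

# Fit the model on the existing data
glm.model <- glm(am ~ wt, data = train, family = binomial)

# Score the new data: predicted probabilities on the response scale
p <- predict(glm.model, newdata = newdf, type = "response")

# The same score written as an explicit equation: plogis(b + a1 * wt)
b <- coef(glm.model)[["(Intercept)"]]
a1 <- coef(glm.model)[["wt"]]
p_manual <- plogis(b + a1 * newdf$wt)
```

Here `p` and `p_manual` should agree up to floating-point error, which is the sense in which `predict()` replaces the exported score code.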

For clustering, see the question "Simple approach to assigning clusters for new data after k-means clustering".
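For k-means specifically, the approach suggested in the comments (take `kc$centers` and pick the closest center for each new point) can be sketched as follows. This is a minimal sketch, not a built-in scoring API: `assign_cluster` and `new_flowers` are hypothetical names introduced here, the new measurements are made-up values, and Euclidean distance on unscaled variables is assumed.

```r
set.seed(42)                      # k-means starts are random; fix them for repeatability
irisnew <- iris[, 1:4]            # numeric columns only
kc <- kmeans(irisnew, centers = 3, nstart = 20)

# Hypothetical helper: assign each row of newdata to the nearest cluster center
assign_cluster <- function(centers, newdata) {
  # Use the same variables, in the same order, as the fitted centers
  x <- as.matrix(newdata[, colnames(centers), drop = FALSE])
  # Squared Euclidean distance from every row to each center
  d2 <- sapply(seq_len(nrow(centers)), function(k) {
    rowSums(sweep(x, 2, centers[k, ])^2)
  })
  d2 <- matrix(d2, nrow = nrow(x))  # keep matrix shape even for a single row
  apply(d2, 1, which.min)           # index of the closest center per row
}

# Made-up new observations to score
new_flowers <- data.frame(
  Sepal.Length = c(5.0, 6.5),
  Sepal.Width  = c(3.4, 3.0),
  Petal.Length = c(1.5, 5.5),
  Petal.Width  = c(0.2, 2.0)
)
new_clusters <- assign_cluster(kc$centers, new_flowers)
```

The "exported rules" are then just the `kc$centers` matrix plus the nearest-center rule, which can be saved with `saveRDS()` or re-implemented elsewhere. If the variables were scaled before clustering, the same scaling has to be applied to the new data before calling the helper.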

myloginid