I am doing my first assignment in Data Science (master's level) and do not come from a programming background. I have fitted a k-means model to my data (a simple test data set), and now I want to implement bisecting k-means to show how it can improve the clustering result. I am coding in R. Does anyone know how to code bisecting k-means in R, in a way that suits someone fairly new to the field?

The code I am trying to use is:

    bkmeansset <- ml_bisecting_kmeans(x, formula = NULL, k = 3, max_iter = 20,
                                      seed = NULL, min_divisible_cluster_size = 1,
                                      features_col = "features",
                                      prediction_col = "prediction",
                                      uid = random_string("bisecting_bisecting_kmeans_"))

I am inputting a test set called "testset" and I am not sure where to put it in the function's arguments. The error message that I am getting is:

    Error in UseMethod("ml_bisecting_kmeans") :
      no applicable method for 'ml_bisecting_kmeans' applied to an object of class "character"

  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Provide references to what exactly you mean by "bisecting k-means." Show the code you tried and describe exactly where you are getting stuck. If you haven't tried anything, then a better starting point is probably your instructor since they likely have something very specific in mind. – MrFlick Nov 27 '18 at 19:46
  • I do apologise, this is my first time using Stack Overflow. The code I am trying to use is: `bkmeansset <- ml_bisecting_kmeans(x, formula = NULL, k = 3, max_iter = 20, seed = NULL, min_divisible_cluster_size = 1, features_col = "features", prediction_col = "prediction", uid = random_string("bisecting_bisecting_kmeans_"))` I am inputting a test set called "testset" and I am not sure where to put it in the function's arguments. K-Means sometimes sticks in a local minimum whereas BK-Means gets the global optimal. – Sarah Kuria Nov 27 '18 at 20:12
  • I'm really sorry about how the code looks in that comment, I am really struggling to get it into a readable format. – Sarah Kuria Nov 27 '18 at 20:17
  • Code shouldn't go into comments. You should edit your original question to include those relevant details. – MrFlick Nov 27 '18 at 20:18
  • Thank you. I have now added it to my original question. – Sarah Kuria Nov 27 '18 at 23:31
  • What is in your `x` variable? The error message makes it sound like a character vector, and I don't see how it would be possible to do k-means with character values, since there is no way to take the means of those. – MrFlick Nov 27 '18 at 23:36
  • I didn't know what to put for the `x` variable. I have tried putting in my data set (`testset`) but it comes up with this error: `Error in UseMethod("ml_bisecting_kmeans") : no applicable method for 'ml_bisecting_kmeans' applied to an object of class "data.frame"` – Sarah Kuria Nov 28 '18 at 00:37
  • Welcome to SO! I advise you to read [this](https://www.rdocumentation.org/packages/sparklyr/versions/0.9.2/topics/ml_bisecting_kmeans), including the example, and to go through each step carefully to see how the data must be preprocessed into a form the function accepts. If you can publish your dataset, run `dput(testset)` and add the result to the question by editing it; that will make it easier to help you. If it's too big, you can use `dput(head(testset, 20))` or create some fake data with the same structure as yours. – s__ Nov 28 '18 at 07:48
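
A note on the errors quoted above: `ml_bisecting_kmeans` belongs to sparklyr (see the documentation linked in the last comment), and it works on Spark objects rather than on an ordinary R character vector or data frame, which is consistent with both "no applicable method" messages. If Spark is not actually needed for the assignment, one possible alternative is to build bisecting k-means on top of base R's `kmeans`. The following is only a minimal sketch under that assumption: the function name `bisecting_kmeans`, its `nstart` argument, and the `toy` data are hypothetical names made up for illustration, and the rule of always splitting the cluster with the largest within-cluster sum of squares is one common choice among several.

```r
# Minimal bisecting k-means sketch using only base R (stats::kmeans).
# Assumption: `data` is a numeric matrix or data frame without missing values.
bisecting_kmeans <- function(data, k, nstart = 10) {
  data <- as.matrix(data)
  stopifnot(is.numeric(data), k >= 1, k <= nrow(data))
  cluster <- rep(1L, nrow(data))  # start with every row in a single cluster

  # Within-cluster sum of squared distances to the cluster centroid
  sse <- function(rows) {
    pts <- data[rows, , drop = FALSE]
    sum(sweep(pts, 2, colMeans(pts))^2)
  }

  while (length(unique(cluster)) < k) {
    ids <- sort(unique(cluster))
    errs <- vapply(ids, function(i) sse(cluster == i), numeric(1))
    if (max(errs) <= 0) {
      warning("no cluster can be split further; returning fewer than k clusters")
      break
    }
    # Bisect the cluster with the largest within-cluster error
    target <- ids[which.max(errs)]
    rows <- which(cluster == target)
    halves <- kmeans(data[rows, , drop = FALSE], centers = 2, nstart = nstart)
    # One half keeps the old label, the other half gets a new one
    cluster[rows[halves$cluster == 2]] <- max(cluster) + 1L
  }
  cluster
}

# Illustrative toy data (not the asker's testset): three separated groups
set.seed(42)
toy <- rbind(
  matrix(rnorm(40, mean = 0,  sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5,  sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 10, sd = 0.3), ncol = 2)
)
labels <- bisecting_kmeans(toy, k = 3)
table(labels)
```

With real data, something like `bisecting_kmeans(testset, k = 3)` should work as long as every column of `testset` is numeric. The returned labels can then be compared with `kmeans(testset, centers = 3)$cluster`, for example by comparing the total within-cluster sum of squares of the two results.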

0 Answers