
I have a dataset with the following columns:

customer | income (k) | spend (k)
value    | value      | value

The dataset has 40 entries. I am using the elbow method to try to figure out how many clusters to use. My question is: how do I determine what number to pass to set.seed()?

The code is below:

set.seed(?)
# Within-cluster sum of squares (WCSS) for k = 1..10
wcss <- numeric(10)
for (i in 1:10) wcss[i] <- sum(kmeans(new_dataset, i)$withinss)
plot(1:10,
     wcss,
     type = 'b',
     main = 'The Elbow Method',
     xlab = 'Number of clusters',
     ylab = 'WCSS')
jogo
kanataki
  • I don't think the seed has any effect on the result in R (apparently it does in Python). It's mainly for reproducibility, and you can safely (I think) choose any seed. – NelsonGon Jun 05 '19 at 08:52
  • `set.seed()` is used to achieve repeatability when using functions that generate random values. The number you choose should not have any relevant impact on your results. – boski Jun 05 '19 at 08:53
  • Thanks, I was just worried that picking randomly might be affecting the results. I had seen some comments to that effect in a Python forum somewhere. Thanks for the clarification. – kanataki Jun 05 '19 at 08:55
  • @NelsonGon The random seed fundamentally fulfils the same purpose in R and Python and, depending on which generator you're using, may impact the output (in particular when the seed has insufficient information to seed the entire state of the generator). – Konrad Rudolph Jun 05 '19 at 09:39
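
The points made in the comments can be checked with a short sketch. It assumes a made-up stand-in for `new_dataset` (40 rows of random income and spend values, not the asker's data) and a hypothetical helper `elbow_wcss()`. `kmeans()` picks its starting centers at random, so a fixed seed makes the elbow curve repeatable, while two different seeds may give slightly different curves. Raising `nstart` is one common way to make the curve less dependent on the seed; it is shown here as an option, not as the required fix.

```r
# Hypothetical stand-in for new_dataset: 40 customers, values in thousands.
set.seed(1)
new_dataset <- data.frame(
  income = runif(40, min = 20, max = 120),
  spend  = runif(40, min = 1,  max = 60)
)

# Hypothetical helper: WCSS for k = 1..10, computed after fixing the seed.
elbow_wcss <- function(data, seed, nstart = 1) {
  set.seed(seed)
  sapply(1:10, function(k) {
    kmeans(data, centers = k, nstart = nstart)$tot.withinss
  })
}

# Same seed, same random starting centers, same curve.
same_seed_matches <- identical(elbow_wcss(new_dataset, seed = 42),
                               elbow_wcss(new_dataset, seed = 42))
print(same_seed_matches)

# Different seeds may give different curves (k = 1 never differs,
# because a single cluster has only one possible solution).
print(rbind(seed_1 = elbow_wcss(new_dataset, seed = 1),
            seed_2 = elbow_wcss(new_dataset, seed = 2)))

# Several random starts per k keep the best run, which usually makes
# the curve far less sensitive to the seed that was chosen.
print(rbind(seed_1 = elbow_wcss(new_dataset, seed = 1, nstart = 25),
            seed_2 = elbow_wcss(new_dataset, seed = 2, nstart = 25)))
```

In this sketch the seed value itself carries no meaning: any integer works, and its only job is to let someone else reproduce the same plot.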
