How to determine what number to use with set.seed() while using the elbow method in R

Asked Jun 05 '19 at 08:50

I have a dataset with the following columns:

customer | income (k) | spend (k)
value    | value      | value

The dataset has 40 entries. I am using the elbow method to figure out how many clusters to use. My question is: how do I determine what number to pass to set.seed()?

The code is below:

set.seed(?)  # which number should go here?
wcss <- numeric(10)
# total within-cluster sum of squares for k = 1..10
for (i in 1:10) wcss[i] <- sum(kmeans(new_dataset, i)$withinss)
plot(1:10,
     wcss,
     type = 'b',
     main = 'The Elbow Method',
     xlab = 'Number of clusters',
     ylab = 'WCSS')

edited Jun 05 '19 at 08:53

jogo

asked Jun 05 '19 at 08:50

kanataki

I don't think the seed has any effect on the result in R (apparently it does in Python). It's mainly for reproducibility, and you can safely (I think) choose any seed. – NelsonGon Jun 05 '19 at 08:52

`set.seed()` is used to achieve repeatability when using functions that generate random values. The number you choose should not have any relevant impact on your results. – boski Jun 05 '19 at 08:53
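
To illustrate the repeatability point, here is a minimal sketch. The data frame `toy` is a synthetic stand-in for the asker's 40-row dataset, and the helper `elbow_wcss()` is a hypothetical name introduced only for this example; neither comes from the question.

```r
# Synthetic stand-in for the asker's data (an assumption): 40 customers,
# income and spend in thousands.
set.seed(1)
toy <- data.frame(income = runif(40, 20, 120), spend = runif(40, 1, 60))

# Hypothetical helper: elbow curve (total WCSS for k = 1..max_k) under a given seed.
elbow_wcss <- function(data, seed, max_k = 10) {
  set.seed(seed)
  sapply(seq_len(max_k), function(k) kmeans(data, centers = k)$tot.withinss)
}

run_a <- elbow_wcss(toy, seed = 42)
run_b <- elbow_wcss(toy, seed = 42)

# Same seed means the same random starting centers, so the two curves match.
identical(run_a, run_b)
```

The particular number passed to `set.seed()` carries no meaning of its own; fixing it only makes the random starting centers, and therefore the plot, repeatable from run to run.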
Thanks, I was just worried that picking a seed at random might affect the results. I had seen comments to that effect in a Python forum somewhere. Thanks for the clarification. – kanataki Jun 05 '19 at 08:55

@NelsonGon The random seed fundamentally fulfils the same purpose in R and Python and, depending on which generator you’re using, may impact the output (in particular when the seed has insufficient information to seed the entire state of the generator). – Konrad Rudolph Jun 05 '19 at 09:39
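
One way to check whether the seed matters for a given dataset is to compute the elbow curve under several seeds and compare them. The sketch below assumes synthetic data (`toy`), an arbitrary set of seeds, and a hypothetical helper `elbow_wcss()`; none of these come from the question. It also compares a single random start with `nstart = 25`, since `kmeans()` keeps the best of `nstart` random starts.

```r
# Synthetic stand-in data (an assumption; replace with the real data frame).
set.seed(1)
toy <- data.frame(income = runif(40, 20, 120), spend = runif(40, 1, 60))

# Hypothetical helper: total WCSS for k = 1..max_k under one seed.
elbow_wcss <- function(data, seed, nstart, max_k = 10) {
  set.seed(seed)
  sapply(seq_len(max_k), function(k) {
    kmeans(data, centers = k, nstart = nstart)$tot.withinss
  })
}

seeds  <- c(1, 42, 123, 2019)  # arbitrary choices for illustration
single <- sapply(seeds, function(s) elbow_wcss(toy, s, nstart = 1))
multi  <- sapply(seeds, function(s) elbow_wcss(toy, s, nstart = 25))

# Spread of WCSS across seeds for each k (rows are k, columns are seeds).
# A small spread suggests the seed has little influence at that k.
spread_single <- apply(single, 1, function(v) max(v) - min(v))
spread_multi  <- apply(multi, 1, function(v) max(v) - min(v))
round(cbind(k = 1:10, spread_single, spread_multi), 2)
```

If the spread is noticeable with a single start, raising `nstart` is usually a more robust remedy than searching for a "good" seed, because the result then depends less on any one random initialisation.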
