X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

In the code above, random_state is set to 0. Why not use 1 instead?

Manas Kumar
  • Possible duplicate of https://stackoverflow.com/questions/42191717/python-random-state-in-splitting-dataset/42197534 and https://stackoverflow.com/questions/28064634/random-state-pseudo-random-numberin-scikit-learn – gireesh4manu Jan 19 '19 at 05:56
  • The value of random_state does not significantly affect the predictions (the difference is negligible). It is just a seed, provided so that the results can be reproduced later or on a different system/environment. If you use random_state=50 today and the same value a week from now, you will get exactly the same split (even on a different env/system). – Ashu Grover Jan 19 '19 at 05:59
  • 1
    Possible duplicate of [Python random state in splitting dataset](https://stackoverflow.com/questions/42191717/python-random-state-in-splitting-dataset) – desertnaut Jan 19 '19 at 15:12

1 Answer

Neither 0 nor 1 has any special meaning for random_state. This parameter sets the seed used by the random number generator, so with any fixed integer the split is still random, but it is exactly the same on every call.
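As a rough illustration of the point above, the sketch below splits a small made-up dataset twice with random_state=0 and once with random_state=1. The toy arrays and variable names are assumptions for the example, not part of the question's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, made up for illustration: 50 samples with 2 features each.
X = np.arange(100).reshape(50, 2)
y = np.arange(50)

# Same seed used twice: both calls select the same test rows.
_, _, _, y_test_a = train_test_split(X, y, test_size=0.20, random_state=0)
_, _, _, y_test_b = train_test_split(X, y, test_size=0.20, random_state=0)
same_seed_matches = np.array_equal(y_test_a, y_test_b)

# A different seed: still an 80/20 split, but over different rows.
_, _, _, y_test_c = train_test_split(X, y, test_size=0.20, random_state=1)
different_seed_matches = np.array_equal(y_test_a, y_test_c)

print("seed 0 vs seed 0 identical:", same_seed_matches)
print("seed 0 vs seed 1 identical:", different_seed_matches)
```

Both seeds give an equally valid split; the number only determines which particular rows land in the test set.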

This is generally used for reproducibility, but you shouldn't rely on random_state being any particular value.

If you set random_state to None (the default), the split will generally be different each time you call train_test_split.
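A minimal sketch of the unseeded case, again on made-up toy data: two consecutive calls with random_state=None are compared, and with 100 samples they should almost surely pick different test rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, made up for illustration: 100 samples with 2 features each.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# random_state=None: no fixed seed, so each call shuffles differently.
_, _, _, y_test_first = train_test_split(X, y, test_size=0.20, random_state=None)
_, _, _, y_test_second = train_test_split(X, y, test_size=0.20, random_state=None)

splits_differ = not np.array_equal(y_test_first, y_test_second)
print("unseeded splits differ:", splits_differ)
```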

Dr. Snoopy