Why does sklearn.model_selection.train_test_split() return the same samples in X_train, X_test, y_train, y_test each time I run the code, even though I have set shuffle=True and have not manually defined a seed value?

I am printing the samples like this:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=100, shuffle=True)

print(y_test)
Savannah
  • You are forcing a specific random state. Remove it and you will get different results – Mohammad Aug 05 '21 at 08:27
  • Ahh man, that was foolish of me, thanks! Can you please post your comment as an answer, so I can tick it as the solution? – Savannah Aug 05 '21 at 08:29

1 Answer

The random_state parameter of train_test_split seeds the shuffle that is applied before the split. From the documentation (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html):

Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.

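Because you pass random_state=100, the shuffle is seeded with the same value on every run, so you get the same split every time. To get a different split on each run, remove the parameter (or pass random_state=None, which is the default).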
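If it helps to see the difference, here is a minimal sketch. The toy arrays and the helper name split_test_labels are made up for illustration and are not your x and y; only train_test_split and its parameters come from scikit-learn.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, purely illustrative (an assumption, not the asker's x and y).
x = np.arange(100).reshape(50, 2)
y = np.arange(50)


def split_test_labels(seed):
    """Hypothetical helper: return y_test from a 70/30 split for a given random_state."""
    _, _, _, y_test = train_test_split(
        x, y, test_size=0.3, random_state=seed, shuffle=True
    )
    return y_test


# A fixed integer seed reproduces the same split on every call.
fixed_a = split_test_labels(100)
fixed_b = split_test_labels(100)
print(np.array_equal(fixed_a, fixed_b))

# random_state=None (the default) uses NumPy's global random state,
# so repeated calls will almost always give different splits.
free_a = split_test_labels(None)
free_b = split_test_labels(None)
print(free_a)
print(free_b)
```

The first print should confirm that the two seeded splits are identical, while the last two prints will normally show different test labels. Keeping a fixed seed is still useful when you want results that others can reproduce.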

Mohammad