Why does my `train_test_split()` return the same samples

Question

Why does sklearn.model_selection.train_test_split() return the same samples of X_train, X_test, y_train, y_test each time I run the code, even though I have set shuffle=True and have not manually defined a seed value?

I am printing the samples like this:

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=100, shuffle=True)

print(y_test)

You are forcing a specific random state. Remove it and you will get different results — Mohammad, Aug 05 '21 at 08:27
Ahh man, that was foolish of me, thanks! Can you please post your comment as an answer, so I can tick it as the solution? — Savannah, Aug 05 '21 at 08:29

score 1 · Accepted Answer · answered Aug 05 '21 at 08:38

The random_state parameter of train_test_split seeds the shuffle that is applied to the data before it is split. From the documentation (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html):

Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.

To get different results on each run, remove the random_state argument, or equivalently pass random_state=None, which is the default.
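
A minimal sketch of the difference, assuming toy data in place of the asker's x and y (the arrays and the helper name split_test_labels are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, purely illustrative: 100 samples with one feature each.
x = np.arange(100).reshape(-1, 1)
y = np.arange(100)


def split_test_labels(seed=None):
    """Return the y_test part of a 70/30 split for the given random_state."""
    _, _, _, y_test = train_test_split(
        x, y, test_size=0.3, random_state=seed, shuffle=True
    )
    return y_test


# Same integer seed: the shuffle is identical on every call.
fixed_a = split_test_labels(seed=100)
fixed_b = split_test_labels(seed=100)
print(np.array_equal(fixed_a, fixed_b))  # True

# random_state=None (the default): no fixed seed, so each call shuffles
# differently and the two test sets almost certainly differ.
free_a = split_test_labels()
free_b = split_test_labels()
print(np.array_equal(free_a, free_b))
```

If you later need a split you can reproduce (for example, to compare models on the same test set), passing an int again is the way to get it back.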
