What is "random-state" in sklearn.model_selection.train_test_split example?

Question

Can someone explain to me what random_state means in the example below?

```python
import numpy as np
from sklearn.model_selection import train_test_split
X, y = np.arange(10).reshape((5, 2)), range(5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
```

Why is it hard coded to 42?

Does this answer your question? [Random state (Pseudo-random number) in Scikit learn](https://stackoverflow.com/questions/28064634/random-state-pseudo-random-number-in-scikit-learn) — Kim Kern, Oct 26 '20 at 17:15

score 95 · Accepted Answer · answered Mar 07 '18 at 09:04


Isn't that obvious? 42 is the Answer to the Ultimate Question of Life, the Universe, and Everything.

On a serious note, random_state simply sets the seed of the random number generator, so your train-test splits are deterministic: the same seed always produces the same split. If you don't set a seed, the split is different each time.
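A minimal sketch that checks this on the question's own toy data: two calls with the same integer seed return identical arrays, while with `random_state=None` each call draws from NumPy's global generator, so the test rows may change from call to call. The first column of `X` holds 0, 2, 4, 6, 8, so `X_test[:, 0] // 2` recovers the original row numbers.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5, 2)), range(5)

# Same integer seed -> the same rows land in train and test every time.
first = train_test_split(X, y, test_size=0.33, random_state=42)
second = train_test_split(X, y, test_size=0.33, random_state=42)
same_split = all(np.array_equal(a, b) for a, b in zip(first, second))
print("seed 42 twice, identical:", same_split)

# No seed -> each call uses NumPy's global generator, whose state keeps
# advancing, so the chosen test rows may differ between calls.
for _ in range(3):
    _, X_test, _, _ = train_test_split(X, y, test_size=0.33, random_state=None)
    print("unseeded test rows:", X_test[:, 0] // 2)
```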

Relevant documentation:

random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.


cs95



That first sentence was more than enough. – Danrex Oct 10 '18 at 02:54

@cs95 Do I have to generate a new `random_state` for subsequent methods in my code? For example, if I set the random state as 42 for the `train_test_split`, do I set the random state also as 42 for the classifier I will be using on the split data? What about if I want to compare two different classifiers, do I use the same random state for both classifiers? – Pleastry Oct 27 '20 at 13:19
@Turtle I think you are looking to set a global seed so your pipeline is deterministic. This will only make the split deterministic, nothing else. Consider using something like np.random.seed or creating a random state object that is then reused across functions. – cs95 Oct 27 '20 at 18:22
But if you use it in train_test_split, do you still need to use it when you run each algorithm? – vanetoj Nov 17 '21 at 19:41
How is the random_state saved? For example does it matter if I run my code on different Colab-Notebooks on different accounts? – Maxl Gemeinderat Jun 02 '22 at 12:38
I think the spec dictates that the seed be deterministic across platforms @MaxlGemeinderat but all bets are off the table if random seed is `None`. – cs95 Jul 23 '22 at 10:25
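To illustrate the point in the comments above: the seed given to `train_test_split` only fixes the split, and a randomized estimator has its own `random_state`. A minimal sketch, assuming the Iris dataset and a `RandomForestClassifier` purely as stand-ins (neither appears in the thread): with every random step seeded, two full runs give identical predictions. Reusing the same integer for the split and for each classifier you compare is a common convention, not a requirement.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

SEED = 42  # any fixed integer works; 42 is only a convention

def run_once(seed):
    """Split, fit and predict with every random step seeded."""
    X, y = load_iris(return_X_y=True)
    # random_state here only controls which rows go to train/test ...
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=seed)
    # ... the forest has its own randomness (bootstrap samples, feature
    # subsets), which needs its own random_state.
    clf = RandomForestClassifier(n_estimators=20, random_state=seed)
    clf.fit(X_train, y_train)
    return clf.predict(X_test), clf.score(X_test, y_test)

preds_a, acc_a = run_once(SEED)
preds_b, acc_b = run_once(SEED)
print("identical predictions:", (preds_a == preds_b).all(), "accuracy:", acc_a)
```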

score 19 · Answer 2 · answered Jan 22 '20 at 12:09

If you don't specify random_state in the code, then every time you run your code a new random value is generated, and the train and test datasets will contain different rows each time.

However, if a fixed value is assigned, such as random_state=0, 1, 42, or any other integer, then no matter how many times you execute your code the result will be the same, i.e., the same values in the train and test datasets.
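A short sketch of the "any other integer" point, using the question's toy data: each seed is repeated three times and always reproduces its own split, so 42 is not special. Different seeds will generally pick different test rows, though with only five samples two seeds can coincide.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5, 2)), range(5)

results = {}
for seed in (0, 1, 42, 2024):
    # Index 1 of the returned list is X_test.
    runs = [train_test_split(X, y, test_size=0.33, random_state=seed)[1]
            for _ in range(3)]
    results[seed] = all(np.array_equal(runs[0], r) for r in runs)
    print(f"seed={seed}: test rows {runs[0][:, 0] // 2}, stable={results[seed]}")
```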

score 10 · Answer 3 · answered Mar 07 '18 at 09:05

Random state ensures that the splits that you generate are reproducible. Scikit-learn uses random permutations to generate the splits. The random state that you provide is used as a seed for the random number generator, which ensures that the random numbers, and therefore the permutation, are generated in the same order every time.
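The permutation mechanism can be reproduced by hand. This sketch assumes the default shuffled split works the way current scikit-learn versions implement it internally (through `ShuffleSplit`): permute the row indices once with a `RandomState` seeded by `random_state`, take the first `n_test` indices for the test set and the next `n_train` for the training set, with a float `test_size` rounded up. That is an implementation detail rather than a documented guarantee, so treat the match as illustrative.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(10).reshape((5, 2)), range(5)
seed, test_size = 42, 0.33

# Library split.
X_train, X_test, _, _ = train_test_split(X, y, test_size=test_size,
                                         random_state=seed)

# Manual split: permute row indices with a generator seeded the same way.
n_samples = X.shape[0]
n_test = math.ceil(test_size * n_samples)  # float test_size is rounded up
n_train = n_samples - n_test
perm = np.random.RandomState(seed).permutation(n_samples)
test_idx, train_idx = perm[:n_test], perm[n_test:n_test + n_train]

print("matches library:",
      np.array_equal(X_test, X[test_idx]) and np.array_equal(X_train, X[train_idx]))
```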

score 6 · Answer 4 · answered Mar 29 '20 at 17:12

When random_state is not defined in the code, the training data changes on every run, and the accuracy might change too. When random_state is set to a constant integer, the training data stays the same on every run, which makes debugging easier.
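A sketch of why a fixed split helps debugging, assuming Iris and a `DecisionTreeClassifier` as illustrative stand-ins (not from the answer): hold-out accuracy can shift when only the split changes, but repeating the same split gives the same number, so any change you then see comes from your code, not from the data shuffle. The helper `accuracy_for` is hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

def accuracy_for(split_seed):
    """Hypothetical helper: hold-out accuracy of a fixed-seed tree on one split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=split_seed)
    model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    return model.score(X_test, y_test)

# Different splits can give different accuracies ...
by_seed = {s: accuracy_for(s) for s in range(5)}
print(by_seed)

# ... but re-running the same split reproduces the same score.
repeat = [accuracy_for(7) for _ in range(3)]
print("seed 7 repeated:", repeat)
```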

edited Mar 29 '20 at 17:17

answered Mar 29 '20 at 17:12

kishore naidu


score 2 · Answer 5 · answered Dec 01 '20 at 08:23


The random state is simply the "lot number" of a set generated randomly in any operation: specify the same lot number whenever you want the same set again.


OmkarKhilari

