Random state significance in sklearn

Question

I'm working with train_test_split in sklearn and I just can't understand the random_state parameter. What exactly does it do, and why do we use it?

Please provide an example that illustrates it.

Thanks in advance.

Possible duplicate of [random\_state parameter in sklearn's train\_test\_split](https://stackoverflow.com/questions/52908885/random-state-parameter-in-sklearns-train-test-split) — Tim, Nov 06 '18 at 11:29
@Akshay's answer below is great. Also, in case you're not familiar with the concept of (pseudo) random number generation, I suggest you take a look at [this](https://en.m.wikipedia.org/wiki/Pseudorandom_number_generator) and [this](https://en.m.wikipedia.org/wiki/Random_seed) article on Wikipedia. — OmerB, Nov 06 '18 at 11:33

Sociopath · Accepted Answer · 2018-11-06T11:23:36.653

The random_state parameter in train_test_split lets you reproduce the same result every time you run the code.

random_state ensures that the splits you generate are reproducible. Scikit-learn uses a random permutation of the samples to generate the splits, and the random_state you provide is used as the seed for the random number generator. Because the generator always starts from the same state, it produces the same sequence of random numbers, and therefore the same permutation and the same split, on every run.
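To make the "seed, then permute" idea concrete, here is a minimal sketch using only NumPy. The helper `seeded_split_indices` is hypothetical and simplified: it assumes the split is made by shuffling the row indices with a seeded `numpy.random.RandomState` and cutting off the first `ceil(test_size * n)` indices as the test set, which approximates what scikit-learn does but is not its actual code.

```python
import math
import numpy as np

def seeded_split_indices(n_samples, test_size, seed):
    """Hypothetical helper: shuffle row indices with a seeded RNG and cut
    them into train/test parts (a simplified sketch, not sklearn's code)."""
    rng = np.random.RandomState(seed)          # the seed fixes the RNG's starting state
    perm = rng.permutation(n_samples)          # same seed -> same permutation
    n_test = math.ceil(test_size * n_samples)  # e.g. ceil(0.25 * 6) = 2 test rows
    return perm[n_test:], perm[:n_test]        # (train indices, test indices)

# Two independent calls with the same seed
train_a, test_a = seeded_split_indices(6, 0.25, seed=42)
train_b, test_b = seeded_split_indices(6, 0.25, seed=42)

print(train_a, test_a)
print(np.array_equal(train_a, train_b) and np.array_equal(test_a, test_b))  # True
```

Changing `seed` gives a different (but again repeatable) permutation; the seed value itself has no special meaning, 42 is just a common convention.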

Without the random_state parameter

```python
from sklearn.model_selection import train_test_split

a = [1, 5, 6, 7, 8, 6]
b = [2, 3, 5, 2, 1, 4]

x1, x2, y1, y2 = train_test_split(a, b, test_size=0.25)

print(x1)
# possible output: [1, 6, 8, 7]

# run the code again

x1, x2, y1, y2 = train_test_split(a, b, test_size=0.25)

print(x1)
# possible output: [6, 8, 6, 7]
```

The values will generally change each time you run the code.
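The reason they change: with the default random_state=None, scikit-learn draws from NumPy's global random generator, whose state keeps advancing between calls. The sketch below (same toy data as above) shows that rewinding that global generator with `np.random.seed` before each call also makes the split repeat. Passing random_state explicitly is usually the cleaner choice, since a global seed can be disturbed by any other code that uses `np.random`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

a = [1, 5, 6, 7, 8, 6]
b = [2, 3, 5, 2, 1, 4]

# random_state=None (the default) falls back to NumPy's global RandomState,
# whose internal state moves forward on every call.
np.random.seed(0)
split_1 = train_test_split(a, b, test_size=0.25)

np.random.seed(0)  # rewind the global generator to the same state
split_2 = train_test_split(a, b, test_size=0.25)

print(split_1 == split_2)  # True
```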

With the random_state parameter

```python
x1, x2, y1, y2 = train_test_split(a, b, test_size=0.25, random_state=42)

print(x1)
# output: [6, 6, 8, 7]

# run the code again
x1, x2, y1, y2 = train_test_split(a, b, test_size=0.25, random_state=42)

print(x1)
# output: [6, 6, 8, 7]
```

As you can see, the same values are reproduced: with a fixed random_state, the code creates the same split every time you run it.
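A small self-check along the same lines, using the question's data: it compares all four outputs of two seeded calls instead of eyeballing `x1`, and also confirms that the seed only decides which rows land in each part, since `a` and `b` are shuffled together so every `(x, y)` pair in the output existed in the input. The variable names `first` and `second` are just illustrative.

```python
from sklearn.model_selection import train_test_split

a = [1, 5, 6, 7, 8, 6]
b = [2, 3, 5, 2, 1, 4]

# Two calls with the same integer seed
first = train_test_split(a, b, test_size=0.25, random_state=42)
second = train_test_split(a, b, test_size=0.25, random_state=42)

# All four outputs (x1, x2, y1, y2) are identical between the two calls
print(all(p == q for p, q in zip(first, second)))  # True

# a and b are permuted together, so features and labels stay aligned
x1, x2, y1, y2 = first
print(set(zip(x1, y1)) <= set(zip(a, b)))  # True
```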
