I'm working on train_test_split in sklearn and I just can't understand the random_state parameter. What exactly does it do, and why do we use it?
Please provide an example.
Thanks in advance.
The random_state parameter in train_test_split lets you reproduce the same result every time you run the code.
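random_state ensures that the splits you generate are reproducible. Scikit-learn shuffles the data with a random permutation to generate the splits, and the value you pass as random_state is used as the seed for the random number generator. A fixed seed makes the generator produce the same sequence of random numbers, and therefore the same permutation, on every run. The default (None) uses NumPy's global random state, so the split generally changes from call to call.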
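To make the seeding idea concrete, here is a simplified sketch of the mechanism, not scikit-learn's actual splitting code. It assumes `sklearn.utils.check_random_state`, which turns an int seed into a fresh `numpy.random.RandomState`, and uses that generator's `permutation` method to shuffle sample indices.

```python
import numpy as np
from sklearn.utils import check_random_state

# check_random_state(42) builds a new np.random.RandomState seeded with 42
rng_a = check_random_state(42)
rng_b = check_random_state(42)

# Two generators created from the same seed give the same permutation
# of the 6 sample indices
perm_a = rng_a.permutation(6)
perm_b = rng_b.permutation(6)
print(perm_a, perm_b)
print(np.array_equal(perm_a, perm_b))  # True
```

Because both generators start from the same seed, they draw identical random numbers, which is exactly why a fixed random_state yields the same split.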
Without the random_state parameter:
```python
from sklearn.model_selection import train_test_split

a = [1, 5, 6, 7, 8, 6]
b = [2, 3, 5, 2, 1, 4]

x1, x2, y1, y2 = train_test_split(a, b, test_size=0.25)
print(x1)
# output: [1, 6, 8, 7]

# run the code again
x1, x2, y1, y2 = train_test_split(a, b, test_size=0.25)
print(x1)
# output: [6, 8, 6, 7]
```
The values will generally change every time you run the code, because each call draws a new random permutation.
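Rather than comparing two runs by eye, you can repeat the unseeded call several times and count how many distinct training sets appear. This is just an illustrative check using the same `a` and `b` as above; the exact count varies from run to run.

```python
from sklearn.model_selection import train_test_split

a = [1, 5, 6, 7, 8, 6]
b = [2, 3, 5, 2, 1, 4]

# Repeat the unseeded split and record each training set as a tuple
seen = set()
for _ in range(20):
    x1, x2, y1, y2 = train_test_split(a, b, test_size=0.25)
    seen.add(tuple(x1))

# With test_size=0.25 and 6 samples: 2 test samples, 4 training samples
print(len(x1), len(x2))
# Usually well above 1: the split changes between calls
print(len(seen))
```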
With the random_state parameter:
```python
x1, x2, y1, y2 = train_test_split(a, b, test_size=0.25, random_state=42)
print(x1)
# output: [6, 6, 8, 7]

# run the code again
x1, x2, y1, y2 = train_test_split(a, b, test_size=0.25, random_state=42)
print(x1)
# output: [6, 6, 8, 7]
```
As you can see, the same values are reproduced: with a fixed random_state, train_test_split creates the same split every time you run the code.
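A slightly stronger check, sketched below with the same `a` and `b`: run the seeded split several times and compare all four returned lists, then confirm that `a` and `b` are shuffled with the same permutation, so every `(a[i], b[i])` pair stays together in the training set.

```python
from sklearn.model_selection import train_test_split

a = [1, 5, 6, 7, 8, 6]
b = [2, 3, 5, 2, 1, 4]

# Run the seeded split five times and keep all returned lists
runs = [train_test_split(a, b, test_size=0.25, random_state=42) for _ in range(5)]

# Every run returns exactly the same [x1, x2, y1, y2]
all_identical = all(run == runs[0] for run in runs)
print(all_identical)  # True

# Features and labels share one permutation, so original pairs are preserved
x1, x2, y1, y2 = runs[0]
original_pairs = list(zip(a, b))
print(all(pair in original_pairs for pair in zip(x1, y1)))  # True
```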