I'm fairly new to ML and there is something I'm wondering about. When I pass 'random_state=10', the train/test split stays the same on every run, so the accuracy of the model doesn't change. So far so good. But when I leave it out, the split changes on every run and so does the accuracy. The rows are different, but they still come from the same data frame, so I expected the accuracy to stay the same. Is that how things work in ML, or am I missing something? Here is my code.

X = df[["Mileage", "Age(yrs)"]]
y = df["Sell Price($)"]

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

from sklearn.linear_model import LinearRegression
clf = LinearRegression()

clf.fit(X_train, y_train)
clf.predict(X_test)
>>>array([ 38014.9266005 ,  14240.40458389,  33695.58936258,  29870.44475795])

y_test

>>>3  40000
   8  12000
   1  34000
   4  31500


clf.score(X_test, y_test)
>>>0.97343231831177046
Andreas Rossberg
  • [Here](https://stackoverflow.com/questions/42191717/python-random-state-in-splitting-dataset) is the explanation – Shijith Jul 16 '20 at 16:08

1 Answer

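The random state you mention is the 'random_state' argument of scikit-learn's train_test_split. It seeds the random number generator that shuffles the rows before splitting, so a fixed value gives the same train and test sets on every run. Without it, each run produces a different split, the model is trained and scored on different rows, and the score changes, especially with a test set as small as yours. Fixing the seed is very useful when you want somebody else to test your model or want to keep the same split every time. Any integer works; 42 is a common convention, so I suggest random_state=42.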
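To make the point about reproducible splits concrete, here is a minimal sketch. The arrays are hypothetical stand-ins for the asker's data frame, and `held_out_rows` is a helper name made up for this illustration; only `train_test_split` and its `test_size` and `random_state` parameters come from scikit-learn.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 20 rows, 2 features (not the asker's data frame).
X = np.arange(40).reshape(20, 2)
y = np.arange(20)


def held_out_rows(seed):
    """Return the target values that end up in the test set for a given seed."""
    _, _, _, y_test = train_test_split(X, y, test_size=0.2, random_state=seed)
    return y_test.tolist()


same_a = held_out_rows(42)
same_b = held_out_rows(42)
other = held_out_rows(10)

# The same seed reproduces the split exactly.
print("seed 42 twice, identical:", same_a == same_b)

# A different seed is just as valid; it usually selects different rows.
print("seed 42 test rows:", sorted(same_a))
print("seed 10 test rows:", sorted(other))
```

Calling `train_test_split` with no `random_state` at all behaves like picking a fresh seed on each run, which is why the test rows keep changing.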

Deepak
  • Thanks for the answer, sir. But what's the difference between random_state=10, 42, etc.? I've searched on the internet, but it doesn't seem to change anything technically – Mücahit Uğurlu Jul 16 '20 at 16:14
  • I don't see people using random_state = 10 much. They use 0, 1 or 42. These numbers are just seed values for the random number generator. The module uses that seed to split your data. If you don't specify it, a different split occurs each time. You can read more about it here: https://scikit-learn.org/dev/modules/generated/sklearn.model_selection.train_test_split.html#sklearn.model_selection.train_test_split – Deepak Jul 16 '20 at 16:19
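As a rough illustration of why no particular seed is special, the sketch below fits the same model on several splits of one data set. The data is synthetic and purely hypothetical (it is not the car-price data from the question), and `score_for_seed` is a helper name invented for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data standing in for the car-price data frame.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(20, 2))
y = 40000 - 300 * X[:, 0] - 100 * X[:, 1] + rng.normal(0, 1000, size=20)


def score_for_seed(seed):
    """Split with the given seed, fit a linear model, return R^2 on the test rows."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    return model.score(X_test, y_test)


# Each seed selects a different set of test rows, so the score can differ,
# but rerunning with the same seed always gives the same score.
scores = {seed: score_for_seed(seed) for seed in (0, 1, 10, 42)}
for seed, r2 in scores.items():
    print(f"random_state={seed}: R^2 = {r2:.4f}")
```

With only a handful of test rows, the spread between seeds can be noticeable. Averaging over several splits, for example with cross-validation, gives a steadier estimate than any single seed.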