
I made a train-test split on my data and fit a support vector machine to it.

```python
from sklearn import svm
from sklearn.model_selection import train_test_split

xtrain, xtest, ytrain, ytest = train_test_split(df, target)

svc = svm.SVC(C=30, gamma='auto')
svc.fit(xtrain, ytrain)
svc.score(xtest, ytest)
```

I am fitting the SVC model to the iris dataset, and I get a different score on every run because train_test_split shuffles the data differently each time (which is expected).

Is there an attribute or function of train_test_split, or any other way, to find out after running the code above which random_state value produced the result I got?


You can run a homemade grid search to find the best value for random_state.

However, you should never optimize with respect to randomness. By doing so, you are picking the split that happens to favor your model: a random occurrence that by definition has no causal relation with your target variable.
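To see how much of the score is just split noise, one option is to evaluate the same model on many random splits and look at the spread instead of the single best value. This is a minimal sketch, not part of the original answer: it assumes the iris data from `sklearn.datasets.load_iris` stands in for `df`/`target`, reuses `C=30, gamma='auto'` from the question, and uses scikit-learn's `ShuffleSplit` with `cross_val_score` as a swapped-in technique.

```python
# Sketch: measure how much the test accuracy varies purely because of the split.
# Assumption: load_iris stands in for the asker's df/target.
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import ShuffleSplit, cross_val_score

X, y = load_iris(return_X_y=True)

# 50 independent random train/test splits, each holding out 25% of the rows
# (25% is also train_test_split's default test size)
splitter = ShuffleSplit(n_splits=50, test_size=0.25, random_state=0)

# One accuracy score per split, using SVC's default scoring (accuracy)
scores = cross_val_score(svm.SVC(C=30, gamma='auto'), X, y, cv=splitter)

print(f"mean accuracy: {scores.mean():.3f}, std: {scores.std():.3f}")
print(f"min: {scores.min():.3f}, max: {scores.max():.3f}")
```

The gap between the minimum and maximum is variation you would be "optimizing" if you picked the best random_state; the mean is the more honest estimate of the model's accuracy.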

If you really want to continue, then you must record the score that results from each random state's split.

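```python
import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.model_selection import train_test_split

# Array of random_state values from 0 to 200
# (random_state must be a non-negative integer; negative seeds raise a ValueError)
random_states = np.arange(start=0, stop=201)

# Initialize a list where we'll store the score of each random_state
scores = []

# Run the search: one split and one fit per random_state
for state in random_states:
    # train_test_split returns train/test X first, then train/test y
    xtrain, xtest, ytrain, ytest = train_test_split(df, target, random_state=state)
    svc = svm.SVC(C=30, gamma='auto')
    svc.fit(xtrain, ytrain)
    scores.append(svc.score(xtest, ytest))
```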

Then put the two arrays into a pandas DataFrame and select the row(s) with the highest score.

```python
results = pd.DataFrame({'random_state': random_states, 'score': scores})

# All random_state values that reach the maximum score (there may be ties)
results[results['score'] == results['score'].max()]
```
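As for recovering the random_state behind a past result: with `random_state=None`, `train_test_split` draws from NumPy's global random state and returns only the split arrays, so no seed is stored anywhere to look up afterwards. A minimal sketch of the usual workaround, assuming the iris data from `load_iris` in place of `df`/`target`, is to choose the seed yourself before splitting and keep it next to the score:

```python
# Sketch: pick the seed explicitly so every result can be traced back to it.
# Assumption: load_iris stands in for the asker's df/target.
import numpy as np
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Draw a seed in the range train_test_split accepts, instead of passing None
seed = int(np.random.default_rng().integers(0, 2**32 - 1))

xtrain, xtest, ytrain, ytest = train_test_split(X, y, random_state=seed)
svc = svm.SVC(C=30, gamma='auto')
svc.fit(xtrain, ytrain)
score = svc.score(xtest, ytest)

# The seed and the score are now recorded together
print(f"random_state={seed}, score={score:.3f}")

# Splitting again with the same seed reproduces exactly the same test set
_, xtest_again, _, _ = train_test_split(X, y, random_state=seed)
assert np.array_equal(xtest, xtest_again)
```

This keeps each run random but reproducible, without turning the seed into a hyperparameter to tune.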
Arturo Sbr