
I want to use LogisticRegression for classification, so I use RandomizedSearchCV to pick the best C parameter for LogisticRegression.

My question is: why does best_params_ change every time I run this program? I assumed that best_params_ would always stay the same.

Code as follows:

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, StratifiedKFold, RandomizedSearchCV

data = load_iris().data
target = load_iris().target

# Data split
TrainData, TestData, TrainTarget, TestTarget = train_test_split(data, target, test_size=0.25, random_state=0)
assert len(TrainData) == len(TrainTarget)
Skf = StratifiedKFold(n_splits=5)

# Model
LR = LogisticRegression(C=10, multi_class='multinomial', penalty='l2', solver='sag', max_iter=10000, random_state=0)

# Parameter selection with cross-validation
params = {'C': np.random.randint(1, 10, 10)}
RS = RandomizedSearchCV(LR, params, return_train_score=True, error_score=0, random_state=0)

RS.fit(TrainData, TrainTarget)

Result = pd.DataFrame(RS.cv_results_)
print(RS.best_params_)

1 Answer


You are correctly setting random_state on LogisticRegression and RandomizedSearchCV. But there is one more source of randomness you haven't controlled: the candidate C values in params are drawn with np.random.randint, so the parameter grid itself changes on every run, and with it the best_params_ the search can find.

To control this behaviour, set numpy.random.seed() to an integer of your choice at the top of your code, something like:

np.random.seed(0)
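
As a quick sanity check (my own illustration, not part of the original answer), re-seeding and drawing again reproduces the exact same array, so the candidate C values no longer vary between runs:

import numpy as np

np.random.seed(0)
first_draw = np.random.randint(1, 10, 10)   # the 10 candidate C values

np.random.seed(0)
second_draw = np.random.randint(1, 10, 10)  # identical to the first draw

print(np.array_equal(first_draw, second_draw))  # True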

Note: doing this will also seed the random number generation inside scikit-learn, because scikit-learn uses NumPy's global RNG internally. In that case you may not need to set random_state everywhere, but relying on the global seed alone is not recommended.
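
Alternatively, if you prefer not to touch the global seed at all, here is a minimal sketch (my suggestion, not from the answer above): give RandomizedSearchCV a fixed list or a scipy.stats distribution for C, so the sampling is driven entirely by the search's own random_state and is reproducible on its own.

from scipy.stats import randint

# Option 1: a fixed list of candidate values, no np.random involved.
params = {'C': [1, 2, 3, 4, 5, 6, 7, 8, 9]}

# Option 2: a distribution; RandomizedSearchCV draws from it itself,
# and random_state=0 makes those draws reproducible.
params = {'C': randint(1, 10)}

RS = RandomizedSearchCV(LR, params, return_train_score=True, error_score=0, random_state=0)
RS.fit(TrainData, TrainTarget)
print(RS.best_params_)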

See this answer - Should I use `random.seed` or `numpy.random.seed` to control random number generation in `scikit-learn`?.

