I'm using scikit-learn with the code shown below. I have class imbalance (roughly a 90:10 split of class 0 to class 1). After reading a number of other questions, I've used the class_weight parameter.
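For what it's worth, 'balanced' derives the weights as n_samples / (n_classes * np.bincount(y)), so I'd expect roughly 0.56 and 5.0 for a 90:10 split. A quick sketch to inspect the weights scikit-learn actually computes (assuming y holds the labels as an array):

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

classes = np.unique(y.ravel())
weights = compute_class_weight('balanced', classes=classes, y=y.ravel())
print(dict(zip(classes, weights)))  # expect ~{0: 0.56, 1: 5.0} for a 90:10 split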
However, every time I run the code I get a different set of important features and different AUC, precision, recall, etc. The problem goes away when I remove the class_weight parameter.
As shown, I've set random_state to a constant, so that shouldn't be the issue. A good number of the predictors are highly correlated. Does anyone know what the problem is? (Note: I posted a similar question yesterday, but it was downvoted as I hadn't been clear enough, so rather than leave a long chain of comments I've deleted it; this version hopefully provides the information needed.)
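To show what I mean by "highly correlated", here's the check I'm using (a rough sketch; it assumes x can be wrapped as a pandas DataFrame, and 0.9 is an arbitrary cutoff):

import numpy as np
import pandas as pd

corr = pd.DataFrame(x).corr().abs()
# report column pairs above the cutoff, ignoring the diagonal
pairs = np.argwhere(np.triu(corr.values, k=1) > 0.9)
print([(corr.index[i], corr.columns[j]) for i, j in pairs])

Full code: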
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier

x_train, x_test, y_train, y_test = train_test_split(x, y)

# grid of tree-complexity settings to search over
parameters = {
    'max_depth': [6, 7, 8],
    'min_samples_split': [100, 150],
    'min_samples_leaf': [50, 75],
}

clf = GridSearchCV(
    DecisionTreeClassifier(random_state=99, class_weight='balanced'),
    parameters, refit=True, cv=10)
clf.fit(x_train, y_train.ravel())
# create main tree using best settings
clf2 = DecisionTreeClassifier(
    max_depth=clf.best_params_['max_depth'],
    min_samples_split=clf.best_params_['min_samples_split'],
    min_samples_leaf=clf.best_params_['min_samples_leaf'],
    random_state=99,
    class_weight='balanced')
clf2.fit(x_train, y_train.ravel())
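As a sanity check that random_state is doing its job, fitting the same tree twice on one fixed split gives identical importances for me (a minimal sketch using the names from the code above):

import numpy as np

t1 = DecisionTreeClassifier(random_state=99, class_weight='balanced').fit(x_train, y_train.ravel())
t2 = DecisionTreeClassifier(random_state=99, class_weight='balanced').fit(x_train, y_train.ravel())
print(np.allclose(t1.feature_importances_, t2.feature_importances_))  # True on a fixed split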