I am trying to find the optimal C and gamma for my SVM. When I run my script, I get this error:
ValueError: Invalid parameter max_features for estimator SVC. Check the list of available parameters with estimator.get_params().keys().
I went through the docs to understand what n_estimators actually means so that I know what values to put there, but it is still not clear to me. Could someone tell me what this value should be so that I can run my script and find the optimal C and gamma?
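The check suggested by the error message can be run on its own. A minimal sketch, assuming a recent scikit-learn and a default SVC instance (the variable name svc_params is illustrative and not part of the script):

```python
from sklearn.svm import SVC

# Parameter names that this estimator exposes through get_params()
svc_params = sorted(SVC().get_params().keys())
print(svc_params)

# Check the names used in this question against that list
for name in ['C', 'gamma', 'n_estimators', 'max_features']:
    print(name, name in svc_params)
```

Only names that appear in this list are accepted as keys of param_grid for a plain SVC.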
My code:
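import os
from operator import itemgetter

import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

if __name__ == '__main__':
    # Load the paths and take each label from the name of the parent directory
    fname = "/home/John/labels.csv"
    labels = pd.read_csv(fname, header=None).values[:, 1]
    labels = list(map(itemgetter(1),
                      map(os.path.split,
                          map(os.path.dirname, labels))))
    # Load the embeddings
    fname = "/home/John/reps.csv"
    embeddings = pd.read_csv(fname, header=None).values
    # Encode the string labels as integers
    le = LabelEncoder().fit(labels)
    labelsNum = le.transform(labels)
    nClasses = len(le.classes_)
    svcClassifier = SVC(kernel='rbf', probability=True, C=10, gamma=10)
    #classifier = OneVsRestClassifier(svcClassifier).fit(embeddings, labelsNum)
    # Parameter grid adapted from the Random Forest post mentioned below
    param_grid = {
        'n_estimators': [200, 700],
        'max_features': ['auto', 'sqrt', 'log2']
    }
    CV_rfc = GridSearchCV(estimator=svcClassifier, param_grid=param_grid, cv=5)
    CV_rfc.fit(embeddings, labelsNum)
    print(CV_rfc.best_params_)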
After some manual trial and error, I found that in my case C=10 and gamma=10 give the best results. However, I would like to use GridSearchCV to find the optimal values.
My code is inspired by this post: How to get Best Estimator on GridSearchCV (Random Forest Classifier Scikit)
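For comparison, a minimal sketch of what a grid search over C and gamma for an RBF SVC could look like. It assumes a recent scikit-learn. The synthetic data from make_classification only stands in for the embeddings and labels, and the candidate values in param_grid are illustrative guesses rather than recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the embeddings and labels (assumption: not the real data)
X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# Keys must be SVC parameter names; the values here are illustrative only
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.001, 0.01, 0.1, 1, 10],
}

# 5-fold cross-validation over every (C, gamma) combination
search = GridSearchCV(estimator=SVC(kernel='rbf'), param_grid=param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

probability=True is left out of this sketch to keep the fits fast; it can be passed to SVC in the same way as in the script above.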