How to correctly compute the optimal C and gamma for my SVM?

Question

I am trying to compute the optimal C and Gamma for my SVM. When trying to run my script I get this error:

ValueError: Invalid parameter max_features for estimator SVC. Check the list of available parameters with estimator.get_params().keys().

I went through the docs to understand what n_estimators actually means, so that I know which values to put there, but it is still not clear to me. Could someone tell me what this value should be, so that I can run my script and find the optimal C and gamma?

my code:

import os
from operator import itemgetter

import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

if __name__ == '__main__':

    fname = "/home/John/labels.csv"
    labels = pd.read_csv(fname, header=None).values[:, 1]
    # Use the name of the directory containing each file as its label.
    labels = list(map(itemgetter(1),
                      map(os.path.split,
                          map(os.path.dirname, labels))))

    fname = "/home/John/reps.csv"
    embeddings = pd.read_csv(fname, header=None).values
    le = LabelEncoder().fit(labels)
    labelsNum = le.transform(labels)
    nClasses = len(le.classes_)

    svcClassifier = SVC(kernel='rbf', probability=True, C=10, gamma=10)
    #classifier = OneVsRestClassifier(svcClassifier).fit(embeddings, labelsNum)
    param_grid = {
        'n_estimators': [200, 700],
        'max_features': ['auto', 'sqrt', 'log2']
    }

    CV_rfc = GridSearchCV(estimator=svcClassifier, param_grid=param_grid, cv=5)
    CV_rfc.fit(embeddings, labelsNum)
    print(CV_rfc.best_params_)

After trying different values manually, I found that C=10 and gamma=10 give the best results in my case. However, I would like to use this function to find the optimal values.

My code is inspired by this post: How to get Best Estimator on GridSearchCV (Random Forest Classifier Scikit)

If an answer helps you, please vote it up (https://meta.stackexchange.com/questions/173399/how-to-upvote-on-stack-overflow), so that other users know which answer helped you, too. — zimmerrol, Sep 26 '17 at 15:06

zimmerrol · Accepted Answer · 2017-09-26T13:24:57.887


The SVC class has no max_features or n_estimators argument; these are arguments of the RandomForestClassifier that your code is based on. To optimize the model with respect to C and gamma, you can use a grid such as:

param_grid = { 
    'C': [0.1, 0.5, 1.0],
    'gamma': [0.1, 0.5, 1.0]
}
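A minimal sketch of how this grid could be plugged into GridSearchCV is shown here. The data is a synthetic stand-in generated with make_classification (an assumption, not the asker's embeddings), and the names valid_names and search are only illustrative. It also checks the grid keys against get_params(), as the error message suggests.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the embeddings and labels (assumption, not the real data).
X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

svc = SVC(kernel='rbf')

# Every key in param_grid must be one of the estimator's parameter names.
valid_names = set(svc.get_params().keys())
print('C' in valid_names, 'gamma' in valid_names)  # SVC parameters
print('n_estimators' in valid_names)               # random-forest parameter

param_grid = {
    'C': [0.1, 0.5, 1.0],
    'gamma': [0.1, 0.5, 1.0],
}

# 5-fold cross-validated search over all C/gamma combinations.
search = GridSearchCV(estimator=svc, param_grid=param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
print(search.best_score_)
```

The reported best_params_ can only ever be a combination that appears in the grid, so a value such as C=10 has to be listed explicitly to be considered.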

Furthermore, I recommend also searching for the optimal kernel, which in sklearn can be, for example, rbf, linear, or poly.
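One possible way to include the kernel in the search is to pass a list of grids, one per group of kernels, so that gamma is only varied for kernels that use it. This is a sketch on synthetic data (an assumption), and the candidate values are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (assumption, not the asker's embeddings).
X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# A list of dicts: each dict is searched as its own grid.
param_grid = [
    # The linear kernel ignores gamma, so only C is varied.
    {'kernel': ['linear'], 'C': [0.1, 1.0, 10.0]},
    # rbf and poly both use gamma.
    {'kernel': ['rbf', 'poly'], 'C': [0.1, 1.0, 10.0], 'gamma': [0.01, 0.1]},
]

kernel_search = GridSearchCV(SVC(), param_grid, cv=5)
kernel_search.fit(X, y)
print(kernel_search.best_params_)
```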

Edit: The values here are arbitrary and only meant to illustrate the general approach. You should try many different values; both the values and their range depend on your situation.

edited Sep 26 '17 at 13:24

answered Sep 26 '17 at 13:17

zimmerrol


How did you come up with those values for C and Gamma? In my case it seemed like 10 for gamma and C gave the best results so far. – Sep 26 '17 at 13:20
@traducerad These are just arbitrary values. You should add many different values here, which depend highly on your situation. My code is just an example of how to search for `C` and `gamma` of an `SVC` with a `GridSearch`. – zimmerrol Sep 26 '17 at 13:23
Should this list just contain all the values between which it can pick? Like eg so: [0.1, 0.2, ..., 9.8, 9.9, 10]? – Sep 26 '17 at 13:25
In my case it is 10 that gave the best values. So it is pretty strange that it returns 0.1 and results in an incorrect classification. – Sep 26 '17 at 13:27
@traducerad Yes. It is always a good idea to start with a list of broader values like `[0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]` to get an idea of the shape of the error landscape, and then to choose finer values (e.g. in `0.1` steps). If this has answered your original question, please mark my response as the answer and vote it up, so that other users can profit from it. – zimmerrol Sep 26 '17 at 13:31
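The coarse-then-fine idea from the comment above could be sketched roughly as follows. Note that this sketch uses logarithmically spaced grids rather than the linear steps mentioned in the comment, which is a substitution on my part; the data is synthetic and all ranges are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (assumption, not the asker's embeddings).
X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# Pass 1: coarse grid covering several orders of magnitude.
coarse_grid = {
    'C': np.logspace(-2, 2, 5),      # 0.01 ... 100
    'gamma': np.logspace(-3, 1, 5),  # 0.001 ... 10
}
coarse = GridSearchCV(SVC(kernel='rbf'), coarse_grid, cv=5).fit(X, y)
best_C = coarse.best_params_['C']
best_gamma = coarse.best_params_['gamma']

# Pass 2: finer grid spanning one decade around the coarse optimum.
# The middle factor is 1.0, so the coarse optimum itself is re-evaluated.
fine_grid = {
    'C': best_C * np.logspace(-0.5, 0.5, 5),
    'gamma': best_gamma * np.logspace(-0.5, 0.5, 5),
}
fine = GridSearchCV(SVC(kernel='rbf'), fine_grid, cv=5).fit(X, y)
print(coarse.best_params_, coarse.best_score_)
print(fine.best_params_, fine.best_score_)
```

If the best value from the first pass sits on the edge of the coarse grid, it is usually worth widening the coarse range before refining.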
