
I am trying to perform a GridSearchCV on OneClassSVM, but I can't seem to find the right scoring method for OCSVM. From what I've gathered, something like OneClassSVM.score does not exist, so it doesn't have the default scoring function that GridSearchCV needs. Unfortunately, none of the scoring methods from the documentation work either, because they are dedicated to supervised ML and OCSVM is an unsupervised method.

Is there any way to perform GridSearch (or something similar that lets me tune the model with the right parameters) on OneClassSVM?

Here is my code for GridSearchCV:

nus = [0.001, 0.01, 0.1, 1]
gammas = [0.001, 0.01, 0.1, 1]
tuned_parameters = {'kernel' : ['rbf'], 'gamma' : gammas, 'nu': nus}
grid_search = GridSearchCV(svm.OneClassSVM(), tuned_parameters,
                           scoring="??????????????????????", n_jobs=4)
grid_search.fit(X_train)

Yes, I know .fit only takes one parameter, but since it is an unsupervised method I don't have any y to put there. Thank you for the help.

  • Do you have any list (ground truth) of what's an inlier and what's an outlier? – Vivek Kumar Apr 23 '18 at 11:03
  • You mean the column in the table that would indicate whether something is an anomaly or not? Yes, but I don't want to use it as y, because in the end I want to give it just a training file and have it decide the parameters. Not sure if that's possible. – duscaes Apr 23 '18 at 11:15
  • Please see this: https://stats.stackexchange.com/q/192530/133411 – Vivek Kumar Apr 23 '18 at 11:26
  • Does someone have a link to a full example using GridSearchCV with OneClassSVM? – user3731622 Oct 01 '19 at 01:31

1 Answer


I know it's a late reply, but hopefully it will be useful to somebody. To tune the parameters you need the right labels (outlier/inlier). Then, once you have the correct parameters, you can use OneClassSVM in an unsupervised way.

So the scoring function for this approach can be, for example, one of the following (see the custom-scorer sketch after the list):

  • f1
  • precision
  • recall
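
These built-in strings work because GridSearchCV compares OneClassSVM's +1/-1 predictions against the ground-truth labels. If the outlier class is what matters most, a custom scorer can target it directly; here is a small sketch (using pos_label=-1 is my assumption, matching the -1 = outlier convention):

from sklearn.metrics import f1_score, make_scorer

# score only the outlier class (-1) instead of macro-averaging both classes
outlier_f1 = make_scorer(f1_score, pos_label=-1)

# usable anywhere a scoring string would go, e.g.:
# GridSearchCV(svm.OneClassSVM(), tuned_parameters, scoring=outlier_f1, cv=10)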

Code for checking precision and recall scores:

import pandas as pd
from sklearn import svm
from sklearn.model_selection import GridSearchCV

# tuned_parameters, X_train and y_train as in the question;
# y_train holds the ground-truth labels (+1 = inlier, -1 = outlier)
scores = ['precision', 'recall']
for score in scores:
    clf = GridSearchCV(svm.OneClassSVM(), tuned_parameters, cv=10,
                       scoring='%s_macro' % score, return_train_score=True)

    clf.fit(X_train, y_train)

    # rank parameter combinations by mean cross-validated test score
    resultDf = pd.DataFrame(clf.cv_results_)
    print(resultDf[["mean_test_score", "std_test_score", "params"]]
          .sort_values(by=["mean_test_score"], ascending=False).head())

    print("Best parameters set found on development set:")
    print()
    print(clf.best_params_)
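
Once the search has found the best parameters, the model can be refit without labels, as described above. A minimal sketch, continuing from the code above (X_unlabeled is a hypothetical unlabeled training set):

# refit with the tuned parameters, this time fully unsupervised
best_ocsvm = svm.OneClassSVM(**clf.best_params_)  # kernel/gamma/nu from the search
best_ocsvm.fit(X_unlabeled)                       # no labels passed to fit
pred = best_ocsvm.predict(X_unlabeled)            # +1 = inlier, -1 = outlier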

Here is a link with an example of using EllipticEnvelope (another anomaly detection algorithm) with GridSearchCV: https://sdsawtelle.github.io/blog/output/week9-anomaly-andrew-ng-machine-learning-with-python.html

Here you can find an example of using precision and recall scoring with a classification algorithm: https://scikit-learn.org/stable/auto_examples/model_selection/plot_grid_search_digits.html

  • Can you explain what your y_train consists of? It seems like you're suggesting it consists of +1 and -1 labels. These should not be used while training a one-class SVM. That being said, I don't know if the call to fit will be smart about this. The call might realize this & only use the +1 labels for training & +1 & -1 labels for evaluating/testing. Any clarification would really be appreciated. – user3731622 Sep 30 '19 at 23:34
  • Yes, it contains 1 & -1. I may also be wrong, but as I understand it, if the scoring is precision, recall or f1 (which reflect how good the outlier detection is, not how many datapoints were predicted correctly), GridSearchCV will evaluate the parameters based on that score. – Agnieszka Miszkurka Oct 02 '19 at 09:04
  • I understand how GridSearchCV could use the +/-1 truth labels to evaluate the trained model. For the evaluation, this makes sense. However, the model being evaluated will be different depending on whether it utilizes the +/-1 truth labels (i.e. supervised vs unsupervised training). Thus, my curiosity is how GridSearchCV handles the fact that you pass it truth labels but want to train an unsupervised system. I imagine it doesn't use the labels during training, but following the source code I couldn't verify it with an example. – user3731622 Oct 02 '19 at 18:00
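
On the question raised in the comments: scikit-learn's OneClassSVM.fit accepts y only for API consistency and ignores it, so GridSearchCV trains each candidate unsupervised and uses the labels solely for scoring. A minimal sketch to check this (the toy data is made up for illustration):

import numpy as np
from sklearn import svm

rng = np.random.RandomState(0)
X = rng.randn(100, 2)                    # toy 2-D data
y = np.ones(100)
y[:10] = -1                              # arbitrary +1/-1 labels

# fit once without labels and once with them
m1 = svm.OneClassSVM(kernel='rbf', gamma=0.1, nu=0.1).fit(X)
m2 = svm.OneClassSVM(kernel='rbf', gamma=0.1, nu=0.1).fit(X, y)

# identical decision values show y was ignored during training
print(np.allclose(m1.decision_function(X), m2.decision_function(X)))  # True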