18

Trying to fit data with GaussianNB() gives me a low accuracy score.

I'd like to try grid search, but it seems that the parameters sigma and theta cannot be set. Is there any way to tune GaussianNB?
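A minimal reproduction of the failure, assuming synthetic data (the dataset and the candidate values here are illustrative, not from the original question):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=100, random_state=0)

# 'sigma' is not a constructor parameter of GaussianNB (sigma_ and theta_
# are attributes learned by fit()), so grid-searching over it fails.
try:
    GridSearchCV(GaussianNB(), {'sigma': [0.1, 1.0]}, cv=3).fit(X, y)
except ValueError as err:
    print(err)
```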

Mattravel
vlad
  • Naive Bayes makes very strong independence assumptions. I'd probably move on to a more powerful model instead of trying to tune NB. – cel Oct 03 '16 at 09:43
  • http://scikit-learn.org/stable/auto_examples/model_selection/randomized_search.html#sphx-glr-auto-examples-model-selection-randomized-search-py should give you a good idea of how to use a custom grid for CV-based model tuning. – abhiieor Oct 03 '16 at 09:44
  • `GridSearchCV` tunes parameters, but `GaussianNB` does not accept parameters, except the `priors` parameter. – vlad Oct 03 '16 at 09:58
  • Actually `GaussianNB` does not accept any parameters: `GaussianNB().get_params().keys()` results in an empty dict. – vlad Oct 03 '16 at 10:11
  • Finally, it seems that the only way to make this model better is to preprocess the data. The `sigma_` and `theta_` attributes set by `fit()` should help. – vlad Oct 03 '16 at 10:19

4 Answers

13

You can tune the var_smoothing parameter like this:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

nb_classifier = GaussianNB()

params_NB = {'var_smoothing': np.logspace(0, -9, num=100)}
gs_NB = GridSearchCV(estimator=nb_classifier,
                     param_grid=params_NB,
                     cv=cv_method,   # use any cross-validation technique
                     verbose=1,
                     scoring='accuracy')
gs_NB.fit(x_train, y_train)

gs_NB.best_params_
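If searching all 100 candidates above is too slow, the randomized-search approach linked in the comments applies here as well; a sketch assuming synthetic training data (the dataset is my assumption):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.naive_bayes import GaussianNB

X_train, y_train = make_classification(n_samples=200, random_state=0)

# Sample 20 of the 100 var_smoothing candidates instead of trying them all.
rs_NB = RandomizedSearchCV(
    GaussianNB(),
    param_distributions={'var_smoothing': np.logspace(0, -9, num=100)},
    n_iter=20,
    cv=5,
    scoring='accuracy',
    random_state=0,
)
rs_NB.fit(X_train, y_train)
print(rs_NB.best_params_)
```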
ana
3

As of version 0.20, GaussianNB().get_params().keys() returns 'priors' and 'var_smoothing'.

A grid search would look like:

from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('clf', GaussianNB())
])

parameters = {
    'clf__priors': [None],
    'clf__var_smoothing': [1e-8, 1e-9, 1e-10]
}

cv = GridSearchCV(pipeline, param_grid=parameters)

cv.fit(X_train, y_train)
y_pred_gnb = cv.predict(X_test)
Helen Batson
  • I'm not familiar with how GridSearchCV works, but why did you name the parameters with the prefix 'clf__'? Is this required by GridSearchCV? – Christos Karapapas Dec 19 '20 at 22:09
  • The pipeline here uses the classifier (clf) = GaussianNB(), and the resulting parameter 'clf__var_smoothing' will be fitted with each of the three candidate values above. GridSearchCV considers all parameter combinations when tuning the estimator's hyper-parameters, so the best of those values is chosen. See the documentation: [link](https://scikit-learn.org/stable/modules/grid_search.html). – Helen Batson Dec 29 '20 at 00:43
1

In an sklearn pipeline it may look as follows:

from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

pipe = Pipeline(steps=[
    ('pca', PCA()),
    ('estimator', GaussianNB()),
])

parameters = {'estimator__var_smoothing': [1e-11, 1e-10, 1e-9]}
Bayes = GridSearchCV(pipe, parameters, scoring='accuracy', cv=10).fit(X_train, y_train)
print(Bayes.best_estimator_)
print('best score:')
print(Bayes.best_score_)
predictions = Bayes.best_estimator_.predict(X_test)
Pavel Fedotov
-1

Classical Naive Bayes doesn't have any hyperparameters to tune.

Matheus Schaly