18

Trying to fit data with GaussianNB() gives me a low accuracy score.

I'd like to try grid search, but it seems that the parameters sigma and theta cannot be set. Is there any way to tune GaussianNB?
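A minimal reproduction of the failure, assuming synthetic data (the dataset and the candidate values here are illustrative, not from the original question):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=100, random_state=0)

# 'sigma' is not a constructor parameter of GaussianNB (sigma_ and theta_
# are attributes learned by fit()), so grid-searching over it fails.
try:
    GridSearchCV(GaussianNB(), {'sigma': [0.1, 1.0]}, cv=3).fit(X, y)
except ValueError as err:
    print(err)
```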

Mattravel
vlad
  • Naive Bayes makes very strong independence assumptions. I'd probably move on to a more powerful model instead of trying to tune NB. – cel Oct 03 '16 at 09:43
  • http://scikit-learn.org/stable/auto_examples/model_selection/randomized_search.html#sphx-glr-auto-examples-model-selection-randomized-search-py should give you a good idea of how to use a custom grid for CV-based model tuning. – abhiieor Oct 03 '16 at 09:44
  • `GridSearchCV` tunes parameters, but `GaussianNB` does not accept parameters, except the `priors` parameter. – vlad Oct 03 '16 at 09:58
  • Actually `GaussianNB` does not accept any parameters: `GaussianNB().get_params().keys()` results in an empty dict. – vlad Oct 03 '16 at 10:11
  • Finally, it seems that the only way to make this model better is to preprocess the data. The `sigma_` and `theta_` attributes set by `fit()` should help. – vlad Oct 03 '16 at 10:19

4 Answers

13

You can tune the var_smoothing parameter like this:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

nb_classifier = GaussianNB()

params_NB = {'var_smoothing': np.logspace(0, -9, num=100)}
gs_NB = GridSearchCV(estimator=nb_classifier,
                     param_grid=params_NB,
                     cv=cv_method,   # use any cross-validation technique
                     verbose=1,
                     scoring='accuracy')
gs_NB.fit(x_train, y_train)

gs_NB.best_params_
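If searching all 100 candidates above is too slow, the randomized-search approach linked in the comments applies here as well; a sketch assuming synthetic training data (the dataset is my assumption):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.naive_bayes import GaussianNB

X_train, y_train = make_classification(n_samples=200, random_state=0)

# Sample 20 of the 100 var_smoothing candidates instead of trying them all.
rs_NB = RandomizedSearchCV(
    GaussianNB(),
    param_distributions={'var_smoothing': np.logspace(0, -9, num=100)},
    n_iter=20,
    cv=5,
    scoring='accuracy',
    random_state=0,
)
rs_NB.fit(X_train, y_train)
print(rs_NB.best_params_)
```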
ana
3

As of version 0.20, GaussianNB().get_params().keys() returns 'priors' and 'var_smoothing'.

A grid search would look like:

from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('clf', GaussianNB())
])

parameters = {
    'clf__priors': [None],
    'clf__var_smoothing': [1e-8, 1e-9, 1e-10]
}

cv = GridSearchCV(pipeline, param_grid=parameters)

cv.fit(X_train, y_train)
y_pred_gnb = cv.predict(X_test)
Helen Batson
  • I'm not familiar with how GridSearchCV works, but why did you name the parameters with the prefix 'clf__'? Is this required by GridSearchCV? – Christos Karapapas Dec 19 '20 at 22:09
  • The pipeline here uses the classifier (clf) = GaussianNB(), and the resulting parameter 'clf__var_smoothing' will be fitted with each of the three candidate values above. GridSearchCV considers all parameter combinations when tuning the estimator's hyper-parameters, so the best of those values is chosen. See the documentation: [link](https://scikit-learn.org/stable/modules/grid_search.html). – Helen Batson Dec 29 '20 at 00:43
1

In an sklearn pipeline it may look as follows:

from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

pipe = Pipeline(steps=[
    ('pca', PCA()),
    ('estimator', GaussianNB()),
])

parameters = {'estimator__var_smoothing': [1e-11, 1e-10, 1e-9]}
Bayes = GridSearchCV(pipe, parameters, scoring='accuracy', cv=10).fit(X_train, y_train)
print(Bayes.best_estimator_)
print('best score:')
print(Bayes.best_score_)
predictions = Bayes.best_estimator_.predict(X_test)
Pavel Fedotov
-1

Classical Naive Bayes doesn't have any hyperparameters to tune.

Matheus Schaly