
I am new to this.

I have a set of weak classifiers constructed using the Naive Bayes Classifier (NBC) in the scikit-learn toolkit.

My problem is how to combine the output of each NBC to make a final decision. I want the decision to be in probabilities, not labels.

I made the following program in Python. I assume a 2-class problem built from the iris dataset in scikit-learn. For demo/learning purposes, say I make 4 NBCs as follows.

from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

import numpy as np

iris = datasets.load_iris()

gnb1 = GaussianNB()
gnb2 = GaussianNB()
gnb3 = GaussianNB()
gnb4 = GaussianNB()

#The actual dataset has 3 classes; I just made it into 2 classes for this demo
target = np.where(iris.target, 2, 1)

gnb1.fit(iris.data[:, 0].reshape(150,1), target)
gnb2.fit(iris.data[:, 1].reshape(150,1), target)
gnb3.fit(iris.data[:, 2].reshape(150,1), target)
gnb4.fit(iris.data[:, 3].reshape(150,1), target)

#y_pred = gnb.predict(iris.data)
index = 0
y_prob1 = gnb1.predict_proba(iris.data[index,0].reshape(1,1))
y_prob2 = gnb2.predict_proba(iris.data[index,1].reshape(1,1))
y_prob3 = gnb3.predict_proba(iris.data[index,2].reshape(1,1))
y_prob4 = gnb4.predict_proba(iris.data[index,3].reshape(1,1))

#print(y_prob1, "\n", y_prob2, "\n", y_prob3, "\n", y_prob4)

# I just added them up across all classifiers, for each class
pos = y_prob1[:,1] + y_prob2[:,1] + y_prob3[:,1] + y_prob4[:,1]
neg = y_prob1[:,0] + y_prob2[:,0] + y_prob3[:,0] + y_prob4[:,0]

print(pos)
print(neg)

As you will notice, I simply added the probabilities from each NBC as the final score. I wonder if this is correct?

If I have done it wrong, can you please suggest some ideas so I can correct myself?

kcc__

2 Answers


First of all - why do you do this? You should have one Naive Bayes here, not one per feature. It looks like you do not understand the idea of the classifier. What you did is actually what Naive Bayes does internally anyway - it treats each feature independently, but as these are probabilities you should multiply them, or add their logarithms, so:

  1. You should just have one NB, gnb.fit(iris.data, target)
  2. If you insist on having many NBs, you should merge them through multiplication or addition of logarithms (which is the same from a mathematical perspective, but multiplication is numerically less stable)

    pos = y_prob1[:,1] * y_prob2[:,1] * y_prob3[:,1] * y_prob4[:,1]

    or

    pos = np.exp(np.log(y_prob1[:,1]) + np.log(y_prob2[:,1]) + np.log(y_prob3[:,1]) + np.log(y_prob4[:,1]))

    You can also predict the logarithms directly through gnb.predict_log_proba instead of gnb.predict_proba.
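
    For example (a sketch reusing gnb1..gnb4 and index from the question's code), the same log-sum can be written with predict_log_proba directly:

    # reuses gnb1..gnb4 and index from the question's code; equivalent to the
    # np.exp(np.log(...)) line above
    pos = np.exp(gnb1.predict_log_proba(iris.data[index, 0].reshape(1, 1))[:, 1]
                 + gnb2.predict_log_proba(iris.data[index, 1].reshape(1, 1))[:, 1]
                 + gnb3.predict_log_proba(iris.data[index, 2].reshape(1, 1))[:, 1]
                 + gnb4.predict_log_proba(iris.data[index, 3].reshape(1, 1))[:, 1])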

    However, this approach has one problem - Naive Bayes will also include the prior in each of your probabilities, so you will get very skewed distributions. You therefore have to correct for the extra priors manually

    pos_prior = gnb1.class_prior_[1] # all models have the same prior so we can use the one from gnb1

    pos = pos_prior * (y_prob1[:,1]/pos_prior) * (y_prob2[:,1]/pos_prior) * (y_prob3[:,1]/pos_prior) * (y_prob4[:,1]/pos_prior)

    which simplifies to

    pos = y_prob1[:,1] * y_prob2[:,1] * y_prob3[:,1] * y_prob4[:,1] / pos_prior**3

    and for the log version to

    pos = ... - 3 * np.log(pos_prior)

    So once again - you should use option (1).
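
    For completeness, here is a minimal sketch of option (1) on the setup from the question (the variable names simply mirror the question's code):

    from sklearn import datasets
    from sklearn.naive_bayes import GaussianNB
    import numpy as np

    iris = datasets.load_iris()
    target = np.where(iris.target, 2, 1)  # same 2-class relabeling as in the question

    gnb = GaussianNB()
    gnb.fit(iris.data, target)  # one model, all four features at once

    # posterior over classes 1 and 2 for the first sample
    print(gnb.predict_proba(iris.data[0].reshape(1, -1)))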

lejlot
  • Thanks for your reply. Actually, I did create multiple GNBs in my program because I was confused, so I decided to check my understanding of the concept. Thanks for directing me onto the right path. Besides that, I am confused: as you said, we can add the logs or multiply the responses. How do I decide which class the query vector belongs to? As I add or multiply the responses I will get a scalar value, so how do I get the class info? – kcc__ Nov 02 '15 at 13:53
  • You classify to the class with the bigger probability, that's all. – lejlot Nov 02 '15 at 13:54
  • I see. Just to check if I got the idea: as you stated in the two points in your solution, if I choose option (1), a single NB, then I don't have to do the addition or multiplication and can instead use predict_log_proba() in sklearn? I assume this function does what you stated in (2) internally. Is this correct? I am sorry for my lack of understanding. – kcc__ Nov 02 '15 at 13:58
  • Everything seems clearer to me now. Just one last question: if I use option (1) with predict_log_proba(.), do I still need to normalize due to the skewed distribution? Or is this only for option (2)? – kcc__ Nov 02 '15 at 14:06
  • Option (1) handles everything by itself; the problem only arises when you build multiple NBs (each of which uses an internal prior to make predictions). – lejlot Nov 02 '15 at 14:08
  • Using multiple NB classifiers probably doesn't make sense in this context, but if one wants to preserve information about the relative importances of different features, the only way to do that is by keeping the features separate, and, like you say, the priors would need to be fixed. With separate base NB classifiers, one could then construct a meta-classifier that optimizes the relative weights of each of the base classifiers rather than simply averaging. – Petergavinkin Mar 06 '18 at 18:54
  • This is very helpful for situations such as https://stackoverflow.com/q/14254203/290182 – beldaz Jun 25 '18 at 11:33
  • This is a fantastic answer which actually provides a really detailed answer to a number of other questions across Stack sites, including https://stackoverflow.com/a/34036255/2254228 – Chuck Oct 28 '18 at 09:42

The answer by lejlot is almost correct. The one thing missing is that you need to normalize his pos result (the product of the probabilities, divided by the prior) by the sum of this pos result for both classes. Otherwise, the sum of the probabilities of all classes will not be equal to one.

Here is sample code that tests the result of this procedure for a dataset with 6 features (X is the feature matrix and y the labels):

from sklearn.naive_bayes import GaussianNB
import numpy as np

# X: feature matrix with 6 columns, y: class labels

# Use one Naive Bayes for all 6 features:

gaus = GaussianNB(var_smoothing=0)
gaus.fit(X, y)
y_prob1 = gaus.predict_proba(X)

# Use one Naive Bayes on each half of the features and multiply the results:

gaus1 = GaussianNB(var_smoothing=0)
gaus1.fit(X[:, :3], y)
y_log_prob1 = gaus1.predict_log_proba(X[:, :3])

gaus2 = GaussianNB(var_smoothing=0)
gaus2.fit(X[:, 3:], y)
y_log_prob2 = gaus2.predict_log_proba(X[:, 3:])

pos = np.exp(y_log_prob1 + y_log_prob2 - np.log(gaus1.class_prior_))
y_prob2 = pos / pos.sum(axis=1)[:,None]

y_prob1 should be equal to y_prob2 apart from numerical errors (var_smoothing=0 helps reduce the error).
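
A self-contained version of this check is sketched below; the wine dataset restricted to its first 6 features is just an assumed stand-in for X and y:

# Assumed example data: the first 6 features of sklearn's wine dataset
from sklearn.datasets import load_wine
from sklearn.naive_bayes import GaussianNB
import numpy as np

X, y = load_wine(return_X_y=True)
X = X[:, :6]

# One Naive Bayes on all 6 features
y_prob1 = GaussianNB(var_smoothing=0).fit(X, y).predict_proba(X)

# One Naive Bayes per half of the features, combined as described above
gaus1 = GaussianNB(var_smoothing=0).fit(X[:, :3], y)
gaus2 = GaussianNB(var_smoothing=0).fit(X[:, 3:], y)
pos = np.exp(gaus1.predict_log_proba(X[:, :3])
             + gaus2.predict_log_proba(X[:, 3:])
             - np.log(gaus1.class_prior_))
y_prob2 = pos / pos.sum(axis=1)[:, None]

print(np.allclose(y_prob1, y_prob2))  # expected: True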

hsxavier
  • Why did you subtract np.log(gaus1.class_prior_)? – Sepehr Omidvar Jan 11 '22 at 08:47
  • If I remember correctly (I did this over a year ago), the probability output of GaussianNB includes the class prior, as it should. So when you multiply the probability outputs (i.e. sum their logs) from two GaussianNBs, you get an extra class prior factor that shouldn't be there (there should only be one). Thus, I subtract its log to compensate for this extra factor. – hsxavier Jan 13 '22 at 15:29