
I am working on a document that should cover the key differences between using Naive Bayes (generative) and Logistic Regression (discriminative) models for text classification.

During my research, I ran into this definition of the Naive Bayes model: https://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html

The probability of a document `d` being in class `c` is computed as `p(c|d) ∝ p(c) ∏(1 ≤ k ≤ nd) p(tk|c)`, where `p(tk|c)` is the conditional probability of term `tk` occurring in a document of class `c`...
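To make that scoring rule concrete for myself, I put together this minimal Python sketch (the class priors and term probabilities are made-up numbers, just to illustrate the formula in log space):

```python
import math

# Toy estimates for a two-class problem (made-up numbers, only to
# illustrate the formula; a real model would estimate these from counts).
prior = {"spam": 0.4, "ham": 0.6}                      # p(c)
cond = {                                               # p(tk|c)
    "spam": {"buy": 0.05, "now": 0.04, "meeting": 0.001},
    "ham":  {"buy": 0.005, "now": 0.01, "meeting": 0.03},
}

def score(doc_terms, c):
    # log p(c) + sum_k log p(tk|c): the IR-book formula in log space
    return math.log(prior[c]) + sum(math.log(cond[c][t]) for t in doc_terms)

doc = ["buy", "now"]
best = max(prior, key=lambda c: score(doc, c))
print(best)  # -> "spam" for these toy numbers
```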


When I got to the part comparing generative and discriminative models, I found this accepted answer on StackOverflow: What is the difference between a Generative and Discriminative Algorithm?

A generative model learns the joint probability distribution `p(x,y)` and a discriminative model learns the conditional probability distribution `p(y|x)`, which you should read as "the probability of y given x".
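To see what those two targets look like concretely, here is a tiny sketch (one made-up binary feature, plain counting only) that estimates both from the same data:

```python
from collections import Counter

# Toy (x, y) observations with one binary feature, just to illustrate
# the two learning targets (the data are made up).
data = [(1, "pos"), (1, "pos"), (0, "pos"), (1, "neg"), (0, "neg"), (0, "neg")]
n = len(data)

# Generative target: the joint p(x, y), estimated by counting pairs.
joint = {pair: cnt / n for pair, cnt in Counter(data).items()}

# Discriminative target: the conditional p(y|x), estimated here by
# counting per x (a model like logistic regression fits this directly).
x_counts = Counter(x for x, _ in data)
cond = {pair: cnt / x_counts[pair[0]] for pair, cnt in Counter(data).items()}

print(joint[(1, "pos")])   # p(x=1, y=pos) = 2/6
print(cond[(1, "pos")])    # p(y=pos | x=1) = 2/3
# The joint also recovers the conditional: p(y|x) = p(x,y) / p(x)
print(joint[(1, "pos")] / (joint[(1, "pos")] + joint[(1, "neg")]))  # 2/3
```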


At this point I got confused: Naive Bayes is a generative model, yet its formula is built from conditional probabilities like `p(tk|c)`, while the answer above describes discriminative models as the ones that learn conditional probabilities, as opposed to the joint probabilities of generative models.


Can someone shed some light on this please?

Thank you!

GermanC
  • The first paragraph of the accepted answer directly answers your question: https://stats.stackexchange.com/questions/4689/generative-vs-discriminative-models-in-bayesian-context – HFBrowning Dec 27 '17 at 18:23
  • Thanks for answering, @HFBrowning! From what I'm reading there, a generative model models the joint probability distribution `p(x,y)` to compute `p(y|x)`, but I'm still not seeing where `p(x,y)` is being used within the Naive Bayes link I shared; I only see conditional probabilities. – GermanC Dec 29 '17 at 14:13
  • I don't see it in that text either, which I took to mean that the authors of this textbook didn't see the distinction as important, or were being a bit sloppy. All other answers I found were pretty consistent, and a lot of them pointed to what appears to be a canonical paper on the topic (which I'm sure you've seen: http://papers.nips.cc/paper/2020-on-discriminative-vs-generative-classifiers-a-comparison-of-logistic-regression-and-naive-bayes.pdf) – HFBrowning Dec 29 '17 at 18:44
  • @HFBrowning I keep reaching the same conclusion: Naive Bayes has Bayes' rule behind it, which uses conditional probabilities instead of joint probabilities, so I'm still confused. Have you seen `p(x,y)` in any formulas for Naive Bayes that you can link me to? – GermanC Dec 29 '17 at 19:42
  • In the paper I just linked you to, check out the first couple of paragraphs under `2. Preliminaries`. I think the explanation there is quite clear. There's also this page (https://machinelearningmastery.com/naive-bayes-classifier-scratch-python/) where the algorithm is shown step-by-step in Python. Even if you don't know Python, it's close enough to English that I think it might help quite a bit. – HFBrowning Dec 29 '17 at 20:18
  • @HFBrowning between your help and this [answer](https://stackoverflow.com/a/15137512/7186976) I understood. `p(x,y)` being `p(x|y) p(y)` clarifies a lot about how the joint was implicitly there. Thanks a lot! – GermanC Jan 02 '18 at 15:14

1 Answer

It is generative in the sense that you don't directly model the posterior `p(y|x)`; rather, you learn a model of the joint probability `p(x,y)`, which can also be expressed as `p(x|y) * p(y)` (likelihood times prior), and then through Bayes' rule you seek to find the most probable `y`.
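To make that explicit, here is a minimal sketch (toy corpus and class names made up; add-one smoothing included) showing where the joint `p(x,y)` lives inside Naive Bayes:

```python
from collections import Counter, defaultdict

# Tiny labeled corpus (made up). We estimate p(y) and p(t|y) from counts,
# multiply them to get the joint p(x, y) = p(x|y) * p(y), and only then
# apply Bayes' rule to obtain the posterior p(y|x).
docs = [(["buy", "now"], "spam"), (["buy", "cheap"], "spam"),
        (["meeting", "now"], "ham"), (["meeting", "notes"], "ham")]

prior = Counter(y for _, y in docs)        # counts for p(y)
likelihood = defaultdict(Counter)          # counts for p(t|y)
for terms, y in docs:
    likelihood[y].update(terms)

def joint(terms, y):
    # p(x, y) = p(y) * prod_k p(tk|y), with add-one smoothing
    vocab = {t for ts, _ in docs for t in ts}
    p = prior[y] / len(docs)
    for t in terms:
        p *= (likelihood[y][t] + 1) / (sum(likelihood[y].values()) + len(vocab))
    return p

doc = ["buy", "now"]
scores = {y: joint(doc, y) for y in prior}                            # p(x, y)
posterior = {y: s / sum(scores.values()) for y, s in scores.items()}  # p(y|x)
print(max(posterior, key=posterior.get), posterior)
```

Note that the classifier never needs the normalizer `p(x)`: the argmax over the joint scores already picks the most probable `y`, which is why the formula you quoted shows only the prior and the conditional terms.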

A good read I can recommend in this context is "On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes" (Ng & Jordan, 2001), the paper linked in the comments above.

Kristianmitk