I have to implement a Naive Bayes classifier that assigns a document to a class. To get the conditional probability of a term given a class, with Laplace smoothing, we have:
prob(t | c) = (Num(occurrences of the word in docs of class c) + 1) / (Num(documents in class c) + |V|)
It's a Bernoulli model, so each feature is either 1 or 0, and the vocabulary is really large, perhaps 20,000 words or more. Won't Laplace smoothing then give really small values because of the large vocabulary size, or am I doing something wrong?
According to the pseudocode at this link: http://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html, for the Bernoulli model we just add 2 instead of |V|. Why is that?
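To make the difference concrete, here is a minimal sketch of the two smoothed estimators as I understand them (the function and parameter names are my own, not from the linked pseudocode):

```python
def bernoulli_cond_prob(docs_with_term, docs_in_class):
    # Bernoulli model: counts documents, not tokens. Each term has only
    # two outcomes (present / absent), so Laplace smoothing adds 1 to the
    # numerator and 2 to the denominator.
    return (docs_with_term + 1) / (docs_in_class + 2)

def multinomial_cond_prob(term_count, total_tokens_in_class, vocab_size):
    # Multinomial model: counts tokens. Each token position has |V|
    # possible outcomes, so smoothing adds 1 to the numerator and |V|
    # to the denominator.
    return (term_count + 1) / (total_tokens_in_class + vocab_size)

# Example: 3 of 10 documents in the class contain the term.
print(bernoulli_cond_prob(3, 10))            # (3 + 1) / (10 + 2) = 0.333...
# Example: the term occurs 3 times among 100 tokens, |V| = 20000.
print(multinomial_cond_prob(3, 100, 20000))  # (3 + 1) / (100 + 20000) ~ 0.0002
```

With |V| = 20,000 the multinomial-style estimate does come out very small, which is exactly the effect I'm worried about, whereas the Bernoulli version uses document counts and adds only 2.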