I am normalizing my text input before running MultinomialNB in sklearn like this:
vectorizer = TfidfVectorizer(max_df=0.5, stop_words='english', use_idf=True)
lsa = TruncatedSVD(n_components=100)
mnb = MultinomialNB(alpha=0.01)
train_text = vectorizer.fit_transform(raw_text_train)
train_text = lsa.fit_transform(train_text)
train_text = Normalizer(copy=False).fit_transform(train_text)
mnb.fit(train_text, train_labels)
Unfortunately, MultinomialNB does not accept the non-negative values created during the LSA stage. Any ideas for getting around this?