19

I am normalizing my text input before running MultinomialNB in sklearn like this:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import Normalizer
from sklearn.naive_bayes import MultinomialNB

vectorizer = TfidfVectorizer(max_df=0.5, stop_words='english', use_idf=True)
lsa = TruncatedSVD(n_components=100)
mnb = MultinomialNB(alpha=0.01)

train_text = vectorizer.fit_transform(raw_text_train)
train_text = lsa.fit_transform(train_text)
train_text = Normalizer(copy=False).fit_transform(train_text)

mnb.fit(train_text, train_labels)

Unfortunately, MultinomialNB does not accept the negative values created during the LSA stage. Any ideas for getting around this?

seanlorenz
  • Try using `sklearn.preprocessing.MinMaxScaler()`. Scale your training features to `[0,1]`. – o-90 Jun 11 '14 at 18:51
  • Or try [non-negative matrix factorization](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html) (NMF) instead of LSA, or an SVM instead of naive Bayes. – Fred Foo Jun 12 '14 at 09:21

4 Answers

8

I recommend that you don't use Naive Bayes with SVD or other matrix factorizations, because Naive Bayes is based on applying Bayes' theorem with strong (naive) independence assumptions between the features, and MultinomialNB in particular expects non-negative feature values such as word counts. Use another classifier instead, for example RandomForest.

I tried this experiment and got these results:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from sklearn.preprocessing import Normalizer
from sklearn.naive_bayes import MultinomialNB

vectorizer = TfidfVectorizer(max_df=0.5, stop_words='english', use_idf=True)
lsa = NMF(n_components=100)
mnb = MultinomialNB(alpha=0.01)

train_text = vectorizer.fit_transform(raw_text_train)
train_text = lsa.fit_transform(train_text)
train_text = Normalizer(copy=False).fit_transform(train_text)

mnb.fit(train_text, train_labels)

This is the same case, but using NMF (non-negative matrix factorization) instead of SVD, and I got 0.04% accuracy.

Changing the classifier from MultinomialNB to RandomForest, I got 79% accuracy.

Therefore, either change the classifier or don't apply a matrix factorization.
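For reference, here is a minimal sketch of that RandomForest variant, assuming the same raw_text_train and train_labels as in the question (the n_estimators value is an arbitrary choice, not taken from the original experiment):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import Normalizer
from sklearn.ensemble import RandomForestClassifier

vectorizer = TfidfVectorizer(max_df=0.5, stop_words='english', use_idf=True)
lsa = TruncatedSVD(n_components=100)
# Random forests place no sign restriction on their inputs,
# so the negative values produced by the SVD are fine here.
rf = RandomForestClassifier(n_estimators=100)

train_text = vectorizer.fit_transform(raw_text_train)
train_text = lsa.fit_transform(train_text)
train_text = Normalizer(copy=False).fit_transform(train_text)

rf.fit(train_text, train_labels)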

Martin Forte
  • Don't forget to import [NMF](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html) with `from sklearn.decomposition import NMF` – Zap Apr 25 '18 at 21:19
0

Try converting the sparse matrix to a dense one in fit():

mnb.fit(train_text.todense(), train_labels)
Roaa
0

I had the same issue running NB, and indeed using sklearn.preprocessing.MinMaxScaler() as suggested by gobrewers14 works. However, it actually reduced the accuracy of my Decision Tree, Random Forest, and KNN models by 0.2% on the same standardized dataset.
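For completeness, a minimal sketch of that workaround, assuming the same TF-IDF + SVD features as in the question (scaling only shifts the SVD output into [0, 1]; it does not turn the features back into counts):

from sklearn.preprocessing import MinMaxScaler
from sklearn.naive_bayes import MultinomialNB

# Rescale each feature of the (possibly negative) SVD output
# to [0, 1] so that MultinomialNB will accept it.
train_text = MinMaxScaler().fit_transform(train_text)

mnb = MultinomialNB(alpha=0.01)
mnb.fit(train_text, train_labels)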

desertnaut
0

Try creating a pipeline with Normalization as the first step and model fitting as the second step.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.naive_bayes import MultinomialNB

p = Pipeline([('Normalizing', MinMaxScaler()), ('MultinomialNB', MultinomialNB())])
p.fit(X_train, y_train)
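If you want the whole workflow from the question in a single object, the same idea extends to a longer pipeline. A sketch assuming the setup from the question (the step names are arbitrary):

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import MinMaxScaler
from sklearn.naive_bayes import MultinomialNB

p = Pipeline([
    ('tfidf', TfidfVectorizer(max_df=0.5, stop_words='english', use_idf=True)),
    ('lsa', TruncatedSVD(n_components=100)),
    ('scale', MinMaxScaler()),  # map the SVD output into [0, 1]
    ('mnb', MultinomialNB(alpha=0.01)),
])
p.fit(raw_text_train, train_labels)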