
Is it possible (and if so, how) to dynamically train an sklearn MultinomialNB classifier? I would like to train (update) my spam classifier every time I feed an email into it.

I want something like this (it does not work):

x_train, x_test, y_train, y_test = tts(features, labels, test_size=0.2)
clf = MultinomialNB()
for i in range(len(x_train)):
    clf.fit([x_train[i]], [y_train[i]])
preds = clf.predict(x_test)

to give a result similar to this (which works fine):

x_train, x_test, y_train, y_test = tts(features, labels, test_size=0.2)
clf = MultinomialNB()
clf.fit(x_train, y_train)
preds = clf.predict(x_test)
desertnaut

1 Answer


Scikit-learn supports incremental learning for several algorithms, including MultinomialNB. See the incremental learning section of the scikit-learn documentation for details.
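As a quick way to see which classifiers offer incremental training, here is a small sketch that checks whether a handful of scikit-learn estimator classes define a `partial_fit` method. The selection of estimators is arbitrary and only illustrative; `LogisticRegression` is included as an example of a batch-only model.

```python
from sklearn.linear_model import LogisticRegression, Perceptron, SGDClassifier
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Incremental learners expose a partial_fit() method; batch-only ones do not
estimators = [MultinomialNB, BernoulliNB, GaussianNB, SGDClassifier, Perceptron, LogisticRegression]
supports = {est.__name__: hasattr(est, "partial_fit") for est in estimators}

for name, ok in supports.items():
    print(f"{name}: partial_fit={'yes' if ok else 'no'}")
```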

You'll need to use the partial_fit() method instead of fit(), so your example code would look like this:

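import numpy
from sklearn.model_selection import train_test_split as tts
from sklearn.naive_bayes import MultinomialNB

x_train, x_test, y_train, y_test = tts(features, labels, test_size=0.2)
clf = MultinomialNB()
for i in range(len(x_train)):
    if i == 0:
        # The first call must list every class the model will ever see
        clf.partial_fit([x_train[i]], [y_train[i]], classes=numpy.unique(y_train))
    else:
        # Later calls add to the existing counts instead of refitting from scratch
        clf.partial_fit([x_train[i]], [y_train[i]])
preds = clf.predict(x_test)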
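To check that one-sample-at-a-time training really matches a single batch fit(), here is a self-contained sketch on synthetic word-count data (the random data stands in for real email features and is purely an assumption). Because MultinomialNB only accumulates per-class feature counts, summing them one row at a time should give the same model as counting them all at once. This variant passes `classes` on every call, which scikit-learn accepts as long as the labels match the first call, and uses `X_train[i:i + 1]` to keep each sample 2-D.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Synthetic non-negative word counts (hypothetical stand-in for email features)
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(200, 20))
y = rng.integers(0, 2, size=200)
X_train, X_test, y_train = X[:160], X[160:], y[:160]

# Batch training: a single fit() over all training rows
batch = MultinomialNB().fit(X_train, y_train)

# Incremental training: one partial_fit() call per row
online = MultinomialNB()
classes = np.unique(y_train)
for i in range(len(X_train)):
    # Slicing keeps the (1, n_features) shape that scikit-learn expects
    online.partial_fit(X_train[i:i + 1], y_train[i:i + 1], classes=classes)

batch_preds = batch.predict(X_test)
online_preds = online.predict(X_test)
print("predictions agree:", bool((batch_preds == online_preds).all()))
```

Comparing `feature_count_` on the two models is a stricter check than comparing predictions, since it shows the learned statistics themselves are identical.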

Edit: added the classes argument to partial_fit, as suggested by @BobWazowski
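The reason `classes` is needed on the first call is that a single sample cannot reveal the full label set. A minimal sketch of what happens without it, using two made-up "ham"/"spam" rows (hypothetical data): the first call raises a `ValueError`, while supplying `classes` up front lets later calls omit it.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Two hypothetical word-count rows with their labels
X = np.array([[2, 0, 1], [0, 3, 0]])
y = np.array(["ham", "spam"])

clf = MultinomialNB()
error_message = None
try:
    # No classes= on the very first call, so the full label set is unknown
    clf.partial_fit(X[:1], y[:1])
except ValueError as exc:
    error_message = str(exc)
    print("first call failed:", exc)

# Declaring every possible label on the first successful call
clf.partial_fit(X[:1], y[:1], classes=np.array(["ham", "spam"]))
# Subsequent calls may leave classes out
clf.partial_fit(X[1:], y[1:])
print(clf.classes_)
```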

foglerit
  • Tried it and it worked! Thanks! The only catch: partial_fit requires classes to be passed on the first call. As far as I understand, I should pass `classes=numpy.unique(y_train)`. PS: could you update your answer for future reference? – Bob Wazowski May 27 '20 at 07:40
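For the original goal of updating the spam filter each time an email arrives, the text vectorizer also has to avoid a vocabulary that must be refit on the whole corpus. The sketch below swaps in `HashingVectorizer`, which is stateless, with `alternate_sign=False` so the features stay non-negative as MultinomialNB requires. This choice is an assumption, not part of the answer above; the helpers `learn_one` and `classify`, the label set, and the example emails are all hypothetical. Hashing can occasionally map two words to the same feature.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB

# Stateless vectorizer: no vocabulary to refit when new emails arrive.
# alternate_sign=False keeps every feature value non-negative.
vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)
clf = MultinomialNB()
CLASSES = np.array(["ham", "spam"])  # hypothetical label set, must be known up front


def learn_one(text, label):
    """Hypothetical helper: update the model with one labelled email."""
    X = vectorizer.transform([text])
    clf.partial_fit(X, [label], classes=CLASSES)


def classify(text):
    """Hypothetical helper: predict the label of one email."""
    return clf.predict(vectorizer.transform([text]))[0]


# Hypothetical stream of labelled emails, learned one at a time
stream = [
    ("win a free prize now", "spam"),
    ("meeting moved to 3pm", "ham"),
    ("free prize claim now", "spam"),
    ("lunch tomorrow with the team", "ham"),
]
for text, label in stream:
    learn_one(text, label)

print(classify("claim your free prize"))
print(classify("team meeting tomorrow"))
```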