I have a NumPy array containing predictions of whether an email is 'spam' or 'ham', i.e. the output of a spam-prediction model. I want to compare it to an array containing the true classes of the test set. When I call the MultinomialNB.score() method on the two arrays, I get an error because it is meant to compare float values, not strings.

So how can I convert these two arrays to float values based on whether each entry is 'spam' or 'ham'? Better still, is there a better way to quantitatively measure the quality of the model?

RobJan
Ahmed Samir
  • Can you give some example data? – Nils Werner Jul 22 '18 at 11:48
  • These are the test classes, for instance: array(['ham', 'spam', 'ham', 'spam', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'spam', 'ham', 'ham', 'spam', 'ham', 'ham', 'spam', 'ham', 'ham', 'ham', 'ham', 'ham'....], dtype=object) and this is the predictions array: array(['ham', 'spam', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'spam', 'ham', 'ham', 'spam'....], dtype=' – Ahmed Samir Jul 22 '18 at 12:03
  • What about `a == 'ham'`? – Nils Werner Jul 22 '18 at 12:24
  • I tried to change them iteratively using a for loop as follows: for c, x in test_classes: if x == 'ham': test_classes[c] = 0, but I get a ValueError: too many values to unpack (expected 2) – Ahmed Samir Jul 22 '18 at 12:26
  • @AhmedSamir Please edit your question when you have additional information. People tend to ignore comments and you can't format code properly in a comment. – Mr. T Jul 22 '18 at 12:45
  • Possible duplicate of [Fast replacement of values in a numpy array](https://stackoverflow.com/questions/3403973/fast-replacement-of-values-in-a-numpy-array) – Mr. T Jul 22 '18 at 12:48
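
The ValueError quoted in the comments comes from iterating over the array directly: each item is a single label string such as 'ham', which has more than two characters and so cannot be unpacked into `c, x`. A minimal sketch of the same loop using `enumerate`, which yields (index, value) pairs; `test_classes` here is a short stand-in array, not the asker's actual data.

```python
import numpy as np

# Stand-in for the asker's label array (hypothetical sample data).
test_classes = np.array(['ham', 'spam', 'ham'], dtype=object)

# enumerate() yields (index, value) pairs, so unpacking into c, x works.
for c, x in enumerate(test_classes):
    test_classes[c] = 0 if x == 'ham' else 1

print(test_classes)
```

This works for small arrays, but a vectorized replacement is usually faster and shorter than an explicit loop.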

1 Answer

Assuming you have already obtained the true labels (y_test) and the predictions (y_predicted), you can replace the strings in place using boolean masks:

import numpy as np


y_test = np.array(['ham', 'spam', 'ham', 'spam', 'ham', 'ham', 'ham', 'ham', 
                   'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 
                   'ham', 'ham', 'ham', 'spam','ham', 'ham', 'spam'], dtype=object)

y_predicted = np.array(['ham', 'spam', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham',
                        'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 'ham', 
                        'ham', 'ham', 'ham', 'spam','ham', 'ham', 'spam'], dtype=object)

y_test[y_test == 'ham']=0
y_test[y_test == 'spam']=1

y_predicted[y_predicted == 'ham']=0
y_predicted[y_predicted == 'spam']=1

Results:

print(y_test)
# [0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1]

print(y_predicted)
# [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1]
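
Note that after the in-place replacement the arrays still have dtype=object rather than float. As a sketch of one way to get real float arrays and a first quality measure with NumPy only, the example below compares against 'spam' and casts the boolean result; the names `y_test_num`, `y_pred_num`, `tp`, `fp`, and `fn` are illustrative, and the short arrays are made-up sample data.

```python
import numpy as np

# Short made-up sample; substitute the real label arrays.
y_test = np.array(['ham', 'spam', 'ham', 'spam', 'ham'], dtype=object)
y_predicted = np.array(['ham', 'spam', 'ham', 'ham', 'ham'], dtype=object)

# Comparing with 'spam' gives a boolean array; astype(float) maps
# True -> 1.0 (spam) and False -> 0.0 (ham).
y_test_num = (y_test == 'spam').astype(float)
y_pred_num = (y_predicted == 'spam').astype(float)

# Accuracy: fraction of positions where prediction and truth agree.
# This works on the string labels directly, no conversion needed.
accuracy = np.mean(y_test == y_predicted)

# Precision and recall for the 'spam' class, from the confusion counts.
# These divisions assume at least one predicted and one actual spam.
tp = np.sum((y_pred_num == 1) & (y_test_num == 1))  # spam predicted as spam
fp = np.sum((y_pred_num == 1) & (y_test_num == 0))  # ham predicted as spam
fn = np.sum((y_pred_num == 0) & (y_test_num == 1))  # spam predicted as ham
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(accuracy, precision, recall)
```

As for the original error: `MultinomialNB.score(X, y)` expects the feature matrix and the true labels, not two label arrays, so it is not the right tool for comparing predictions that are already computed. If scikit-learn is available, the functions in `sklearn.metrics` such as `accuracy_score`, `confusion_matrix`, and `classification_report` accept string labels directly and are likely a more convenient way to measure model quality.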
seralouk