
I have written code based on this site and built several multi-label classifiers.

I would like to evaluate each model with per-class accuracy and per-class F1 score.

The problem is that I am getting exactly the same number for accuracy and for the F1 score in every model.

I suspect I have done something wrong, and I would like to know under which circumstances this can happen.

The code is exactly the same as on the site, and I calculate the F1 score like this:

print('Logistic Test accuracy is {} '.format(accuracy_score(test[category], prediction)))
print('Logistic f1 measurement is {} '.format(f1_score(test[category], prediction, average='micro')))

Update 1

This is the whole code:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, f1_score

df = pd.read_csv("finalupdatedothers.csv")
categories = ['ADR', 'WD', 'EF', 'INF', 'SSI', 'DI', 'others']

train, test = train_test_split(df, random_state=42, test_size=0.3, shuffle=True)
X_train = train.sentences
X_test = test.sentences

NB_pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(stop_words=stop_words)),  # stop_words is defined earlier in my script
    ('clf', OneVsRestClassifier(MultinomialNB(fit_prior=True, class_prior=None))),
])

for category in categories:
    print('processing {} '.format(category))
    NB_pipeline.fit(X_train, train[category])
    prediction = NB_pipeline.predict(X_test)
    print('NB test accuracy is {} '.format(accuracy_score(test[category], prediction)))
    print('NB f1 measurement is {} '.format(f1_score(test[category], prediction, average='micro')))
    print("\n")

And this is the output:

processing ADR 
NB test accuracy is 0.821963394343 
NB f1 measurement is 0.821963394343 

And this is what my data looks like:

,sentences,ADR,WD,EF,INF,SSI,DI,others
0,"extreme weight gain, short-term memory loss, hair loss.",1,0,0,0,0,0,0
1,I am detoxing from Lexapro now.,0,0,0,0,0,0,1
2,I slowly cut my dosage over several months and took vitamin supplements to help.,0,0,0,0,0,0,1
3,I am now 10 days completely off and OMG is it rough.,0,0,0,0,0,0,1
4,"I have flu-like symptoms, dizziness, major mood swings, lots of anxiety, tiredness.",0,1,0,0,0,0,0
5,I have no idea when this will end.,1,0,0,0,0,0,1

Why am I getting the same number?

Thanks.

  • Can you please share what code you wrote, and what output you're getting? – user2906838 Aug 13 '18 at 04:45
  • Thanks for the comment, sure, I am updating. – sariii Aug 13 '18 at 04:46
  • @user2906838 Updated with one model and the output of that model. Thanks for following :) – sariii Aug 13 '18 at 04:50
  • Are you getting the same accuracy and f1_score for all categories, or only this one? – Sociopath Aug 13 '18 at 04:56
  • @AkshayNevrekar Thanks for the comment. I am getting the same accuracy and F1 for all models, e.g. for SVM I am getting: Processing ADR SVM Linear Test accuracy is 0.814753189129 SVM Linear f1 measurement is 0.814753189129 – sariii Aug 13 '18 at 04:57
  • I also updated with a sample of my dataframe, which may help. Thanks :) – sariii Aug 13 '18 at 05:00
  • What does the f1_score call look like? I couldn't see it in the article. – user2906838 Aug 13 '18 at 05:01
  • @user2906838 Yeah, I added it myself; it is the f1_score from the scikit-learn library. Am I doing something wrong? – sariii Aug 13 '18 at 05:02
  • Probably not, they must essentially be returning the same thing. Can you please change the value of `average` to `weighted`? Also I would like you to read this post: https://towardsdatascience.com/accuracy-precision-recall-or-f1-331fb37c5cb9 – user2906838 Aug 13 '18 at 05:10
  • @user2906838 You are right; when I changed to weighted the result changed, but only by less than 1 percent. I will definitely look at the site, and many thanks for your help. So do you think a result with such a small difference makes sense? – sariii Aug 13 '18 at 05:16
  • @user2906838 Do you want to add your comment as an answer so I can accept it? – sariii Aug 13 '18 at 05:27
  • Well, yes. There are some discrepancies between accuracy_score and f1_score, but to my knowledge they can look quite similar in some contexts, as described in the article. – user2906838 Aug 13 '18 at 05:31

2 Answers


By doing this:

for category in categories:
...
...

you are essentially turning the problem from multi-label into a series of binary problems: inside the loop, each category is an independent binary target containing only 1s and 0s. If you want to proceed this way, there is no need for the OneVsRestClassifier; you can use MultinomialNB directly. Alternatively, you can handle all the labels at once with OneVsRestClassifier:

# Send all labels at once.
NB_pipeline.fit(X_train, train[categories])
prediction = NB_pipeline.predict(X_test)
print('NB test accuracy is {} '.format(accuracy_score(test[categories], prediction)))
print('NB f1 measurement is {} '.format(f1_score(test[categories], prediction, average='micro')))

It may throw some warnings about some labels being present in all of the training samples, but that's because the sample data you posted is too small.
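If you also want the scores for each class separately (which, per your comments, is the end goal), you do not have to fall back to one fit per category; you can ask for unaveraged scores. A minimal sketch continuing from the snippet above — note that with average=None the per-label F1 is the binary F1 of the positive class, so it will no longer equal the accuracy:

from sklearn.metrics import accuracy_score, f1_score

# One multilabel fit (as above), then per-label scores from its predictions.
prediction = NB_pipeline.predict(X_test)  # shape: (n_samples, n_labels)

# average=None returns one F1 value per label column instead of an aggregate.
per_label_f1 = f1_score(test[categories], prediction, average=None)
for i, category in enumerate(categories):
    # Column-wise accuracy: fraction of samples where this label is predicted correctly.
    acc = accuracy_score(test[category], prediction[:, i])
    print('{}: accuracy={:.3f}, f1={:.3f}'.format(category, acc, per_label_f1[i]))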

@user2906838, you are correct about the scores. With average='micro', the two results will be equal. This is mentioned in the documentation here:

Note that for “micro”-averaging in a multiclass setting with all labels included will produce equal precision, recall and F…

It is written about the multiclass case there, but the same holds for binary classification: with micro-averaging over both classes, every prediction is counted exactly once, so precision, recall, F1, and accuracy all coincide. See this similar question, in which the user calculated all the scores manually: Multi-class Classification: Micro-Average Accuracy, Precision, Recall and F Score All Equal
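To convince yourself, here is a minimal sketch with made-up binary labels (not your data), showing that micro-averaged F1 collapses to accuracy:

from sklearn.metrics import accuracy_score, f1_score

# Made-up binary labels, purely for illustration: two mistakes out of eight.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# Micro-averaging counts every prediction once across both classes, so
# micro precision = micro recall = micro F1 = accuracy.
print(accuracy_score(y_true, y_pred))             # 0.75
print(f1_score(y_true, y_pred, average='micro'))  # 0.75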

Vivek Kumar
  • Thank you for the answer. I did not get the first part of your answer; can you please give me a link that explains it clearly? Actually, I got that idea from a site I mentioned in my question, so that's why I am kind of confused. Also, when I applied your approach I got this error: raise ValueError("Unknown label type: %s" % repr(ys)) ValueError: Unknown label type: ( ADR WD EF INF SSI DI others – sariii Aug 13 '18 at 05:47
  • Also, I should mention that I need the accuracy and F1 measurement for each class separately; that's why I pass each category into the model. Do you think I still don't need that part? Thanks for following. Also, if that part of your code works too, that will be great, as I can get the overall accuracy and F1 very easily. – sariii Aug 13 '18 at 06:01
  • @sariaGoudarzi I cannot vouch for the correctness of the article you used. What I am saying is that your code is using a single `category` at a time to `train` the data, and a single category only contains 1 or 0. So inside the for loop, this is a simple binary problem (for that category). – Vivek Kumar Aug 13 '18 at 06:02
  • I see, I was thinking that since it is one-vs-rest it trains the classifier considering the other classes but calculates the accuracy only for that class. I would like to apply your code, but it raises an error. Do you have any idea how to fix it? – sariii Aug 13 '18 at 06:05
  • It raises this: raise ValueError("Unknown label type: %s" % repr(ys)) ValueError: Unknown label type: ( ADR WD EF INF SSI DI others – sariii Aug 13 '18 at 20:20
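A possible workaround for the ValueError in the comments above, assuming it is triggered by passing a pandas DataFrame as the multilabel target (some scikit-learn versions are stricter about this): hand the pipeline the underlying numpy array instead.

# Assumption: the label-type check is tripping on the DataFrame, so pass
# a plain numpy array of 0/1 labels instead.
Y_train = train[categories].values
Y_test = test[categories].values

NB_pipeline.fit(X_train, Y_train)
prediction = NB_pipeline.predict(X_test)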

Well, it's probably because both accuracy_score and f1_score are returning the same score. Although there are differences in how they are calculated, the results can coincide. If you want to know more about how they are calculated, there is already an answer for it here: How to compute precision, recall, accuracy and f1-score for the multiclass case with scikit learn?

Regarding your current problem of the same score: please change the value of average from micro to weighted. This should change your scores, as I pointed out in the comments.
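For illustration, here is a small sketch with made-up imbalanced labels. Weighted averaging weights each class's F1 by its support, so it can diverge from the micro average (which equals the accuracy here):

from sklearn.metrics import accuracy_score, f1_score

# Made-up imbalanced labels: class 1 is the rare class.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1]

print(accuracy_score(y_true, y_pred))                # 0.75
print(f1_score(y_true, y_pred, average='micro'))     # 0.75, same as accuracy
print(f1_score(y_true, y_pred, average='weighted'))  # ~0.767, differs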

user2906838
  • In this case NB still gives me the same result. Is there anything specific about this? – sariii Aug 14 '18 at 04:23
  • Did you mean `MultinomialNB` by NB? – user2906838 Aug 14 '18 at 04:25
  • I meant MultinomialNB :) – sariii Aug 14 '18 at 04:26
  • OK, this is because they are essentially returning the same thing; try with different data and you should get different outputs. – user2906838 Aug 14 '18 at 04:41
  • So I mean, why do they return the same thing for MultinomialNB? Is there any justification behind that? Thanks :) – sariii Aug 14 '18 at 04:48
  • Sorry, I forgot what you were using yesterday. It was not `MultinomialNB`, right? – user2906838 Aug 14 '18 at 04:54
  • Well, I am using a couple of machine learning algorithms, such as SVM, Logistic Regression, ... Only for MultinomialNB is the result of F1 and accuracy still the same; the rest of them changed when I switched to `weighted`. – sariii Aug 14 '18 at 04:59
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/177988/discussion-between-user2906838-and-saria-goudarzi). – user2906838 Aug 14 '18 at 05:02
  • The answer here does not highlight the reason behind it. It should not be the accepted answer. – CKM May 16 '19 at 14:38