
My call to sklearn's accuracy_score takes the following two inputs:

accuracy_score(y_test, y_pred_class)

y_test is a pandas Series (pandas.core.series.Series) and y_pred_class is a numpy.ndarray. Can passing two different input types produce a wrong accuracy? The call raises no error and returns a score. If my procedure is incorrect, what should I do to compute the accuracy correctly?

Edit

It's a binary classification problem and the labels are not one-hot encoded. So model.predict produces one probability value per sample, which I convert to a label using np.round.

The output of model.predict looks like this:

[[0.50104564]
 [0.50104564]
 [0.20969158]
 ...
 [0.5010457 ]
 [0.5010457 ]
 [0.5010457 ]]

My y_pred_class after rounding looks like this:

[[1.]
 [1.]
 [0.]
 ...
 [1.]
 [1.]
 [1.]]

And y_test, which is a pandas Series, looks like this (as expected):

34793    1
60761    0
58442    0
56299    1
89501    0
        ..
91507    1
25467    1
79635    0
22230    1
22919    1

Are y_pred_class and y_test compatible with each other for accuracy_score()?

Debbie
  • The [documentation](https://github.com/scikit-learn/scikit-learn/blob/9aaed4987/sklearn/metrics/_classification.py#L146) says the inputs should be `array-like` or `sparse matrix`. Based on [this](https://stackoverflow.com/questions/40378427/numpy-formal-definition-of-array-like-objects), an array-like is any Python object that np.array can convert to an ndarray, so yes, you can use it. – delirium78 Apr 12 '23 at 11:51

1 Answer


Short answer: yes, you can. pandas is built on top of NumPy, so a Series converts cleanly to an array. A small test:

import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score

y_true = np.array([1, 1, 0, 1])
y_pred = pd.Series([0, 0, 0, 0])

# Series passed directly vs. converted to an ndarray: both print 0.25
print(accuracy_score(y_true, y_pred))
print(accuracy_score(y_true, np.array(y_pred)))
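The test above uses 1-D inputs only. To mirror the shapes in the question's edit, here is a minimal sketch with made-up values: `y_prob` is a hypothetical stand-in for the `model.predict` output of shape (n_samples, 1), and `y_test` is a Series with a non-default index. Flattening the predictions before scoring avoids relying on how accuracy_score treats column vectors.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score

# Hypothetical stand-in for model.predict(...): one probability per row, shape (4, 1)
y_prob = np.array([[0.50104564], [0.50104564], [0.20969158], [0.5010457]])

# True labels as a pandas Series with an arbitrary (shuffled) index
y_test = pd.Series([1, 0, 0, 1], index=[34793, 60761, 58442, 56299])

# Round to 0/1, flatten to shape (4,), and cast to int so both inputs hold integer labels
y_pred_class = np.round(y_prob).ravel().astype(int)

acc = accuracy_score(y_test, y_pred_class)
print(y_pred_class.shape, acc)
```

The Series index plays no role here: the labels are compared by position, so `y_pred_class` must be in the same row order as `y_test`.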
TanjiroLL
  • Please take a look at the 'Edit' section of my original question. – Debbie Apr 12 '23 at 13:24
  • Yes, but always make sure the inputs have the same shape: convert your predictions into a 1-D array using y_pred_class.squeeze(). – TanjiroLL Apr 12 '23 at 13:32
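To illustrate why matching shapes matter, here is a small sketch with made-up labels. It computes accuracy by hand in NumPy rather than through accuracy_score: comparing a (4,) array against a (4, 1) array broadcasts to a (4, 4) matrix, so the hand-rolled result is silently wrong until the predictions are squeezed to 1-D.

```python
import numpy as np

y_true = np.array([1, 0, 0, 1])                        # shape (4,)
y_pred_2d = np.array([[1.0], [1.0], [0.0], [1.0]])     # shape (4, 1), like the rounded predictions

# Broadcasting compares every prediction with every label: result has shape (4, 4)
wrong = (y_true == y_pred_2d)
wrong_acc = wrong.mean()

# After squeezing to shape (4,), the comparison is elementwise as intended
y_pred_1d = y_pred_2d.squeeze()
right_acc = (y_true == y_pred_1d).mean()

print(wrong.shape, wrong_acc)
print(y_pred_1d.shape, right_acc)
```

One caveat: squeeze() on a single-sample array of shape (1, 1) returns a 0-d array, so ravel() or reshape(-1) may be the safer choice when the batch size can be 1.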