
I'm trying to fit an SGDRegressor to my data and then check the accuracy. The fitting works fine, but the predictions are not of the same data type (?) as the original target data, and I get the error

ValueError: Can't handle mix of multiclass and continuous

when calling `ms.accuracy_score(y_test, predictions)`.

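The data looks like this (just over 200,000 rows):

|   | Product_id | Date       | product_group1 | Price | Net price | Purchase price | Hour | Quantity | product_group2 |
|---|------------|------------|----------------|-------|-----------|----------------|------|----------|----------------|
| 0 | 107        | 12/31/2012 | 10             | 300   | 236       | 220            | 10   | 1        | 108            |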

The code is as follows:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn import metrics as ms

# beers is the DataFrame shown above; roughly 80% of the rows go to the training set
msk = np.random.rand(len(beers)) < 0.8

train = beers[msk]
test = beers[~msk]

features = ['Price', 'Net price', 'Purchase price', 'Hour', 'Product_id', 'product_group2']

X = train[features]
y = train['Quantity'].to_numpy()  # 1-D target array (replaces the removed as_matrix().ravel())

X_test = test[features]
y_test = test['Quantity'].to_numpy()

# max_iter replaces the older n_iter parameter
clf = SGDRegressor(max_iter=2000)
clf.fit(X, y)
predictions = clf.predict(X_test)  # continuous (float) values
print("Accuracy:", ms.accuracy_score(y_test, predictions))  # raises the ValueError
```

What should I do differently? Thank you!

Sachin Kumar
lte__
  • You may consider converting the continuous values to discrete ones by rounding them to the nearest integer with the round function. Please refer to this [link](https://stackoverflow.com/questions/38015181/accuracy-score-valueerror-cant-handle-mix-of-binary-and-continuous) for a similar question answered by [*natbusa*](https://stackoverflow.com/users/511809/natbusa) – Dutse I Aug 25 '17 at 11:38
  • Dutse is right. Or you can use `y_preds = y_preds > 0.5` to convert to discrete values; here you can set your own threshold. – Shark Deng Sep 03 '19 at 03:03
  • @SharkDeng you are wrong, as is the previous comment; the root cause of the issue is the one already pointed out in the answers below (the linked answer was also wrong) – desertnaut Sep 21 '19 at 01:46

2 Answers


Accuracy is a classification metric. You can't use it with a regression model. See the scikit-learn documentation for information on the various metrics.
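A minimal sketch of why the error appears, using made-up arrays (not the question's data) and assuming a recent scikit-learn. `accuracy_score` checks the target type of both arguments (the same categories reported by `sklearn.utils.multiclass.type_of_target`): integer quantities count as class labels ("multiclass"), while regressor output is "continuous", and the classification metric refuses to compare the two.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.utils.multiclass import type_of_target

# Hypothetical integer quantities, like y_test in the question
y_true = np.array([1, 2, 3, 1])
# Hypothetical continuous output, like SGDRegressor.predict()
y_pred = np.array([1.2, 1.9, 3.4, 0.8])

print(type_of_target(y_true))  # multiclass
print(type_of_target(y_pred))  # continuous

try:
    accuracy_score(y_true, y_pred)
except ValueError as err:
    # accuracy_score only accepts discrete labels, so mixing target types fails
    print("ValueError:", err)
```

Rounding or thresholding the predictions would silence the error, but it would still be measuring a regression with a classification metric.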

BrenBarn
  • So how exactly can I predict with my model? I mean, if `clf.predict(X_test)` gives me different output than the original, how am I supposed to even use it? This has got me puzzled. – lte__ May 22 '16 at 08:11
  • @lte__: In general you cannot expect to get exactly correct results from a regression model. What you hope for is that your predictions are overall close to the real values. To decide if they are close enough, you need to use a different evaluation metric (one of the regression metrics). See the documentation link I provided, which explains many metrics. – BrenBarn May 22 '16 at 18:34

Accuracy score is only for classification problems. For regression problems you can use metrics such as the R² score, MSE (mean squared error), or RMSE (root mean squared error).
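A short sketch computing these three metrics with `sklearn.metrics`. The arrays are hypothetical stand-ins for `y_test` and `clf.predict(X_test)` from the question; RMSE is taken as the square root of MSE, which works regardless of scikit-learn version.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true quantities and continuous predictions
y_test = np.array([1, 2, 3, 4, 5])
predictions = np.array([1.5, 1.5, 3.5, 3.5, 5.5])

mse = mean_squared_error(y_test, predictions)  # mean of squared errors
rmse = np.sqrt(mse)                            # same units as the target
r2 = r2_score(y_test, predictions)             # 1.0 would be a perfect fit

print("MSE: ", mse)   # 0.25
print("RMSE:", rmse)  # 0.5
print("R2:  ", r2)    # 0.875
```

Lower MSE/RMSE and an R² closer to 1 indicate predictions that are closer to the real values overall.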