Python/Scikit-Learn - Can't handle mix of multiclass and continuous

Question

I'm trying to fit an SGDRegressor to my data and then check its accuracy. The fitting works fine, but the predictions do not seem to be of the same type as the original target data, and I get this error:

ValueError: Can't handle mix of multiclass and continuous

It is raised when calling `print "Accuracy:", ms.accuracy_score(y_test,predictions)`.

The data looks like this (just 200 thousand + rows):

   Product_id  Date        product_group1  Price  Net price  Purchase price  Hour  Quantity  product_group2
0  107         12/31/2012  10              300    236        220             10    1         108

The code is as follows:

from sklearn.preprocessing import StandardScaler
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn import metrics as ms

msk = np.random.rand(len(beers)) < 0.8

train = beers[msk]
test = beers[~msk]

X = train [['Price', 'Net price', 'Purchase price','Hour','Product_id','product_group2']]
y = train[['Quantity']]
y = y.as_matrix().ravel()

X_test = test [['Price', 'Net price', 'Purchase price','Hour','Product_id','product_group2']]
y_test = test[['Quantity']]
y_test = y_test.as_matrix().ravel()

clf = SGDRegressor(n_iter=2000)
clf.fit(X, y)
predictions = clf.predict(X_test)
print "Accuracy:", ms.accuracy_score(y_test,predictions)

What should I do differently? Thank you!

You may consider converting the continuous values to discrete by rounding the continuous values to nearest integer using the round function. Please refer to this [link](https://stackoverflow.com/questions/38015181/accuracy-score-valueerror-cant-handle-mix-of-binary-and-continuous) for similar question answered by [*natbusa*](https://stackoverflow.com/users/511809/natbusa) — Dutse I, Aug 25 '17 at 11:38
Dutse is right. Or you can use `y_preds = y_preds > 0.5` to change to discrete. Here you can set your own threshold. — Shark Deng, Sep 03 '19 at 03:03
@SharkDeng you are wrong, as is the previous comment; the root cause of the issue is as already pointed out in the answers below (the linked answer was also wrong) — desertnaut, Sep 21 '19 at 01:46

score 80 · Accepted Answer · answered May 21 '16 at 20:17

Accuracy is a classification metric. You can't use it with a regression model. See the scikit-learn documentation on model evaluation for info on the various metrics.
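To illustrate where the error comes from, here is a minimal sketch with made-up arrays (not the asker's data). It assumes scikit-learn's `type_of_target` helper is what labels integer targets as `multiclass` and non-integer float predictions as `continuous`. `accuracy_score` refuses to compare the two. The exact wording of the error message may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.utils.multiclass import type_of_target

# Toy stand-ins (assumed values): integer quantities vs. float predictions
y_true = np.array([1, 2, 3, 1])
y_pred = np.array([1.2, 1.9, 3.4, 0.8])

true_kind = type_of_target(y_true)  # integer labels with more than two values
pred_kind = type_of_target(y_pred)  # non-integer floats
print(true_kind, pred_kind)

# accuracy_score expects both arrays to hold class labels, so it raises
try:
    accuracy_score(y_true, y_pred)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

Rounding the predictions would silence the error, but it would only count exact matches of a regression output, which is rarely what you want to measure.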

BrenBarn

So how exactly can I predict with my model? I mean, if `clf.predict(X_test)` gives me different output than the original, how am I supposed to even use it? This has got me puzzled. – lte__ May 22 '16 at 08:11

@lte__: In general you cannot expect to get exactly correct results from a regression model. What you hope for is that your predictions are overall close to the real values. To decide if they are close enough, you need to use a different evaluation metric (one of the regression metrics). See the documentation link I provided, which explains many metrics. – BrenBarn May 22 '16 at 18:34

score 32 · Answer 2 · answered Jan 23 '18 at 03:49

Accuracy score is only for classification problems. For regression problems you can use: R2 Score, MSE (Mean Squared Error), RMSE (Root Mean Squared Error).
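As a rough sketch of how these metrics can be computed with `sklearn.metrics`, the example below uses small made-up arrays in place of `y_test` and `predictions` (the values are assumptions for illustration only). RMSE is taken as the square root of MSE with NumPy.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Toy stand-ins for y_test and clf.predict(X_test) (assumed values)
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])

mse = mean_squared_error(y_true, y_pred)  # mean of squared errors
rmse = np.sqrt(mse)                       # same units as the target
r2 = r2_score(y_true, y_pred)             # 1.0 would be a perfect fit

print("MSE:", mse)
print("RMSE:", rmse)
print("R2:", r2)
```

In the question's code, the same calls would take `y_test` and `predictions` in place of the toy arrays.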

Juan Jose Polanco Arias
