I am trying to predict wine quality (which ranges from 1 to 10) using regression models such as linear regression, SGDRegressor, ridge, and lasso.

Dataset: http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv

Independent variables: volatile acidity, residual sugar, free sulfur dioxide, total sulfur dioxide, alcohol. Dependent variable: quality.

Linear model

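from sklearn import linear_model

# Fit ordinary least squares on the training split, then predict on the test split
regr = linear_model.LinearRegression(n_jobs=3)
regr.fit(x_train, y_train)
predicted = regr.predict(x_test)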

Predicted values for LinearRegression:

array([ 5.33560542, 5.47347404, 6.09337194, ..., 5.67566813, 5.43609198, 6.08189 ])

The predicted values are floats rather than integers (1, 2, 3, ..., 10). I tried rounding them with NumPy:

predicted = np.round(regr.predict(x_test))

but my accuracy went down with this attempt.

SGDRegressor model.

import numpy as np
from sklearn import linear_model

np.random.seed(0)
clf = linear_model.SGDRegressor()
clf.fit(x_train, y_train)
# Round the continuous predictions down to whole numbers
predicted = np.floor(clf.predict(x_test))

Predicted output values for SGDRegressor:

array([ -2.77685458e+12,   3.26826414e+12,   4.18655713e+11, ...,
     4.72375220e+12,  -7.08866307e+11,   3.95571514e+12])

Here I am unable to convert the output values into integers.

Could someone please let me know the best way to predict wine quality using these regression models?

NathanOliver
Praneeth
  • Have you normalised the data between 0 and 1? or sometimes depending on the regression between -1 and 1 – pbu Jun 18 '15 at 12:50
  • Maybe this is a classification problem? – Chung-Yen Hung Jun 18 '15 at 13:06
  • As a part of academic assignment, we have to use both classifications(to classify wine based upon the quality) and regression models (to predict the quality of the wine) @Chung-YenHung do you think is there any alternative or am I missing any other measures? – Praneeth Jun 18 '15 at 13:15
  • @pbu I haven't normalised the data. do you think normalising the data results in the output quality to integer form? – Praneeth Jun 18 '15 at 13:20
  • You can reformulate the "regression" problem of wine grade prediction as a 45-way classification problem, see e.g., http://stackoverflow.com/questions/9041753/multi-class-classification-in-libsvm but that might be more expensive than living with float predictions. **But** what do you mean your prediction error went down? When your target values are integers, and you compare them against predictions that are int/float, you *must* use different success metrics! – Ahmed Fasih Jun 18 '15 at 13:43
  • The SGDRegressor result seems totally worthless, why is it predicting negative and ~1e12 for wine scores?! I haven't looked at its documentation, are you sure you don't need to pass in some parameters? Or, *maybe* normalizing your input features to zero-mean-unit-variance (or something like that) will help this. – Ahmed Fasih Jun 18 '15 at 13:47
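
The normalization suggested in the comments can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic (three made-up features on very different scales standing in for the wine columns, not the real dataset), and the coefficients used to generate `y` are invented. `StandardScaler`, `SGDRegressor`, and `make_pipeline` are real scikit-learn names; putting the scaler in a pipeline means the test split is scaled with statistics learned from the training split only.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine features (NOT the real dataset).
# The columns deliberately live on very different scales.
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.uniform(0.1, 1.0, n),    # small-scale feature, like volatile acidity
    rng.uniform(1.0, 60.0, n),   # mid-scale feature, like residual sugar
    rng.uniform(5.0, 250.0, n),  # large-scale feature, like total sulfur dioxide
])
# Invented linear relationship plus noise, used only for this sketch
y = 6.0 - 2.0 * X[:, 0] + 0.02 * X[:, 1] - 0.005 * X[:, 2] + rng.normal(0, 0.3, n)

x_train, x_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]

# Standardize each feature to zero mean and unit variance, then fit SGD
model = make_pipeline(StandardScaler(), SGDRegressor(random_state=0))
model.fit(x_train, y_train)
scaled_pred = model.predict(x_test)

# Mean absolute error of the continuous predictions
mae = np.mean(np.abs(scaled_pred - y_test))
print("MAE with scaled features:", mae)
```

Whether this alone fixes the ~1e12 outputs on the real data is not something the thread confirms, but gradient-based fitting is generally sensitive to feature scale, so it is a reasonable first thing to try.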

1 Answer

You are doing a regression and therefore the output is continuous in nature.

Note that your mini-project on predicting wine quality is not a classification problem. The response variable y, the wine quality, has an intrinsic order: a score of 6 is strictly better than a score of 5. It is NOT a categorical variable, where different numbers merely label different groups that cannot be compared.
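
Because the target is ordered, one way to act on this is to score the continuous predictions with a regression metric, and to convert to integer grades only at the very end if integers are required. The sketch below assumes made-up values: `y_true` is hypothetical, and `y_pred` merely resembles the floats shown in the question. The 1 to 10 range is taken from the question.

```python
import numpy as np

# Hypothetical true grades and float predictions (illustrative values only)
y_true = np.array([5, 6, 6, 7, 5, 6])
y_pred = np.array([5.34, 5.47, 6.09, 5.68, 5.44, 6.08])

# Regression metrics respect the ordering: being off by 1 costs less than being off by 3
mae = np.mean(np.abs(y_pred - y_true))
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))

# If integer grades are needed, round to the nearest integer and clip to the valid range
y_label = np.clip(np.rint(y_pred), 1, 10).astype(int)

# Exact-match rate treats every miss the same, so it can drop after rounding
# even when the underlying regression fit is unchanged
exact_match = np.mean(y_label == y_true)

print("MAE:", mae, "RMSE:", rmse, "exact-match rate:", exact_match)
```

Comparing rounded predictions with an exact-match score and unrounded ones with an error metric mixes two different yardsticks, which may explain an apparent drop in "accuracy" after rounding.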

Jianxun Li