
I built a simple linear regression model to predict students' final grade using this dataset https://archive.ics.uci.edu/ml/datasets/Student+Performance.

While my accuracy is very good, the errors seem to be big.


I'm not sure whether I'm misunderstanding what these errors mean or whether I made a mistake in my code. I thought that with an accuracy of 0.92 (92%), the errors should be much smaller and closer to 0.

Here's my code:

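```python
import numpy as np
import pandas as pd
from sklearn import linear_model, metrics
from sklearn.model_selection import train_test_split

# Assumption: the target is the final grade G3 (the snippet did not show where `predict` was defined)
predict = "G3"

data = pd.read_csv("/Users/.../student/student-por.csv", sep=";")
# Assumption: only the numeric columns are used, since LinearRegression cannot fit text-valued columns
data = data.select_dtypes(include="number")

X = np.array(data.drop(columns=[predict]))
y = np.array(data[predict])

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

linear = linear_model.LinearRegression()
linear.fit(x_train, y_train)

# For regressors, score() returns R^2, not classification accuracy
linear_accuracy = round(linear.score(x_test, y_test), 5)

# Test-set predictions, which the error metrics below need
linear_prediction = linear.predict(x_test)

linear_mean_abs_error = metrics.mean_absolute_error(y_test, linear_prediction)
linear_mean_sq_error = metrics.mean_squared_error(y_test, linear_prediction)
linear_root_mean_sq_error = np.sqrt(linear_mean_sq_error)
```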

Did I make a mistake in the code, or do these errors make sense in this case?

natpas
  • An RMSE of 0.78 is good. That is the kind of RMSE you get for an accuracy of 92%. – vbhargav875 May 21 '20 at 00:17
  • @vbhargav875 Accuracy is used only in classification problems; it is not meaningful in regression ones. And unlike accuracy (which is a percentage), there is no way to say whether a particular value of RMSE, MSE, or MAE is "good" or not in itself, as they always depend critically on the scale of the dependent variable. – desertnaut May 21 '20 at 13:59
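The scale dependence described in the comment above can be seen with a minimal sketch on hypothetical synthetic data (not the student dataset): rescaling the target by a factor of 10 multiplies the RMSE by 10, while R^2 stays the same.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical synthetic data standing in for the student dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=200)


def fit_and_score(X, y):
    """Fit a linear model and return (R^2, RMSE) measured on the same data."""
    pred = LinearRegression().fit(X, y).predict(X)
    return r2_score(y, pred), np.sqrt(mean_squared_error(y, pred))


r2_a, rmse_a = fit_and_score(X, y)
# Same target rescaled by 10 (e.g. a hypothetical 0-200 grading scale instead of 0-20)
r2_b, rmse_b = fit_and_score(X, 10 * y)

print(f"original scale: R^2={r2_a:.3f}, RMSE={rmse_a:.3f}")
print(f"scaled by 10:   R^2={r2_b:.3f}, RMSE={rmse_b:.3f}")
```

Because least-squares predictions are linear in y, the residuals scale exactly with the target, so an RMSE of 0.78 can only be judged against the units and spread of G3.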

1 Answer


The "accuracy" that `score` reports for scikit-learn's `LinearRegression` is the R^2 metric (coefficient of determination). It tells you the proportion of the variance in the dependent variable that is explained by the model's predictors. 0.92 is a very good score, but it does not mean that your errors will be 0: since R^2 = 1 - SS_res/SS_tot, a high R^2 only says that the residuals are small relative to the total spread of the target, not that they vanish. I looked at your work, and it seems that you used all the numeric variables as your predictors and G3 as your target. The code seems fine and the results seem accurate too. In regression tasks it is really hard to get zero errors. Please let me know if you have any questions. Cheers
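The link between R^2 and the error metrics can be checked numerically. This is a minimal sketch with made-up values (hypothetical, not the question's data): it computes R^2 from its definition, compares it with scikit-learn's `r2_score`, and shows the equivalent form R^2 = 1 - MSE/Var(y), which implies RMSE = std(y) * sqrt(1 - R^2).

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true values and predictions (not the question's actual data)
y_true = np.array([10.0, 12.0, 15.0, 8.0, 18.0, 11.0])
y_pred = np.array([10.5, 11.2, 14.1, 8.9, 17.0, 11.8])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

# Equivalent form: R^2 = 1 - MSE / Var(y), using the population variance (ddof=0)
mse = mean_squared_error(y_true, y_pred)
r2_from_mse = 1 - mse / np.var(y_true)

# Hence a high R^2 bounds the RMSE only relative to the spread of y
rmse = np.sqrt(mse)
rmse_from_r2 = np.std(y_true) * np.sqrt(1 - r2_manual)

print(r2_manual, r2_score(y_true, y_pred), r2_from_mse)
print(rmse, rmse_from_r2)
```

Applied to the question's test split, `np.std(y_test) * np.sqrt(1 - linear_accuracy)` should approximately reproduce the reported RMSE (up to the rounding of the score), which is why an R^2 of 0.92 and a nonzero RMSE are perfectly consistent.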

griggy
  • The term "accuracy" itself is reserved for classification problems only; it is not used in regression ones. Regarding performance metrics, in predictive tasks, metrics like MSE, RMSE, and MAE are practically always preferred over R^2; see the last part of my own answer in [scikit-learn & statsmodels - which R-squared is correct?](https://stackoverflow.com/questions/54614157/scikit-learn-statsmodels-which-r-squared-is-correct/54618898#54618898) – desertnaut May 21 '20 at 13:55