I am using the load_boston dataset from sklearn with LinearRegression. The code:

from sklearn.datasets import load_boston  # note: removed in scikit-learn 1.2
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, KFold, cross_val_score, cross_validate
from sklearn.linear_model import LinearRegression

#Loading the dataset
x = load_boston()
df = pd.DataFrame(x.data, columns = x.feature_names)
df["MEDV"] = x.target
X = df.drop(columns="MEDV")   #Feature Matrix
y = df["MEDV"]          #Target Variable
df.head()

linear = LinearRegression()
X_train,X_test, y_train,y_test = train_test_split(X,y, random_state = 11)
linear.fit(X_train,y_train)

kfold = KFold(n_splits=5, random_state=11, shuffle=True)
scores = cross_val_score(estimator=linear, X=X, y=y, cv=kfold)  # adding scoring="accuracy" raises:
# ValueError: continuous is not supported

print(f"Mean Accuracy: {scores.mean():.2%} and standard deviation: {scores.std():.2%}")

If I use scoring="accuracy" in cross_val_score, it raises an error:

ValueError: continuous is not supported

What is happening?

Laurinda Souza
  • Accuracy is a classification metric, and it is just meaningless in regression settings, hence the error; see similar situation here: https://stackoverflow.com/questions/38015181/accuracy-score-valueerror-cant-handle-mix-of-binary-and-continuous-target – desertnaut Apr 21 '20 at 14:01
  • Your problem is also regression, and not classification – desertnaut Apr 21 '20 at 14:06
  • You are using `LinearRegression()` in your code. I'm not sure how you can say you are doing classification. – Mihai Chelaru Apr 21 '20 at 14:07

1 Answer

Accuracy does not work here because it is a metric for classification problems. It is defined as:

  • Number of correct predictions / Total number of predictions
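To make the definition concrete, here is a minimal pure-Python sketch. The `accuracy` helper and all the numbers are made up for illustration; this is not scikit-learn's implementation. It shows why the ratio is informative for discrete labels but not for continuous targets, where predictions are almost never exactly equal to the true values.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly equal the true value."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Classification: labels are discrete, so exact matches are meaningful.
class_true = [0, 1, 1, 0]
class_pred = [0, 1, 0, 0]
class_acc = accuracy(class_true, class_pred)  # 3 of 4 labels match

# Regression: predictions are close, but none is exactly equal (made-up values).
reg_true = [24.0, 21.6, 34.7, 33.4]
reg_pred = [24.3, 21.1, 35.0, 32.9]
reg_acc = accuracy(reg_true, reg_pred)  # no exact matches

print(class_acc, reg_acc)  # 0.75 0.0
```

The regression predictions are all within about 0.5 of the target, yet the score is zero. A regression metric needs to measure how far off the predictions are, not whether they match exactly.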

If you leave `scoring` unset it works fine, because `cross_val_score` then falls back to the estimator's own `score` method. For `LinearRegression` that is the R^2 score, which is an appropriate metric for a regression problem.
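As a rough sketch of that default behaviour, the snippet below compares an unset `scoring` with an explicit `scoring="r2"`, and also requests `"neg_mean_squared_error"` as another regression metric. Because `load_boston` is not available in recent scikit-learn releases, it uses synthetic data from `make_regression` as a stand-in; the dataset and its parameters are assumptions, not the data from the question.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the housing data (an assumption, not the original dataset).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=11)

linear = LinearRegression()
kfold = KFold(n_splits=5, shuffle=True, random_state=11)

# No scoring argument: falls back to LinearRegression.score, i.e. R^2.
default_scores = cross_val_score(linear, X, y, cv=kfold)

# Asking for R^2 explicitly gives the same per-fold values.
r2_scores = cross_val_score(linear, X, y, cv=kfold, scoring="r2")

# Another regression metric; scikit-learn negates errors so that higher is better.
neg_mse_scores = cross_val_score(linear, X, y, cv=kfold, scoring="neg_mean_squared_error")

print(f"Mean R^2: {default_scores.mean():.3f} (std {default_scores.std():.3f})")
print(f"Mean MSE: {-neg_mse_scores.mean():.3f}")
```

Since these values are R^2 scores rather than accuracies, it is probably clearer to label them as such when printing, and not to format them as percentages of correct predictions.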

The scikit-learn documentation lists the different scoring types it supports and the kinds of problems each is appropriate for.

yatu