
I have fit a ridge regression model on a data set (link to the dataset: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data) as below:

from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

y = train['SalePrice']
X = train.drop("SalePrice", axis = 1)

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.30)
ridge = Ridge(alpha=0.1, normalize=True)
ridge.fit(X_train,y_train)
pred = ridge.predict(X_test)

I calculated the MSE using the metrics library from sklearn as

from sklearn.metrics import mean_squared_error
mean = mean_squared_error(y_test, pred) 
rmse = np.sqrt(mean_squared_error(y_test,pred)

I am getting very large values, MSE = 554084039.54321 and RMSE = 21821.8, and I am trying to understand whether my implementation is correct.

user2480288
  • Please provide the code for your `mean_squared_error` and RMSE, how you split the data, the value of MSE you obtain, and a link to or description of your dataset. – Szymon Maszke Feb 10 '19 at 04:28
  • Possible duplicate of [Root mean square error in python](https://stackoverflow.com/questions/17197492/root-mean-square-error-in-python) – ndrwnaguib Feb 10 '19 at 07:33
  • @SzymonMaszke I have updated the question with code – user2480288 Feb 10 '19 at 18:16

2 Answers

RMSE implementation

Your RMSE implementation is correct, which is easy to verify: it is simply the square root of sklearn's mean_squared_error.

I think you are missing a closing parenthesis, though, here to be exact:

rmse = np.sqrt(mean_squared_error(y_test,pred)) # the last one was missing

High error problem

Your MSE is high because the model does not capture the relationship between your features and the target very well. Bear in mind that each error is squared, so being off by 1,000 in price contributes 1,000,000 to the sum before averaging.
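To make the effect of squaring concrete, here is a minimal sketch with made-up prices (the numbers are hypothetical and not taken from the Kaggle data). It shows that the MSE is expressed in squared dollars, which is why it looks enormous, while the RMSE is back in dollars.

```python
import numpy as np

# Hypothetical prices in dollars, invented for illustration only.
y_true = np.array([200_000.0, 150_000.0, 320_000.0, 95_000.0])
y_pred = np.array([201_000.0, 149_000.0, 300_000.0, 96_000.0])

errors = y_pred - y_true        # 1000, -1000, -20000, 1000
squared_errors = errors ** 2    # 1e6, 1e6, 4e8, 1e6

mse = squared_errors.mean()     # units: dollars squared
rmse = np.sqrt(mse)             # units: dollars

print(f"MSE  = {mse:,.0f}")
print(f"RMSE = {rmse:,.2f}")
```

A single prediction that is 20,000 off accounts for almost all of the MSE here, even though the other three are within 1,000.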

You may want to transform the price with the natural logarithm (numpy.log) and work on the log scale. This is common practice, especially for this problem (I assume you are doing House Prices: Advanced Regression Techniques); see the available kernels for guidance. With this approach you will not get such large values.
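As a rough sketch of what the log scale does to the metric, the snippet below compares RMSE on raw prices with RMSE on log prices. The arrays and the rmse helper are hypothetical and only for illustration; in your code you would presumably fit on np.log(y_train) and map predictions back with np.exp.

```python
import numpy as np

def rmse(a, b):
    """Root mean squared error between two arrays (illustrative helper)."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Hypothetical prices in dollars, invented for illustration only.
y_true = np.array([200_000.0, 150_000.0, 320_000.0, 95_000.0])
y_pred = np.array([210_000.0, 140_000.0, 300_000.0, 100_000.0])

rmse_dollars = rmse(y_true, y_pred)               # error in dollars
rmse_log = rmse(np.log(y_true), np.log(y_pred))   # error in log-dollars

print(f"RMSE (dollars):   {rmse_dollars:,.1f}")
print(f"RMSE (log scale): {rmse_log:.4f}")

# Assumed usage with the model from the question:
#   ridge.fit(X_train, np.log(y_train))
#   pred = np.exp(ridge.predict(X_test))
```

On the log scale the error is roughly a relative error, so a value around 0.06 means predictions are off by about 6% on average, regardless of how expensive the house is.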

Last but not least, check the Mean Absolute Error to see that your predictions are not as terrible as they seem.
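A small sketch of that comparison, again on hypothetical numbers: MAE is computed by hand with NumPy here, though sklearn.metrics.mean_absolute_error should give the same result on your y_test and pred.

```python
import numpy as np

# Hypothetical prices in dollars, invented for illustration only.
y_true = np.array([200_000.0, 150_000.0, 320_000.0, 95_000.0])
y_pred = np.array([201_000.0, 148_000.0, 350_000.0, 94_000.0])

abs_errors = np.abs(y_pred - y_true)                 # 1000, 2000, 30000, 1000

mae = abs_errors.mean()                              # average miss in dollars
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))      # penalizes large misses more

print(f"MAE  = {mae:,.0f}")
print(f"RMSE = {rmse:,.0f}")
```

RMSE is never smaller than MAE, and the gap grows when a few predictions are far off, so a large gap between the two suggests outliers rather than a uniformly bad model.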

Szymon Maszke

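It's also possible to set the 'squared' parameter of mean_squared_error:

squared: bool, default=True. If True, returns the MSE value; if False, returns the RMSE value.

In recent scikit-learn releases this parameter is deprecated in favor of a separate root_mean_squared_error function, so check the documentation for your installed version.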