
I am modeling the House Prices dataset. My goal is to compute the MSE and then predict a price from input variables.

I scale the data with MinMaxScaler() and train the model with LinearRegression(). After this I get the score, MSE, MAE, and RMSE.

But when I want to compare a prediction with the actual price, the result is scaled. How do I get a prediction in terms of the actual price?

This is my script:

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error

train = pd.read_csv('train.csv')

column = ['SalePrice', 'OverallQual', 'GrLivArea', 'GarageCars', 'TotalBsmtSF', 'FullBath', 'YearBuilt']

train = train[column]

# Convert Feature/Column with Scaler
scaler = MinMaxScaler()
train[column] = scaler.fit_transform(train[column])

X = train.drop('SalePrice', axis=1)
y = train['SalePrice']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=15)

# Calling LinearRegression
model = LinearRegression()

# Fit linearregression into training data
model = model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Calculate MSE (Lower better)
mse = mean_squared_error(y_test, y_pred)
print("MSE of testing set:", mse)

# Calculate MAE
mae = mean_absolute_error(y_test, y_pred)
print("MAE of testing set:", mae)

# Calculate RMSE (Lower better)
rmse = np.sqrt(mse)
print("RMSE of testing set:", rmse)

# Predict the Price House by input:
overal_qual = 6
grlivarea = 1217
garage_cars = 1
totalbsmtsf = 626
fullbath = 1
year_built = 1980

predicted_price = model.predict([[overal_qual, grlivarea, garage_cars, totalbsmtsf, fullbath, year_built]])
print("Predicted price:", predicted_price)

The result:

MSE of testing set: 0.0022340806066149734
MAE of testing set: 0.0334447655149599
RMSE of testing set: 0.04726606189027147

Predicted price: [811.51843959]

The price should be something like 208500, 181500, or 121600, i.e. in the hundreds of thousands of dollars.

What step did I miss here?

desertnaut
MADFROST
  • Please update your post according to SO standard. At least create a [mcve]. Mainly we miss input dataset and training procedure to be able to reproduce and analyse your issue. – jlandercy Oct 01 '22 at 06:36
  • @jlandercy I'm sorry, I've updated the script – MADFROST Oct 01 '22 at 06:40
  • Much better but still missing the dataset. – jlandercy Oct 01 '22 at 07:12
  • @jlandercy My dataset was from here https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques – MADFROST Oct 01 '22 at 07:17
  • Keep in mind that first we split, and only afterward we scale (using `fit_transform` for the training data but `transform` for the test). – desertnaut Oct 01 '22 at 11:08
  • @desertnaut Hi! I still don't get it. If I want to predict from selected input values, what steps do I need to take? I think your reference only computes the MSE, not a prediction from inputs like the ones I put under `# Predict the Price House by input:`. My prediction is 811, which is not a realistic value (real values are in the hundreds of thousands, like $250000). – MADFROST Oct 01 '22 at 12:37
  • Please look closer at the arguments of the second `mean_squared_error` function used there; each scikit-learn scaler comes with an `.inverse_transform()` method, which you should use in order to get your `y_pred` back to the scale of your original `y` data, but as already noticed you should scale *after* you split, i.e. scale `y_train` and `y_test` separately (with `.fit_transform` and `transform`, respectively). – desertnaut Oct 01 '22 at 18:29
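
The advice in the comments above (split first, fit the scalers on the training data only, and call `.inverse_transform()` on the predictions) could be sketched roughly as follows. This is a minimal sketch, not a definitive fix: the data is a synthetic stand-in for `train.csv` with made-up numbers, and the names `x_scaler`, `y_scaler`, and `new_house` are illustrative assumptions. It uses separate scalers for the features and the target, so that the new input can be scaled the same way as the training features and the prediction can be mapped back to price units.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# ASSUMPTION: synthetic stand-in for train.csv (made-up numbers, not the Kaggle data)
rng = np.random.default_rng(0)
n = 200
features = ['OverallQual', 'GrLivArea', 'GarageCars', 'TotalBsmtSF', 'FullBath', 'YearBuilt']
X = pd.DataFrame({
    'OverallQual': rng.integers(1, 11, n),
    'GrLivArea': rng.integers(600, 3000, n),
    'GarageCars': rng.integers(0, 4, n),
    'TotalBsmtSF': rng.integers(0, 2000, n),
    'FullBath': rng.integers(1, 4, n),
    'YearBuilt': rng.integers(1900, 2011, n),
})
y = (20000 * X['OverallQual'] + 60 * X['GrLivArea'] + 10000 * X['GarageCars']
     + 30 * X['TotalBsmtSF'] + rng.normal(0, 10000, n))

# 1. Split first, on the unscaled data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=15)

# 2. Fit the scalers on the training data only; the test data is only transformed
x_scaler = MinMaxScaler()
y_scaler = MinMaxScaler()
X_train_s = x_scaler.fit_transform(X_train)
X_test_s = x_scaler.transform(X_test)
y_train_s = y_scaler.fit_transform(y_train.to_numpy().reshape(-1, 1))

# 3. Train on the scaled features and scaled target
model = LinearRegression().fit(X_train_s, y_train_s)

# 4. Map test predictions back to price units before computing the error
y_pred = y_scaler.inverse_transform(model.predict(X_test_s)).ravel()
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("RMSE in price units:", rmse)

# 5. Scale a new input with the fitted feature scaler, then un-scale the prediction
new_house = pd.DataFrame([[6, 1217, 1, 626, 1, 1980]], columns=features)
scaled_prediction = model.predict(x_scaler.transform(new_house))
predicted_price = y_scaler.inverse_transform(scaled_prediction)[0, 0]
print("Predicted price:", predicted_price)
```

In the script from the question, the raw values (6, 1217, ...) are passed straight to a model trained on features scaled to the 0 to 1 range, which is probably why the prediction comes out as 811; step 5 scales the input before predicting.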

0 Answers