3

I'm trying to simply plot a regression line, however I get messy lines. Is it because I fitted the model with 2 features, so the only appropriate visualization would be a 3d plane?

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression

# prepare data
boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)[['AGE','RM']]
y = boston.target

# split dataset into training and test data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=33)

# apply linear regression on dataset
lm = LinearRegression()
lm.fit(X_train, y_train)
pred_train = lm.predict(X_train)
pred_test = lm.predict(X_test)

#plot relationship between RM and price
plt.scatter(X_train['RM'],
            y_train,
            c='g',
            s=40,
            alpha=0.5)
plt.plot(X_train['RM'], pred_train, color='r')
plt.title('Relationship between RM and Price')
plt.ylabel('Price')
plt.xlabel('RM')

enter image description here

Alv123
  • 431
  • 6
  • 18

2 Answers2

4

You are right. You are training on multiple features, i.e AGE, and RM. But you are plotting a 2D plot with only one feature, i.e RM. Try to get a 3D plot. In general, linear regression with two features results in a plane. This is still a linear regression. That is why we use the term "hyperplane". It resolves to a line for a single feature, a plane for two features and so on.

Here is the output in 3D:

plt3d = plt.figure().gca(projection='3d')
plt3d.view_init(azim=135)
plt3d.plot_trisurf(X_train['RM'].values, X_train['AGE'].values, pred_train, alpha=0.7, antialiased=True)

enter image description here

Sushant
  • 122
  • 11
0

The problem is that when you plot you have to order the arguments.

'plt.plot(np.sort(X_train['RM']), np.sort(pred_train), color='r')'

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression

# prepare data
boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)[['AGE','RM']]
y = boston.target

# split dataset into training and test data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=33)

# apply linear regression on dataset
lm = LinearRegression()
lm.fit(X_train, y_train)
pred_train = lm.predict(X_train)
pred_test = lm.predict(X_test)

#plot relationship between RM and price
plt.scatter(X_train['RM'],
            y_train,
            c='g',
            s=40,
            alpha=0.5)
plt.plot(np.sort(X_train['RM']), np.sort(pred_train), color='r')
plt.title('Relationship between RM and Price')
plt.ylabel('Price')
plt.xlabel('RM')
plt.show()

the result: output-plot

Probably if you do a 3d plot you will visualize easily the relation between the co-variables RM and age 3d-plot

  • Here is a stackoverflow question with my answer giving Python code for surface fitting with 3D scatter plot, 3D surface plot, and contour plot: https://stackoverflow.com/questions/55030369/python-surface-fitting-of-variables-of-different-dimensionto-get-unknown-paramet – James Phillips Dec 14 '19 at 19:05