
I have fit a LinearRegression() model. What I want to do now is calculate the distance between some data points and the regression line.

My data points are two-dimensional points (x, y).

My question is: How can I get the equation of the line from the LinearRegression() model?

godot
Kristijan

2 Answers


After you have fit the model, you can read the coef_ and intercept_ attributes to see the coefficients and the intercept, respectively.
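For 2D points with a single feature, turning those attributes into the line equation y = m * x + b could look like the minimal sketch below. It assumes x is reshaped to a single-column 2D array, as scikit-learn expects; the synthetic data is made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: noisy points around the line y = 2x + 1
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=50)

# scikit-learn expects a 2D feature matrix, so reshape x to (n_samples, 1)
model = LinearRegression().fit(x.reshape(-1, 1), y)

# With one feature and one target, coef_ has shape (1,) and intercept_ is a scalar
slope = model.coef_[0]
intercept = model.intercept_
print(f"y = {slope:.3f} * x + {intercept:.3f}")
```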

However, that approach means writing out the model's formula yourself. My recommendation: once you have built your model, make predictions and score them against the true y values:

from sklearn.metrics import mean_squared_error
mean_squared_error(y_test, y_pred) # y_test are true values, y_pred are the predictions that you get by calling regression.predict()
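A self-contained sketch of that workflow, using made-up noisy data and a train/test split (the data, split size and variable names are illustrative assumptions). Keep in mind that mean_squared_error averages the squared vertical residuals y_test - y_pred, not perpendicular distances to the line:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Illustrative data: noisy points around y = 3x - 2
rng = np.random.default_rng(42)
X = rng.uniform(-5, 5, size=(200, 1))
y = 3.0 * X[:, 0] - 2.0 + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

regression = LinearRegression().fit(X_train, y_train)
y_pred = regression.predict(X_test)

# Mean of the squared vertical residuals (y_test - y_pred)
mse = mean_squared_error(y_test, y_pred)
print(mse)
```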

If the goal is to calculate distances, you can use the sklearn.metrics convenience functions instead of looking for the equation and computing it by hand. The manual way to do it would be:

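import numpy as np
# Prepend a column of ones so the intercept is absorbed into the weight vector
X_aug = np.column_stack((np.ones(X_test.shape[0]), X_test))
# Weight vector [intercept, coef_1, ..., coef_n]
w = np.insert(clf.coef_, 0, clf.intercept_)
y_pred = X_aug @ w  # matrix product, not element-wise *
sq_err = np.square(y_pred - y_test)
mean_sq_err = np.mean(sq_err)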
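One way to sanity-check the manual computation is to compare it with predict(). This sketch uses made-up data with two features; the names X, y and clf are illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: two features, one target
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = X @ np.array([1.5, -0.5]) + 4.0 + rng.normal(scale=0.2, size=30)

clf = LinearRegression().fit(X, y)

# Leading column of ones multiplied by [intercept, coef_1, coef_2]
X_aug = np.column_stack((np.ones(X.shape[0]), X))
w = np.insert(clf.coef_, 0, clf.intercept_)
y_manual = X_aug @ w

print(np.allclose(y_manual, clf.predict(X)))  # True
```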
Vivek Kalyanarangan
I'm basically looking for the normal distance between each point and the line, and to obtain a score from it. Since the data is 2D, this is literally doing a poor man's PCA. – Kristijan Mar 06 '18 at 12:47
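For the normal (perpendicular) distance, the fitted line y = m * x + b can be rewritten as m * x - y + b = 0, which gives d = |m * x0 - y0 + b| / sqrt(m^2 + 1) for a point (x0, y0). A minimal sketch, assuming a single feature and a single target; the helper name perpendicular_distance is hypothetical:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def perpendicular_distance(model, x, y):
    """Hypothetical helper: normal distance from points (x, y) to the fitted line.

    Assumes `model` is a LinearRegression fit on one feature and one target.
    """
    m = model.coef_[0]
    b = model.intercept_
    # Line y = m*x + b written as m*x - y + b = 0
    return np.abs(m * x - y + b) / np.sqrt(m ** 2 + 1.0)

# Illustrative points lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(x.reshape(-1, 1), y)

# The point (1, 5) is 2 units above the line vertically
d = perpendicular_distance(model, np.array([1.0]), np.array([5.0]))
print(d)  # about 0.894, i.e. 2 / sqrt(5)
```

The perpendicular distance is the vertical residual scaled by 1 / sqrt(m^2 + 1). Note that ordinary least squares minimizes squared vertical residuals, whereas the first principal component line (through the centroid) minimizes squared perpendicular distances, so the two lines generally differ.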

From the documentation, use clf.coef_ for the weight vector(s) and clf.intercept_ for the bias:

coef_ : array, shape (n_features, ) or (n_targets, n_features)
Estimated coefficients for the linear regression problem. If multiple targets are passed during the fit (y 2D), this is a 2D array of shape (n_targets, n_features), while if only one target is passed, this is a 1D array of length n_features.

intercept_ : array
Independent term in the linear model.

Once you have these, see here.
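To see how the coef_ shapes described above differ in practice, here is a small sketch with random, made-up data: one target gives a 1D coef_, two targets give a 2D one:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))  # 20 samples, 3 features

# Single target (y is 1D)
single = LinearRegression().fit(X, X @ np.array([1.0, 2.0, 3.0]))
print(single.coef_.shape)  # (3,)

# Two targets (Y is 2D with shape (20, 2))
Y = np.column_stack((X.sum(axis=1), X[:, 0] - X[:, 1]))
multi = LinearRegression().fit(X, Y)
print(multi.coef_.shape)       # (2, 3)
print(multi.intercept_.shape)  # (2,)
```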

cs95