I am trying to predict car prices with a simple linear regression (only one independent variable) in scikit-learn. The independent variable is "highway-mpg" (highway miles per gallon):

0      27
1      27
2      26
3      30
4      22
       ..
200    28
201    25
202    23
203    27
204    25
Name: highway-mpg, Length: 205, dtype: int64

and "price":

0      13495.0
1      16500.0
2      16500.0
3      13950.0
4      17450.0
        ...   
200    16845.0
201    19045.0
202    21485.0
203    22470.0
204    22625.0
Name: price, Length: 205, dtype: float64

With the following code:

from sklearn.linear_model import LinearRegression

x = df["highway-mpg"]
y = df["price"]
lm = LinearRegression()

lm.fit([x],[y])
Yhat = lm.predict([x])

print(Yhat)
print(lm.intercept_)
print(lm.coef_)

However, the intercept and slope coefficient print commands give me the following output:

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]

Why doesn't it print a single intercept and slope coefficient? The "Yhat" print command does print the predicted values as an array, but the other two print commands do not give the output I expect.

Viol1997
  • Just wondering, why are you putting extra square brackets around `x` and `y`? – iz_ Jan 14 '20 at 19:27
  • @iz_ because of this https://stackoverflow.com/questions/45554008/error-in-python-script-expected-2d-array-got-1d-array-instead – Viol1997 Jan 14 '20 at 19:29
  • What happens if you take the brackets off in both the `fit` and `predict` lines? – iz_ Jan 14 '20 at 19:30
  • @iz_ it gives me the "ValueError: Expected 2D array, got 1D array instead:" error, which is why I used the [x] method – Viol1997 Jan 14 '20 at 19:32
  • What you're doing seems very odd to me. Can you post what `x` and `y` look like? – iz_ Jan 14 '20 at 19:36
  • @iz_ just edited that in! – Viol1997 Jan 14 '20 at 19:42
  • I'm *pretty* sure what's happening right now is not what you intend. I'll spin something up and post it as an answer. – iz_ Jan 14 '20 at 19:44

1 Answer

Essentially, the strange-looking `coef_` and `intercept_` come from the extra brackets: passing `[x]` and `[y]` gives the model a single sample with 205 features and 205 targets. Definitely not what you wanted!
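To make that concrete, here is a minimal sketch that reproduces the effect with ten of the values shown in the question instead of all 205 (the names `X_wrapped` and `y_wrapped` are only for illustration). Wrapping a 1D sequence in a list produces a single row, so scikit-learn sees one sample whose columns are all treated as separate features and targets:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Ten values taken from the question's printed output (illustration only)
x = np.array([27, 27, 26, 30, 22, 28, 25, 23, 27, 25])
y = np.array([13495.0, 16500.0, 16500.0, 13950.0, 17450.0,
              16845.0, 19045.0, 21485.0, 22470.0, 22625.0])

# Wrapping in a list adds a leading axis: 1 row, 10 columns
X_wrapped = np.asarray([x])
y_wrapped = np.asarray([y])
print(X_wrapped.shape)  # (1, 10): 1 sample, 10 features
print(y_wrapped.shape)  # (1, 10): 1 sample, 10 targets

lm = LinearRegression()
lm.fit(X_wrapped, y_wrapped)

# One coefficient per (target, feature) pair, one intercept per target
print(lm.coef_.shape)       # (10, 10)
print(lm.intercept_.shape)  # (10,)
```

With only one sample there is nothing to estimate a slope from, so the coefficient matrix comes out as all zeros and the intercepts are simply the target values themselves. That would also explain why `Yhat` looked plausible: it just echoes the prices back.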

You probably want 1 feature, 205 samples, and 1 target. To do this, you need to reshape your data:

from sklearn.linear_model import LinearRegression
import numpy as np

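# X must be 2D: reshape(-1, 1) gives 10 samples (rows) x 1 feature (column)
mpg = np.array([27, 27, 26, 30, 22, 28, 25, 23, 27, 25]).reshape(-1, 1)
# y can stay 1D: one target value per sample
price = np.array([13495.0, 16500.0, 16500.0, 13950.0, 17450.0, 16845.0, 19045.0, 21485.0, 22470.0, 22625.0])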

lm = LinearRegression()
lm.fit(mpg, price)

print(lm.intercept_)
print(lm.coef_)

I used hard-coded arrays here for testing; in your case, use the columns from your dataframe instead.
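For the dataframe case, one possible way to get the right shapes is sketched below. The `df` here is a hypothetical ten-row stand-in built from the values shown in the question, not the real 205-row dataset. Selecting the column with double brackets returns a 2D DataFrame rather than a 1D Series, which is the layout `fit` expects for the features:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for the question's dataframe (ten rows only)
df = pd.DataFrame({
    "highway-mpg": [27, 27, 26, 30, 22, 28, 25, 23, 27, 25],
    "price": [13495.0, 16500.0, 16500.0, 13950.0, 17450.0,
              16845.0, 19045.0, 21485.0, 22470.0, 22625.0],
})

# Double brackets select a DataFrame (2D) instead of a Series (1D)
X = df[["highway-mpg"]]  # shape (10, 1): 10 samples, 1 feature
y = df["price"]          # shape (10,): one target per sample

lm = LinearRegression()
lm.fit(X, y)
Yhat = lm.predict(X)     # shape (10,): one prediction per row

print(lm.intercept_)     # a single number
print(lm.coef_)          # an array holding a single slope
```

An equivalent alternative is `df["highway-mpg"].to_numpy().reshape(-1, 1)`, which mirrors the NumPy version above.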

P.S. If you omit the reshape, you get an error message like this:

ValueError: Expected 2D array, got 1D array instead:
array=[27 27 26 30 22 28 25 23 27 25].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

^ It tells you what to do!
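The two options in that message differ only in which axis the values end up on. A small NumPy sketch of the difference, using the same ten test values (the variable name `a` is arbitrary):

```python
import numpy as np

a = np.array([27, 27, 26, 30, 22, 28, 25, 23, 27, 25])

print(a.shape)                 # (10,)   1D: rejected by fit()
print(a.reshape(-1, 1).shape)  # (10, 1) 10 samples, 1 feature
print(a.reshape(1, -1).shape)  # (1, 10) 1 sample, 10 features
```

Writing `[x]` as in the question has the same effect as the second option, `reshape(1, -1)`, whereas a one-variable regression needs the first.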

iz_