
y1 is a numpy.ndarray with length 106 (representing heights measured in meters)

x1 is a numpy.ndarray with length 106 (representing ages of the boys corresponding to the heights)

I'm trying to predict heights given ages with linear regression using gradient descent, and then to plot it as a 3D surface plot.

When I call .fit(), it raises:

ValueError: Found arrays with inconsistent numbers of samples: [ 1 106]

import numpy as np
from sklearn import linear_model

x1 = np.fromfile('ex2x.dat', float)
y1 = np.fromfile('ex2y.dat', float)

clf = linear_model.SGDRegressor(alpha=.007)

clf.fit(x1, y1)


y_predicted = clf.predict(3.5)
Nate

1 Answer

The expected array shapes are:

  • (n_samples, 1) for x1, meaning a 2D array with one column (since you have one feature)
  • (n_samples,) for y1, meaning a 1D array.

If your first array is 1D, you should reshape it:

x1 = np.fromfile('ex2x.dat', float).reshape(-1, 1)
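
To see what the reshape changes, here is a minimal sketch. The `ages` array is a hypothetical stand-in with the same length as your data, not the contents of ex2x.dat:

```python
import numpy as np

# Hypothetical stand-in for x1: 106 evenly spaced ages (assumed values)
ages = np.linspace(2.0, 8.0, 106)
print(ages.shape)      # 1D array: (106,)

# -1 lets NumPy infer the number of rows; 1 means a single feature column
ages_2d = ages.reshape(-1, 1)
print(ages_2d.shape)   # 2D array: (106, 1)
```

The values are unchanged; only the shape differs, so each age becomes its own row (one sample) with one column (one feature).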

Here is a small self-contained example:

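import numpy as np
from sklearn import linear_model

# Features: 2D array of shape (10, 1); targets: 1D array of shape (10,)
x1 = np.arange(10).reshape(-1, 1)
y1 = np.array([k**.5 for k in range(10)])

clf = linear_model.SGDRegressor(alpha=.0007)
clf.fit(x1, y1)

# predict also expects a 2D array: one row per sample, one column per feature
y_predicted = clf.predict([[3.5]])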
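
Note that predict follows the same shape convention as fit. As far as I know, recent scikit-learn versions reject a bare scalar such as 3.5, so a single query has to be passed as a 2D array with one row and one column. A rough sketch, where `ages`, `heights`, the coefficients used to generate them, and `random_state=0` are all assumptions made for illustration (this is synthetic data, not ex2x.dat / ex2y.dat):

```python
import numpy as np
from sklearn import linear_model

# Synthetic stand-in data (hypothetical values, not the original files)
rng = np.random.default_rng(0)
ages = rng.uniform(2.0, 8.0, size=106).reshape(-1, 1)  # shape (106, 1)
heights = 0.75 + 0.06 * ages.ravel()                   # shape (106,)

clf = linear_model.SGDRegressor(alpha=.0007, random_state=0)
clf.fit(ages, heights)

# One sample with one feature: shape (1, 1)
query = np.array([[3.5]])
y_predicted = clf.predict(query)
print(y_predicted.shape)  # one prediction per input row: (1,)
```

The result is a 1D array with one entry per row of the query, so `y_predicted[0]` is the predicted height for age 3.5.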