I am new to sklearn and I have an appropriately simple task: given a scatter plot of 15 dots, I need to
- Take 11 of them as my 'training sample';
- Fit a polynomial curve of degree 3 through these 11 dots;
- Plot the resulting polynomial curve over the 15 dots.
But I got stuck at the second step.
This is the code that generates the data and the plot:
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
np.random.seed(0)
n = 15
x = np.linspace(0,10,n) + np.random.randn(n)/5
y = np.sin(x)+x/6 + np.random.randn(n)/10
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0)
plt.figure()
plt.scatter(X_train, y_train, label='training data')
plt.scatter(X_test, y_test, label='test data')
plt.legend(loc=4);
I then take the 11 points in X_train and transform them with PolynomialFeatures of degree 3 as follows:
degree = 3
poly = PolynomialFeatures(degree=degree)
X_train_poly = poly.fit_transform(X_train)
Then I try to fit a line through the transformed points (note: X_train_poly.size = 364):
linreg = LinearRegression().fit(X_train_poly, y_train)
and I get the following error:
ValueError: Found input variables with inconsistent numbers of samples: [1, 11]
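A rough sketch of where the numbers 364 and [1, 11] may come from. It assumes the 1-D array of 11 values was read as a single sample with 11 features (older scikit-learn releases appear to have done this with a warning, while newer ones reject 1-D input outright). The slice `x[:11]` is only a stand-in for the real training split, and the names `as_one_row` and `as_column` are made up for illustration.

```python
import numpy as np
from math import comb
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(0)
n = 15
x = np.linspace(0, 10, n) + np.random.randn(n) / 5
X_train = x[:11]  # stand-in for the 11 training points (assumption, not the real split)

# Interpretation 1: one sample with 11 features (a single row).
as_one_row = X_train.reshape(1, -1)
row_poly = PolynomialFeatures(degree=3).fit_transform(as_one_row)
print(row_poly.shape)     # (1, 364)
print(comb(11 + 3, 3))    # 364 monomials of degree <= 3 in 11 variables

# Interpretation 2: 11 samples with one feature each (a single column).
as_column = X_train.reshape(-1, 1)
col_poly = PolynomialFeatures(degree=3).fit_transform(as_column)
print(col_poly.shape)     # (11, 4): columns 1, x, x^2, x^3
```

Under the first interpretation the transformed matrix has 1 row while y_train has 11 entries, which would match the "[1, 11]" in the error message.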
I have read various questions that address similar and often more complex problems (e.g. Multivariate (polynomial) best fit curve in python?), but I could not extract a solution from them.
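For reference, a minimal sketch of one possible way to complete the fit, assuming the problem is only the shape of X_train. It reshapes the inputs into a single column before calling PolynomialFeatures. The names `x_curve` and `y_curve` are hypothetical helpers for drawing the fitted curve and are not part of the code above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(0)
n = 15
x = np.linspace(0, 10, n) + np.random.randn(n) / 5
y = np.sin(x) + x / 6 + np.random.randn(n) / 10
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0)

# Reshape to (n_samples, 1) so each point is one sample with one feature.
poly = PolynomialFeatures(degree=3)
X_train_poly = poly.fit_transform(X_train.reshape(-1, 1))

# Ordinary least squares on the columns 1, x, x^2, x^3.
linreg = LinearRegression().fit(X_train_poly, y_train)

# Evaluate the fitted cubic on a dense grid for plotting.
x_curve = np.linspace(0, 10, 100)
y_curve = linreg.predict(poly.transform(x_curve.reshape(-1, 1)))
```

The curve could then be drawn over the scatter plot with plt.plot(x_curve, y_curve).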