
I want to have an input x of 5 points and an output y of the same size. Then I need to fit a curve to this dataset and, finally, use matplotlib to draw the curve and the points so that the plot shows a non-linear regression. I want to fit a single curve to my dataset of 5 points, but it does not seem to work. It is simple, but I'm new to sklearn. Do you know what is wrong with my code? Here is the code:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

#here is the dataset of 5 points
x=np.random.normal(size=5)
y=2.2*x-1.1
y=y+np.random.normal(scale=3,size=y.shape)
x=x.reshape(-1,1)
#I use PolynomialFeatures because I want extra dimensions
preproc=PolynomialFeatures(degree=4)
x_poly=preproc.fit_transform(x)
#in this part I want to make 100 points to feed to the polynomial, so that I can draw a curve
x_line=np.linspace(-2,2,100)
x_line=x_line.reshape(-1,1)
#at this point I make y_hat in order to have the predicted values of y
poly_line=PolynomialFeatures(degree=4)
x_feats=poly_line.fit_transform(x_line)
y_hat=LinearRegression().fit(x_feats,y).predict(x_feats)

plt.plot(y_hat,y_line,"r")
plt.plot(x,y,"b.")
Mo.be
  • If you comment your code, it’ll be a lot easier for us to know what you think you’re doing. It’s a good habit. – Arya McCarthy Feb 22 '21 at 07:21
  • You went down a rabbit hole here. Could you explain how you thought this was going to work? – Arya McCarthy Feb 22 '21 at 07:22
  • Try to provide more information when possible, as Arya said above. What I can see easily is that when you are fitting the `LinearRegression()`, the lengths of `x_feats` and `y` are quite different (100 vs 5). They must match (see the sketch after this comment thread). – Alex Serra Marrugat Feb 22 '21 at 07:31
  • I did comment my code, sorry for that. @AryaMcCarthy – Mo.be Feb 22 '21 at 07:37
  • How can I fix it? I used np.reshape(), np.matmul(), etc., but it does not seem to work. @AlexSerraMarrugat – Mo.be Feb 22 '21 at 07:38
  • You talk about a curved line and nonlinear regression. Yet your function is `y=2.2x-1.1`, which is a line. Are you sure that you want nonlinear regression? – joostblack Feb 22 '21 at 07:38
  • @joostblack This is linear regression—because the output is still a linear function of the features. It’s just that the features come from a basis expansion with the PolynomialFeatures. – Arya McCarthy Feb 22 '21 at 07:40
  • @joostblack Yes, that's the way our teacher asked us to do it. The PolynomialFeatures(degree=4) is for making the polynomials of x, so we have a non-linear regression (I guess it should be this, as our teacher told us). – Mo.be Feb 22 '21 at 07:42
  • For the reason I just mentioned—it's still linear regression. – Arya McCarthy Feb 22 '21 at 07:42
  • Yes, you are right about y=2.2x-1.1; its output is linear. I'm confused because of my teacher. @AryaMcCarthy – Mo.be Feb 22 '21 at 07:45
  • No, that’s not why it’s linear regression. – Arya McCarthy Feb 22 '21 at 07:47
  • Then why is it linear? @AryaMcCarthy – Mo.be Feb 22 '21 at 07:48
  • Can you give it a try in your IDE? @AryaMcCarthy – Mo.be Feb 22 '21 at 07:49
  • You’re confusing two things: the true data distribution (which is a linear function) and the model of it. The model is linear regression. It’s a linear model because its output is a linear function of its parameters. – Arya McCarthy Feb 22 '21 at 07:52
  • OK, I get it (the linear part). Can you tell me how I fit a curved line? Because that's the assignment our teacher wants. I'd appreciate it if you could help me with this. @AryaMcCarthy – Mo.be Feb 22 '21 at 07:55
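
Putting the comments together, the fix is to fit the regression on the 5 training rows (x_poly and y then have matching lengths) and only afterwards predict on the 100 plotting rows, transformed with the same fitted preprocessor; the final plot call should also pass x_line against y_hat rather than the undefined y_line. A minimal sketch of that correction, reusing the names from the question (this is one possible reading of the intended script, not code from the original post):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# the same 5-point dataset as in the question
x = np.random.normal(size=5)
y = 2.2*x - 1.1
y = y + np.random.normal(scale=3, size=y.shape)
x = x.reshape(-1, 1)

# degree-4 basis expansion of the 5 training points
preproc = PolynomialFeatures(degree=4)
x_poly = preproc.fit_transform(x)

# fit on the 5 rows of x_poly, which match the 5 values in y
model = LinearRegression().fit(x_poly, y)

# 100 evenly spaced points for drawing the curve,
# transformed with the same fitted preprocessor
x_line = np.linspace(-2, 2, 100).reshape(-1, 1)
x_feats = preproc.transform(x_line)
y_hat = model.predict(x_feats)

plt.plot(x_line, y_hat, "r")   # fitted curve
plt.plot(x, y, "b.")           # the 5 data points
plt.show()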

1 Answer

First of all, you have a linear regression problem. As joostblack and Arya commented, your equation is y=2.2x-1.1, which is linear. Why would you need polynomial features?

Anyway, if you need to do this task because you have been asked to, here is code that can work:

import numpy as np
import matplotlib.pyplot as plt

x = np.random.normal(size=5)
y = 2.2*x - 1.1

# fit a degree-4 polynomial to the 5 points
mymodel = np.poly1d(np.polyfit(x, y, 4))

# 100 evenly spaced points for drawing the curve
myline = np.linspace(-2, 2, 100)

plt.scatter(x, y)
plt.plot(myline, mymodel(myline))
plt.show()

[plot: the 5 data points and the fitted degree-4 polynomial curve]

As we commented, it is "silly" to fit a linear relationship with a degree-4 polynomial, because the solution will always come out as (essentially) a linear regression. It becomes useful when you have a relation like y=x**3+x-2, which, as you can see, is not linear:

np.random.seed(0)
x = np.random.normal(size=5)

# a genuinely non-linear relation
y = x**3 + x - 2

mymodel = np.poly1d(np.polyfit(x, y, 4))

myline = np.linspace(-2, 3, 100)

plt.scatter(x, y)
plt.plot(myline, mymodel(myline))
plt.show()

[plot: the 5 points of y=x**3+x-2 and the fitted degree-4 polynomial curve]

Two final comments. First, you should differentiate between linear regression and polynomial features, and know in which cases each is useful. Second, I used numpy to solve your problem, not sklearn; it is simpler for this problem, so be aware of that.

Alex Serra Marrugat
  • Thank you, this helped a lot, but I have to use PolynomialFeatures from sklearn; the teacher made using it mandatory. Still, your script and the other comments helped me a lot, so I should give it a try. – Mo.be Feb 22 '21 at 08:10
  • Oh, sad that it was not what you were looking for. Check the second answer of this [link](https://stackoverflow.com/questions/51906274/cannot-understand-with-sklearns-polynomialfeatures). Maybe it can help you a little bit. – Alex Serra Marrugat Feb 22 '21 at 08:32
  • Also take a look at this [link](https://towardsdatascience.com/polynomial-regression-with-scikit-learn-what-you-should-know-bed9d3296f2). – Alex Serra Marrugat Feb 22 '21 at 08:41
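
For reference, here is a minimal sketch of the sklearn route described in those links, assuming the cubic example from the answer above; make_pipeline chains PolynomialFeatures and LinearRegression into one estimator, so the same basis expansion is applied at fit time and at prediction time:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

np.random.seed(0)
x = np.random.normal(size=5)
y = x**3 + x - 2                 # the non-linear relation from the answer

X = x.reshape(-1, 1)             # sklearn expects a 2-D feature matrix

# degree-4 polynomial features followed by an ordinary linear regression
model = make_pipeline(PolynomialFeatures(degree=4), LinearRegression())
model.fit(X, y)

x_line = np.linspace(-2, 3, 100).reshape(-1, 1)
plt.scatter(x, y)
plt.plot(x_line, model.predict(x_line))
plt.show()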