PYTHON: line of best fit for multiple y values per x value

Question

I am plotting a 1d array (x-axis) against a 2d array (y-axis) in matplotlib, so there are multiple y values for each x value. I want to plot a straight line of best fit (linear regression), not just a line joining the points. How can I do this?

All the other examples I have found have only one y value per x value. When I use 'from sklearn.linear_model import LinearRegression', I get as many best-fit lines as there are y values per x value.

EDIT: here is the code I have tried:

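import numpy as np
import matplotlib.pyplot as pt
from sklearn.linear_model import LinearRegression

# av_rsq3 is the 2d array of y values, computed earlier (not shown)
model = LinearRegression()
x_axis2 = np.arange(0,len(av_rsq3))
x_axis2 = x_axis2.reshape(-1,1)
model.fit(x_axis2, av_rsq3)
pt.figure()
pt.plot(x_axis2,av_rsq3, 'rx')
pt.plot(x_axis2, model.predict(x_axis2))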

note: x_axis2 is a 1d array and av_rsq3 is a 2d array.

Can you post the code you've tried so far? It would help diagnose the problem. My first instinct is that `LinearRegression` is designed to fit the _X_ values to predict the _y_ value, so you seem to have inverted the problem. – G. Anderson, Oct 29 '18 at 16:23
My point remains: you are trying to 'predict' multiple y-values with a single x-value, which isn't how sklearn linear regression works. In this case, you would need to plot each set of x-vals against the y-val independently. If you want a single line fit to the points, you will need to do more engineering to combine your x-values. – G. Anderson, Oct 29 '18 at 16:37
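To illustrate the behaviour described in these comments: when y is passed as a 2D array, each column is treated as its own data set, so one line per column comes back. The sketch below uses `np.polyfit` as a stand-in for `LinearRegression` (an assumption made to keep the example NumPy-only; both fit each column independently), and the data is made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for the asker's data: 4 x values, 3 y values per x.
x = np.arange(4)
y2d = np.array([[0.0, 0.1, -0.1],
                [1.0, 1.2, 0.9],
                [2.1, 2.0, 1.8],
                [3.0, 3.1, 2.9]])

# A 2D y is treated as three separate data sets sharing the same x,
# so the result holds one (slope, intercept) pair per column.
per_column = np.polyfit(x, y2d, 1)
print(per_column.shape)  # (2, 3): row 0 = slopes, row 1 = intercepts
```

Getting a single line therefore means presenting the data as one long list of (x, y) pairs rather than as a 2D array.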

score 1 · Answer 1 · edited Jun 20 '20 at 09:12

You just need to add the points that share an x-value as ordinary separate points (repeat the x-value once per y-value); then you can add a line of best fit as follows:

import numpy as np
from numpy.polynomial.polynomial import polyfit
import matplotlib.pyplot as plt

x = np.array([1,2,3,4,5,6,6,6,7,7,8])
y = np.array([1,2,4,8,16,32,34,30,61,65,120])

# Fit with polyfit
b, m = polyfit(x, y, 1)

plt.plot(x, y, '.')
plt.plot(x, b + m * x, '-')
plt.show()

which produces a plot of the data points with the fitted straight line drawn through them.
Note, a straight line doesn't fit my example data, but I didn't think about that when writing it :) With polyfit you are also able to change the degree of the fit, as well as obtain error margins in gradients* and offsets.

* (or other polynomial coefficients)
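As a rough sketch of how the error margins might be obtained: the example below swaps in `np.polyfit` with `cov=True`, which is a different function from the `numpy.polynomial.polynomial.polyfit` used above and returns the coefficients in the opposite order (highest power first). It reuses the same example data.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 6, 6, 7, 7, 8])
y = np.array([1, 2, 4, 8, 16, 32, 34, 30, 61, 65, 120])

# np.polyfit returns the highest power first: [gradient, offset].
# With cov=True it also returns the covariance matrix of the estimates.
coeffs, cov = np.polyfit(x, y, 1, cov=True)
m, b = coeffs

# Standard errors are the square roots of the covariance diagonal.
m_err, b_err = np.sqrt(np.diag(cov))
print(f"gradient = {m:.2f} +/- {m_err:.2f}")
print(f"offset   = {b:.2f} +/- {b_err:.2f}")
```

These are standard errors under the usual least-squares assumptions; since a straight line is a poor model for this particular data, they are mainly illustrative.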

answered Oct 29 '18 at 16:41 by Namyts (edited Jun 20 '20 at 09:12 by Community)

Sorry, what do you mean by adding them as normal points? Your data is different from mine in that it is a one-to-one mapping; mine is a one-to-five mapping. – Kai Mason Oct 29 '18 at 16:48
I meant the same thing as what Peter Valtersson said. If you want to add more y values per x value, just add another identical x to the list of x's and then add the y value to the list of y values. In my test data I added a few extra y values for x=6, and x=7. This will work just fine for your one-to-five mapping :) – Namyts Oct 29 '18 at 16:58
@KaiMason So for your 1-to-5 you would have something like x=np.array([1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4]) and y=np.array([1,1,0.9,1,1,2,2.1,2,2.3,2,4.5,4,4,4,3.7,8,8.1,8,7.8,8]). The order in which you give your coordinates doesn't matter, as long as the mapping between the lists is correct. – Namyts Oct 29 '18 at 17:06
Thanks very much, I do understand but the nature of the body of my code makes it quite difficult to rearrange my arrays like this. Is there not some other package that would do the job better and allow me to keep my arrays more tidy? I much prefer to organise my y values in a 2d array. – Kai Mason Oct 29 '18 at 17:07
One way you could get 5 times as many x values would be to use itertools chain function. Something like np.array(list(chain(*([x,x,x,x,x] for x in xs)))) where xs is your list of x values. – Namyts Oct 29 '18 at 17:17
You can also use chain to join up your y values in a similar way... Organise it how you want in your processing; plotting is the final step. Ideally, try to make the conversion to the format required by the plotter work as an iterator for maximum memory efficiency, i.e. avoid converting to a list like I did ;) if performance is a worry. – Namyts Oct 29 '18 at 17:27
Thanks very much, big help, think I've just about got it working now! – Kai Mason Oct 29 '18 at 17:56
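For anyone who wants to keep the y values in a 2D array, here is a minimal sketch of the conversion discussed in these comments, using `np.repeat` and `ravel` instead of `itertools.chain` (a swapped-in alternative, not what the comments used). It assumes one row per x value and one column per repeated measurement; the array names and data are hypothetical.

```python
import numpy as np

# Hypothetical data in the asker's layout: one row per x value,
# five y values in each row.
x = np.arange(4)
y2d = np.array([[1.0, 1.0, 0.9, 1.0, 1.0],
                [2.0, 2.1, 2.0, 2.3, 2.0],
                [4.5, 4.0, 4.0, 4.0, 3.7],
                [8.0, 8.1, 8.0, 7.8, 8.0]])

# Repeat each x once per column so it lines up with the row-major
# flattening of y2d: x0, x0, x0, x0, x0, x1, x1, ...
x_flat = np.repeat(x, y2d.shape[1])
y_flat = y2d.ravel()

# One straight-line fit through all 20 points.
m, b = np.polyfit(x_flat, y_flat, 1)
print(m, b)
```

The 2D array itself is left untouched, so it can still be used for the scatter plot; only the fit needs the flattened copies, and the line can then be drawn as `m * x + b` over the original `x`.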

score 0 · Answer 2 · answered Oct 29 '18 at 16:31

What you need to do is provide a one-to-one mapping between x and y values. The order in which the points appear does not matter. So if you have something like this

X:  [1,2,3,4]
Y1: [4,6,2,7]
Y2: [2,3,6,8]

you would get this

X: [1,2,3,4,1,2,3,4]
Y: [4,6,2,7,2,3,6,8]

answered Oct 29 '18 at 16:31 by Peter Valtersson

Thanks, what I have tried to do to fix it is take the mean of the y values at each x value and then use that mean for the linear regression. Should this not also work? – Kai Mason Oct 29 '18 at 16:35
That would work, but it would not give you the correct answer. – Peter Valtersson Oct 29 '18 at 16:59
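A quick numerical check of the mean-based approach, as a sketch using the dummy data from this answer. One caveat worth stating: when every x has the same number of y values, an ordinary least-squares fit to the means gives the same slope and intercept as a fit to all the points; the two differ when the counts per x are unequal, and averaging first also throws away the scatter that error estimates rely on.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y2d = np.array([[4, 6, 2, 7],    # first series
                [2, 3, 6, 8]])   # second series

# Fit to the per-x means (one point per x value).
fit_means = np.polyfit(x, y2d.mean(axis=0), 1)

# Fit to every point (x repeated once per series).
fit_all = np.polyfit(np.tile(x, y2d.shape[0]), y2d.ravel(), 1)

print(fit_means, fit_all)
print(np.allclose(fit_means, fit_all))
```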

score 0 · Answer 3 · answered Oct 29 '18 at 16:48

If you just want to plot the y values and a line averaging between them, this is possible. Borrowing the dummy data from another answer:

import matplotlib.pyplot as plt

x = [1,2,3,4]

y = [4,6,2,7]
y1 = [2,3,6,8]

plt.scatter(x,y)
plt.scatter(x,y1)
# Line through the mean of the two y values at each x
plt.plot(x,[((y[i]+y1[i])/2) for i in range(len(y))])
plt.show()

answered Oct 29 '18 at 16:48 by G. Anderson

No, I don't want an average line. I want to perform a linear regression, so I want a straight line of best fit. – Kai Mason Oct 29 '18 at 16:50
