
I have several datasets containing many x- and y-values. An example with a lot fewer values would look something like this:

data_set1:

x1          y1        
---------   ---------   
0           100
0.0100523   65.1077
0.0201047   64.0519
0.030157    63.0341
0.0402094   62.1309
0.0502617   61.3649
0.060314    60.8614
0.0703664   60.3555
0.0804187   59.7635
0.0904711   59.1787

data_set2:

x2          y2        
---------   ---------   
0           100
0.01        66.119
0.02        64.4593
0.03        63.1377
0.04        62.0386
0.05        61.0943
0.06        60.2811
0.07        59.5603
0.08        58.8908

So here I have (for this example) two data sets containing 10 x- and y-values. The y-values are always different, but the x-values are sometimes the same and sometimes slightly different, as in this case. Plotting these two data sets yields two different curves, and I would now like to make a mean curve of both. If the x-values were the same, I would just take the mean of the y-values and plot it against the x-values, but as stated, that is not always the case.

Is there some way to extrapolate, or something like that, so that I could average the values (again, for many data sets) without "just guessing" or saying "they are pretty much the same, so it will be okay just to average the y-values"? Extrapolation seems like a plausible way of doing this, but I have never played with it in Python, and maybe there are even better ways to do this?

Denver Dang

2 Answers


If you have the same number of points in each dataset (your example doesn't, but your post states that you do), you could just take the mean of the respective x-values from each set and the mean of the respective y-values. If you do not have the same number of values, you could follow the answers in this post.

For example, given your data, but with 9 points each:

>>> x1
array([0.       , 0.0100523, 0.0201047, 0.030157 , 0.0402094, 0.0502617,
       0.060314 , 0.0703664, 0.0804187])
>>> y1
array([100.    ,  65.1077,  64.0519,  63.0341,  62.1309,  61.3649,
        60.8614,  60.3555,  59.7635])
>>> x2
array([0.  , 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08])
>>> y2
array([100.    ,  66.119 ,  64.4593,  63.1377,  62.0386,  61.0943,
        60.2811,  59.5603,  58.8908])

You can do:

import numpy as np

mean_x = np.mean((x1,x2), axis=0)
mean_y = np.mean((y1,y2), axis=0)

To show this visually, you can plot the result. Here, the black line is your mean line, and the blue and orange lines are your original datasets:

import matplotlib.pyplot as plt
plt.plot(x1,y1)
plt.plot(x2,y2)
plt.plot(mean_x,mean_y, color='black')
plt.show()

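Since the question was about not "just guessing", here is a rough sketch of how you could check whether pairing the points by index is reasonable before averaging. The helper `max_x_mismatch` is a made-up name for illustration, not part of numpy, and what counts as "small enough" is your call:

```python
import numpy as np

x1 = np.array([0, 0.0100523, 0.0201047, 0.030157, 0.0402094,
               0.0502617, 0.060314, 0.0703664, 0.0804187])
x2 = np.array([0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08])

def max_x_mismatch(xa, xb):
    """Largest absolute gap between index-paired x-values (hypothetical helper)."""
    xa = np.asarray(xa, dtype=float)
    xb = np.asarray(xb, dtype=float)
    if xa.shape != xb.shape:
        # Index-wise averaging only makes sense for equally long arrays
        raise ValueError("arrays must have the same number of points")
    return float(np.max(np.abs(xa - xb)))

gap = max_x_mismatch(x1, x2)
spacing = float(np.min(np.diff(x2)))  # smallest step between neighbouring x2 values

# Compare the mismatch with the sample spacing to judge whether it is negligible
print(f"largest x mismatch: {gap:.7f}")
print(f"fraction of one sample step: {gap / spacing:.3f}")
```

If the mismatch is a sizeable fraction of the step between samples, or the arrays differ in length, interpolating onto a common grid is probably the safer route.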

sacuL

If the curves don't have the same number of points, you can still plot a mean curve by using linear interpolation to bring all curves onto the same set of x-values.

Let's say you need to plot the mean curve of a set of curves, and you have xs and ys, where xs contains the x-coordinates of each curve and ys contains the corresponding y-coordinates. The mean curve's x-axis goes from 0 to the largest x-value in xs, and its y-values are the mean over all curves of y evaluated at each point xi of that axis (i.e. vertically, y(xi) for each y in ys; numpy's axis=0). Use interpolation for the missing y-values in ys.

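import numpy as np
import matplotlib.pyplot as plt

# Shared x grid from 0 to the largest x over all curves (100 points is an arbitrary choice)
mean_x_axis = np.linspace(0, max(np.max(x) for x in xs), 100)
# np.interp needs increasing x and holds the end values outside a curve's range
ys_interp = [np.interp(mean_x_axis, xs[i], ys[i]) for i in range(len(xs))]
mean_y_axis = np.mean(ys_interp, axis=0)

plt.plot(mean_x_axis, mean_y_axis)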

Mean curve example
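As a self-contained sketch of the same idea applied to the two data sets from the question: the function name `mean_curve` and the `num` parameter are my own choices for illustration. Unlike the snippet above, this version restricts the grid to the x-range that all curves share, so no curve has to be extended past its last point. It assumes every x array is sorted in increasing order, which `np.interp` requires.

```python
import numpy as np

def mean_curve(xs, ys, num=50):
    """Average several (x, y) curves on a shared grid using linear interpolation.

    Hypothetical helper: the grid covers only the x-range common to all curves.
    """
    lo = max(np.min(x) for x in xs)  # latest starting point among the curves
    hi = min(np.max(x) for x in xs)  # earliest end point among the curves
    grid = np.linspace(lo, hi, num)
    # Evaluate every curve at the same x positions
    ys_interp = [np.interp(grid, x, y) for x, y in zip(xs, ys)]
    return grid, np.mean(ys_interp, axis=0)

x1 = np.array([0, 0.0100523, 0.0201047, 0.030157, 0.0402094,
               0.0502617, 0.060314, 0.0703664, 0.0804187, 0.0904711])
y1 = np.array([100, 65.1077, 64.0519, 63.0341, 62.1309,
               61.3649, 60.8614, 60.3555, 59.7635, 59.1787])
x2 = np.array([0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08])
y2 = np.array([100, 66.119, 64.4593, 63.1377, 62.0386,
               61.0943, 60.2811, 59.5603, 58.8908])

# The two sets have 10 and 9 points; that is fine here
grid, mean_y = mean_curve([x1, x2], [y1, y2])
print(grid[0], grid[-1], len(mean_y))
```

The result can be plotted with `plt.plot(grid, mean_y)` in the same way as above. Keep in mind that linear interpolation smooths sharp features slightly, such as the steep drop right after x = 0, so a finer grid there only helps if the original data are densely sampled as well.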

roj4s