
I'm trying to use NumPy to fit a curve (`polyfit`) to a data set that has multiple y values for each discrete x value, e.g. data = [[2, 3], [3, 4], [5, 4]], where the list index is x and each inner list holds the y values for that x. I tried fitting to the average/median of each inner list, but I get the feeling that ignores a lot of useful data.

TL;DR: I need to fit a curve to this scatter plot: [image]

yatu
user3730954
  • Take a look at `np.polyfit` and vary the polynomial degree adequately. – cs95 Mar 12 '18 at 05:21
  • @cᴏʟᴅsᴘᴇᴇᴅ I can't find any options to pass in anything other than a y array that is the same shape as the x array. That is not what I'm looking for. – user3730954 Mar 12 '18 at 05:28
  • You could try and use https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html#scipy.optimize.curve_fit. This has a parameter `sigma` which accepts the covariance matrix of `y`-errors. That way the fit would use more of the information contained in your data set. – Paul Panzer Mar 12 '18 at 05:41
  • Looks like a series of [beta distributions](https://en.wikipedia.org/wiki/Beta_distribution). You could fit the y values at each discrete `x` to a pair of `a,b` parameters, then fit the mean values using inverse variance as the weights. But that's more a question for [stats.se]. Maybe ask there, and if you have implementation problems you can ask here. – Daniel F Mar 12 '18 at 07:26
  • I honestly wouldn't `polyfit` this data at all, as it seems bounded on `[0,1]` in `y` which you can't enforce with `polyfit`. Probably want to fit to some inverse trig function. – Daniel F Mar 12 '18 at 07:36

1 Answer


You could flatten your data out:

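data = [[2, 3], [3, 4], [5, 4]]  # example data from the question

x = []
y = []
for i, ydata in enumerate(data):
    x += [i] * len(ydata)  # repeat the index once per y value at that index
    y += ydata             # append every y value for this index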

Now you can fit to `x` and `y` (for example with `np.polyfit`), and the fit will account for every point in the set rather than one summary value per `x`.
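As a rough sketch of the whole thing, the flattening can also be done with NumPy and the result passed straight to `np.polyfit`. The degree of 1 is only an assumption for illustration, and the names `degree`, `coeffs`, and `fit` are mine, not from the question:

```python
import numpy as np

# Example data from the question: the list index is x,
# and each inner list holds the y values observed at that x.
data = [[2, 3], [3, 4], [5, 4]]

# Repeat each index once per y value at that index, so x and y line up.
x = np.repeat(np.arange(len(data)), [len(ydata) for ydata in data])
y = np.concatenate(data)

# Degree 1 is an assumption for illustration; choose what suits your data.
degree = 1
coeffs = np.polyfit(x, y, degree)  # coefficients, highest power first
fit = np.poly1d(coeffs)            # callable polynomial for evaluation

print(coeffs)
print(fit(np.arange(len(data))))
```

This also works when the inner lists have different lengths, since the repeat counts are taken per list.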

Farmer Joe