I have a large simulated data set that I have run through an analysis. My main objective is to take actual, recorded values and compare them to the simulated data via the cumulative distribution.

I start by defining a function that goes through the bins of the simulated data set, takes a point with a given value x, and matches it against the "real" data analyzed at the same value x:

import numpy as np

bins = np.linspace(SimData.min(), SimData.max(), 24)

def CumuProb(SimData, bins, x, realValue):
    h, bins_ = np.histogram(SimData, bins=bins)
    hcum = np.cumsum(h)/float(np.cumsum(h).max())

    cbins = np.zeros(len(bins)+1)
    cbins[1:-1] = bins[1:]-np.diff(bins[:2])[0]/2.
    cbins[-1] = bins[-1]

    hcumc = np.linspace(0,1, len(cbins))
    hcumc[1:-1] = hcum

    p = [x, realValue]

    yi = np.interp(p[1],cbins, hcumc)
    return [p[1],yi]

This method works fine for large values. But if I pass in values that are much smaller than 1 (but still greater than 0), it fails miserably.

For example, applying this method in my project gives the following plot:

[image: plot not preserved]

At the very bottom there are only 2 points, when there should be about 10 points, all on the blue line (the actual data).

The main culprit appears to be this warning:

RuntimeWarning: invalid value encountered in divide
  hcum = np.cumsum(h)/float(np.cumsum(h).max())

So this most likely has to do with how I define my bins, which is bin = np.linspace(np.log(binding).min(), np.log(binding).max(), 24); that is, the binning runs over the logarithmic x-axis values in the plot above.

How do I fix this?

iron2man
  • Check out [qq- or probability plots](http://stackoverflow.com/questions/13865596/quantile-quantile-plot-using-scipy). Although this is not what you *want*, it is probably what you *should* do if you want to compare data to distributions. – MB-F Apr 05 '17 at 06:47

1 Answer

I can't be 100% sure, since the question lacks a lot of relevant information, but judging from how this function seems intended to be used, it seems odd to put realValue into the interpolation. If, as the name suggests, x is the x-axis value of the data point to be investigated, the interpolation should take x instead:

yi = np.interp(x,cbins, hcumc)
return [x,yi]
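
For reference, here is a minimal, self-contained sketch of the function with that change applied. It is not a confirmed fix for the warning. The name `cumu_prob`, the check for an empty histogram, and the choice to start `cbins` at `bins[0]` instead of 0 are all assumptions of mine. The last one keeps the interpolation nodes increasing, which `np.interp` needs, when the bins are negative (for example, logs of values below 1).

```python
import numpy as np


def cumu_prob(sim_data, bins, x):
    """Interpolate the binned cumulative distribution of sim_data at x.

    Hypothetical rewrite of CumuProb from the question, not the asker's code.
    """
    h, _ = np.histogram(sim_data, bins=bins)
    total = h.sum()
    if total == 0:
        # Assumption: nothing fell inside the bins (e.g. bins built from
        # log values while the raw values are histogrammed), so the
        # normalisation would be 0/0 and trigger the RuntimeWarning.
        raise ValueError("no samples fall inside the given bins")
    hcum = np.cumsum(h) / float(total)

    # Interpolation nodes: left edge, bin centres, right edge.
    # Assumption: the first node is bins[0] rather than 0, so the nodes
    # stay strictly increasing even when the bin edges are negative.
    cbins = np.empty(len(bins) + 1)
    cbins[0] = bins[0]
    cbins[1:-1] = bins[1:] - np.diff(bins[:2])[0] / 2.0
    cbins[-1] = bins[-1]

    hcumc = np.linspace(0, 1, len(cbins))
    hcumc[1:-1] = hcum

    # Interpolate at x, the value of the point being investigated.
    yi = np.interp(x, cbins, hcumc)
    return [x, yi]
```

If the bins are built from `np.log(...)` of the data, the same log-transformed array would have to be passed as `sim_data`, and `x` would have to be on the log scale as well. Otherwise the histogram can come out empty.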
ImportanceOfBeingErnest