Error when fitting elements found in data to its cumulative distribution

Question

I have a large simulated data set that I have processed for an analysis. My main objective is to take actual recorded values and compare them to the simulated data via the cumulative distribution.

I start by defining a function that goes through each bin of the data set, takes the simulated values at a certain value x, and matches them to the "real" data analyzed at the same value x:

bins = np.linspace(SimData.min(),SimData.max(), 24)

def CumuProb(SimData, bins, x, realValue):
    h, bins_ = np.histogram(SimData, bins=bins)
    hcum = np.cumsum(h)/float(np.cumsum(h).max())

    cbins = np.zeros(len(bins)+1)
    cbins[1:-1] = bins[1:]-np.diff(bins[:2])[0]/2.
    cbins[-1] = bins[-1]

    hcumc = np.linspace(0,1, len(cbins))
    hcumc[1:-1] = hcum

    p = [x, realValue]

    yi = np.interp(p[1],cbins, hcumc)
    return [p[1],yi]

This method works fine for large values. But if I pass in values that are much smaller than 1 (though still greater than 0), it fails miserably.

For example, applying this method to my project gives:

At the very bottom you can see that there are only 2 points, when there should be about 10 points, all on the blue line (the actual data).

The main culprit seems to be this warning: RuntimeWarning: invalid value encountered in divide hcum = np.cumsum(h)/float(np.cumsum(h).max())

So this most likely has to do with how I define my bins, which is bin=np.linspace(np.log(binding).min(),np.log(binding).max(),24). This spans the logarithmic x-axis values in the plot above.
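
A minimal sketch of one way this warning can arise, under the assumption (not confirmed by the code shown here) that the bins are built from log values while the histogrammed data are on the linear scale. In that case no data fall inside the bins, the cumulative sum is all zeros, and the normalization becomes 0/0. The array sim_data is a hypothetical stand-in, not the real data set.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the data: values between 0 and 1.
sim_data = rng.uniform(1e-3, 1.0, size=1000)

# Bins built from the logarithm of the data cover negative values only.
log_bins = np.linspace(np.log(sim_data).min(), np.log(sim_data).max(), 24)

# Histogramming the raw (linear) data against log bins counts nothing.
h, _ = np.histogram(sim_data, bins=log_bins)

# 0/0 yields nan; errstate only silences the RuntimeWarning here.
with np.errstate(invalid="ignore"):
    hcum = np.cumsum(h) / float(np.cumsum(h).max())

# Putting data and bins on the same scale gives a proper cumulative curve.
h_ok, _ = np.histogram(np.log(sim_data), bins=log_bins)
hcum_ok = np.cumsum(h_ok) / float(h_ok.sum())
```

If h.sum() turns out to be zero in the real analysis, checking that the data and the bins use the same scale would be the first thing to look at.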

How do I fix this?

Check out [qq- or probability plots](http://stackoverflow.com/questions/13865596/quantile-quantile-plot-using-scipy). Although this is not what you *want*, it is probably what you *should* do if you want to compare data to distributions. — MB-F, Apr 05 '17 at 06:47

score 0 · Answer 1 · edited May 23 '17 at 11:46

I can't be 100% sure, since the question lacks a lot of relevant information, but judging from how I intended this function to be used, it seems odd to put realValue into the interpolation. If, as the name suggests, x is the x-axis value of the data point to be investigated, the interpolation should take x instead:

yi = np.interp(x,cbins, hcumc)
return [x,yi]
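
Put together, the function might look like the sketch below. Two things in it go beyond the change above and are my assumptions: the name cumu_prob, and setting cbins[0] to the first bin edge instead of leaving it at 0. The latter matters because np.interp expects increasing x-coordinates, which a leading 0 breaks as soon as the bin edges are negative (for example, logs of values below 1). The explicit check for an empty histogram is also an addition.

```python
import numpy as np

def cumu_prob(sim_data, bins, x):
    """Interpolate the cumulative distribution of sim_data at x."""
    h, _ = np.histogram(sim_data, bins=bins)
    total = h.sum()
    if total == 0:
        # Avoids the 0/0 behind "invalid value encountered in divide".
        raise ValueError("no data fall inside the bins")
    hcum = np.cumsum(h) / float(total)

    # Bin centers, padded with the first and last bin edges.
    cbins = np.zeros(len(bins) + 1)
    cbins[0] = bins[0]  # assumption: the original left this at 0
    cbins[1:-1] = bins[1:] - np.diff(bins[:2])[0] / 2.0
    cbins[-1] = bins[-1]

    # Cumulative values at those positions, from 0 up to 1.
    hcumc = np.linspace(0, 1, len(cbins))
    hcumc[1:-1] = hcum

    yi = np.interp(x, cbins, hcumc)
    return [x, yi]
```

Calling it with bins = np.linspace(data.min(), data.max(), 24) and an x on the same scale as data returns x together with the interpolated cumulative probability.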


answered Apr 05 '17 at 08:32

ImportanceOfBeingErnest

