I have a large simulated data set that I have run through an analysis. My main objective is to take actual, recorded values and compare them to the simulated data via their cumulative distributions.
I start out by defining a method that goes through each bin of the simulated data, takes the values at a given value x, and matches them to the "real" data analyzed at the same value x:
import numpy as np

bins = np.linspace(SimData.min(), SimData.max(), 24)

def CumuProb(SimData, bins, x, realValue):
    # Histogram the simulated data and normalize the cumulative counts to [0, 1]
    h, bins_ = np.histogram(SimData, bins=bins)
    hcum = np.cumsum(h)/float(np.cumsum(h).max())
    # Bin centers, padded so the cumulative curve spans the full bin range
    cbins = np.zeros(len(bins) + 1)
    cbins[1:-1] = bins[1:] - np.diff(bins[:2])[0]/2.
    cbins[-1] = bins[-1]
    hcumc = np.linspace(0, 1, len(cbins))
    hcumc[1:-1] = hcum
    # Interpolate the cumulative probability at the recorded value
    p = [x, realValue]
    yi = np.interp(p[1], cbins, hcumc)
    return [p[1], yi]
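For context, a minimal call looks something like this (the lognormal arrays here are just stand-ins for my actual data):

    # Stand-in data, purely to show how the function is called
    SimData = np.random.lognormal(mean=0., sigma=1., size=10000)
    realData = np.random.lognormal(mean=0., sigma=1., size=10)
    bins = np.linspace(SimData.min(), SimData.max(), 24)
    points = [CumuProb(SimData, bins, r, r) for r in realData]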
This method works fine for large values. But if I pass in values << 1 (but > 0), it fails miserably.
For example, running my project through this method gives a plot where, at the very bottom, there are 2 points when there should be about 10 points, all on the blue line (the actual data).
The main culprit is this warning from the traceback:

    RuntimeWarning: invalid value encountered in divide
      hcum = np.cumsum(h)/float(np.cumsum(h).max())
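If I understand the warning, it fires when none of the data lands inside the bin edges: the histogram counts are then all zero, so np.cumsum(h).max() is 0 and the normalization divides 0 by 0, producing NaNs. A standalone sketch with made-up values that reproduces it:

    import numpy as np

    data = np.array([1e-8, 5e-7, 2e-6])   # values << 1 but > 0
    # Edges built in log space (negative numbers), data left on the linear scale
    log_bins = np.linspace(np.log(data).min(), np.log(data).max(), 24)
    h, _ = np.histogram(data, bins=log_bins)
    print(h.sum())                          # 0 -- nothing falls inside the edges
    hcum = np.cumsum(h)/float(np.cumsum(h).max())  # 0/0 -> nan + RuntimeWarning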
So this most likely has to do with how I am defining my bin size, which is bins = np.linspace(np.log(binding).min(), np.log(binding).max(), 24). That builds the edges from the logarithmic x-axis values in the plot above, while (I suspect) the values being binned and looked up are still on the linear scale.
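If that mismatch is indeed the problem, I assume the fix would be along these lines, keeping the edges, the simulated data, and the lookup value all in the same (log) space, but I am not sure this is the right approach:

    # Sketch (unverified): bin and interpolate everything in log space
    logSim = np.log(SimData)
    logBins = np.linspace(logSim.min(), logSim.max(), 24)
    point = CumuProb(logSim, logBins, np.log(realValue), np.log(realValue))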
How do I fix this?