I have a problem sorting list items into bins. I have two lists, X and Y, holding corresponding X and Y values (this could equally be one list of tuples). I need to split the X range into 10 equal bins and assign each X value, together with its Y value, to a bin, so that I know which Y values belong to which X bin. I then take the median of all Y values in each bin, which gives me ten bin-median pairs. In principle this works fine with the following code, which also calculates the X center of each bin:
import numpy as np

# 11 edges in ascending order define 10 equal-width bins over the X range
bins = np.linspace(min(X), max(X), 11)
digitized = np.digitize(X, bins)
bin_centers = []
for j in range(len(bins) - 1):
    bin_centers.append((bins[j] + bins[j + 1]) / 2.)
bin_means = [np.median(np.asarray(Y)[digitized == j])
             for j in range(1, len(bins))]
The problem is that a bin is sometimes empty because no X value falls into it. In that case the line
bin_means = [np.median(np.asarray(Y)[digitized == j])
             for j in range(1, len(bins))]
raises the error
/usr/lib64/python2.6/site-packages/numpy/core/_methods.py:55: RuntimeWarning: Mean of empty slice.
FloatingPointError: invalid value encountered in double_scalars
because of the empty bin. How can I fix that? I also tried right=True/False in numpy.digitize with no luck. I think it would be best to delete the entries for the empty bins from the three lists bin_centers, digitized, and bins before running the list comprehension that calculates the median values. But I'm not sure how to do that: how to find out which bins are empty, and then what has to be deleted from those lists and how.
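To make the idea of dropping empty bins concrete, here is a minimal sketch of one way it might look. It is an illustration under stated assumptions, not a tested fix for the code above: it assumes ascending bin edges (11 edges for 10 bins), and the sample data and the names edges, centers, idx, counts, and non_empty are made up for this example. np.bincount counts the entries per bin, so only the non-empty bins are passed to np.median.

```python
import numpy as np

# Hypothetical sample data; the gap between 0.25 and 0.85 leaves several bins empty.
X = np.asarray([0.0, 0.05, 0.12, 0.18, 0.25, 0.85, 0.95, 1.0])
Y = np.asarray([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])

n_bins = 10
edges = np.linspace(X.min(), X.max(), n_bins + 1)  # 11 ascending edges
centers = (edges[:-1] + edges[1:]) / 2.0           # one center per bin

# Digitizing against the interior edges only maps every X value to an
# index in 0..n_bins-1, so max(X) lands in the last bin.
idx = np.digitize(X, edges[1:-1])

# Count the entries per bin and keep the indices of the non-empty bins.
counts = np.bincount(idx, minlength=n_bins)
non_empty = np.flatnonzero(counts)

# Compute centers and medians for the non-empty bins only.
bin_centers = centers[non_empty]
bin_medians = np.array([np.median(Y[idx == j]) for j in non_empty])
```

With this approach nothing has to be deleted from digitized or bins; the empty bins are simply never visited, and bin_centers and bin_medians stay aligned with each other.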
Any ideas? Thanks!