numpy.digitize / numpy.searchsorted: recycle indices without IndexError

Question

I have an array, values, of test values whose locations I would like to find in an array of bins, without leaving vectorized code. NumPy allows for this using np.digitize or np.searchsorted:

import numpy as np

values = np.arange(-5.1, 20.9, 0.5)
bins = np.arange(10)

indices = np.digitize(values, bins)
# NB: np.digitize and np.searchsorted take their arguments in opposite orders
indices2 = np.searchsorted(bins, values)
assert (indices == indices2).all()
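
A side note on the assertion: it holds for this data because none of the test values falls exactly on a bin edge. As far as I can tell from the documented behaviour, for increasing bins the default np.digitize (right=False) matches np.searchsorted with side='right', not the default side='left'. A small sketch, where edge_values is a made-up array chosen to sit exactly on the edges:

```python
import numpy as np

bins = np.arange(10)
# Hypothetical inputs that coincide exactly with bin edges
edge_values = np.array([0.0, 3.0, 9.0])

d = np.digitize(edge_values, bins)                         # right=False by default
left = np.searchsorted(bins, edge_values)                  # side='left' by default
right = np.searchsorted(bins, edge_values, side='right')

print((d == right).all())  # digitize agrees with side='right'
print((d == left).any())   # and differs from side='left' on every edge value
```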

This may seem obvious, but I now want to use my nice shiny index array. The following throws an IndexError, because np.digitize and np.searchsorted (helpfully) map out-of-bounds inputs to 0 or len(bins), and len(bins) is not a valid index into bins:

print(bins[indices])  # -> IndexError
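
To make the failure concrete, here is a quick check using the same arrays as above. The names below and above are only illustrative. Index 0 is still a legal index (it just means "below the first edge"), so only the entries equal to len(bins) trigger the error:

```python
import numpy as np

values = np.arange(-5.1, 20.9, 0.5)
bins = np.arange(10)
indices = np.digitize(values, bins)

below = indices == 0          # values smaller than bins[0]
above = indices == len(bins)  # values at or beyond bins[-1]

print(indices.min(), indices.max(), len(bins))  # 0 10 10

try:
    bins[indices]
except IndexError as err:
    # Raised only because of the entries flagged in `above`
    print("IndexError:", err)
```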

Since the whole point was to keep everything vectorized, what is the best way to trap the IndexError without a for-loop and hence assign the output of bins[indices] to another array for onward calculation? (A clip value is available on request :) )

Should I use a custom mapping function or lambda? (along the lines of How to make scipy.interpolate give an extrapolated result beyond the input range? - except I don't want to interpolate an exact array)

Thanks as ever for all help

Update:

I could actually - in my case - achieve the desired onward behaviour with:

indices[indices == len(bins)] = len(bins) - 1
print(bins[indices])

But this does lose some information* and the most general, best-practice route - which is what I am after - is not clear to me.

*That's because I would no longer be able to distinguish between acceptable indices and those that were out-of-bounds.
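
One way I could imagine keeping that information is to record a boolean mask before clipping and carry it alongside the result. This is only a sketch under my own assumptions: the names out_of_bounds, safe, result_nan and result_masked are made up, and whether NaN or a masked array is the better carrier depends on the onward calculation.

```python
import numpy as np

values = np.arange(-5.1, 20.9, 0.5)
bins = np.arange(10)
indices = np.digitize(values, bins)

# Record which inputs fell outside [bins[0], bins[-1]) before altering anything
out_of_bounds = (indices == 0) | (indices == len(bins))

# Clip only to make the lookup legal; the mask preserves what clipping hides
safe = np.clip(indices, 0, len(bins) - 1)
looked_up = bins[safe]

# Option A: NaN marks the out-of-bounds entries (the result becomes float)
result_nan = np.where(out_of_bounds, np.nan, looked_up)

# Option B: a masked array that carries the flag through later arithmetic
result_masked = np.ma.masked_array(looked_up, mask=out_of_bounds)
```

Either result has the same shape as values, so it can be fed into further vectorized steps, and out_of_bounds is still available if the two cases need to be separated later.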

What about producing `bins` as length `n_bins` and use `indices=np.digitize(values, bins[:n_bins-1])` - in this case you have an extra bin for your outliers? — rammelmueller, Oct 12 '17 at 11:21
Could you list out how would you lose out information with the Update section code? — Divakar, Oct 12 '17 at 12:32
