Fastest way to compare Numpy ndarrays

Question

I have stored 4 ndarrays in a dictionary dictPrices and would like to generate a boolean ndarray for each of 2 cases: (1) element-wise, whether the number in any of the 4 ndarrays exceeds x; (2) element-wise, whether the number in all of the 4 ndarrays exceeds x.

dictPrices[1] >= x works, but (dictPrices[1] >= x | dictPrices[2] >= x) fails, and so does (dictPrices[1] >= x or dictPrices[2] >= x).

As the ndarrays can be huge (they come from a Monte Carlo simulation), I was hoping for a vectorized solution rather than looping through each ndarray element by element.

Thank you!

Are you sure that all 4 arrays are the same shape? In what way does the example you tried fail? – wim, Dec 14 '16 at 02:55
Hi wim, yes, they all have shape (7, 250000), as I was simulating 4 different price sets. The error thrown was `ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()`. `print((dictPrices[1]>=x or dictPrices[2]>=x).any())` does not work either. – AiRiFiEd, Dec 14 '16 at 03:06
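
For reference, the error quoted above follows from how Python parses the two attempts. `|` binds more tightly than `>=`, so without parentheses around each comparison the first attempt becomes a chained comparison, and `or` asks for the truth value of a whole array. A minimal sketch with two small made-up arrays `a` and `b` (stand-ins for `dictPrices[1]` and `dictPrices[2]`, with an assumed threshold `x`):

```python
import numpy as np

# Hypothetical stand-ins for dictPrices[1] and dictPrices[2]; values are made up.
rng = np.random.default_rng(0)
a = rng.normal(100.0, 10.0, size=(7, 5))
b = rng.normal(100.0, 10.0, size=(7, 5))
x = 105.0  # assumed threshold

# `or` needs a single True/False from each operand, which an array cannot give.
try:
    (a >= x) or (b >= x)
    or_failed = False
except ValueError:
    or_failed = True

# Parenthesizing each comparison makes the element-wise operators work.
any_exceeds = (a >= x) | (b >= x)  # True where at least one array is >= x
all_exceed = (a >= x) & (b >= x)   # True where both arrays are >= x
```

This works for two arrays; the answer below generalizes it to any number of arrays.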

score 2 · Accepted Answer · edited May 23 '17 at 12:13


I think you want this:

np.logical_or.reduce([prices >= x for prices in dictPrices.values()])

This is explained in some detail here: Numpy `logical_or` for more than two arguments

And of course for the second case you can use logical_and instead of logical_or.
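
A minimal sketch of both cases, assuming `dictPrices` holds four same-shape float arrays keyed 1 to 4. The data, the smaller shape, and the threshold `x` here are made up for illustration:

```python
import numpy as np

# Hypothetical data: four simulated price sets with identical shapes.
rng = np.random.default_rng(0)
dictPrices = {k: rng.normal(100.0, 10.0, size=(7, 1000)) for k in range(1, 5)}
x = 105.0  # assumed threshold

# One boolean mask per array; this Python-level loop runs only 4 times.
masks = [prices >= x for prices in dictPrices.values()]

# Case 1: True where at least one of the arrays is >= x.
any_exceeds = np.logical_or.reduce(masks)

# Case 2: True where every array is >= x.
all_exceed = np.logical_and.reduce(masks)
```

Both results have the same shape as the individual arrays, because `reduce` combines the masks along the first axis of the list.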


answered Dec 14 '16 at 02:56

John Zwinck


Hey John, thanks for pointing me in the right direction! The code works and seems extremely efficient: run time increased by only 0.006s when processing 4 sets of shape (7, 250000). I am still trying to figure out from the link how np.logical_or.reduce works for multidimensional arrays. Also, it seems from your answer that there is actually a for loop in there. In my limited experience, a normal for loop is usually very expensive, so may I ask why this method is so fast to execute (I understand this as "functional programming")? Thank you! – AiRiFiEd Dec 14 '16 at 03:17
@AiRiFiEd: Well, the only "for" loop in my code is over the 4 elements of `dictPrices`. A for loop over 4 items is not slow at all; what's slow is iterating over thousands or millions of rows. If you want to eliminate the for loop completely, you can rework your data structure into a single array with a new dimension of length 4. But copying your data into that won't be worth it if the only reason is to avoid one or two loops. – John Zwinck Dec 14 '16 at 04:36
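
The single-array layout mentioned in this comment could look like the sketch below. It assumes the four arrays share a shape; the data and threshold are made up, and `np.stack` copies the data, which is the cost the comment warns about:

```python
import numpy as np

# Hypothetical data: four simulated price sets with identical shapes.
rng = np.random.default_rng(0)
dictPrices = {k: rng.normal(100.0, 10.0, size=(7, 1000)) for k in range(1, 5)}
x = 105.0  # assumed threshold

# Stack into one array of shape (4, 7, 1000); this copies all four arrays.
stacked = np.stack(list(dictPrices.values()))

# Reduce along the new leading axis, with no Python-level loop over the sets.
any_exceeds = (stacked >= x).any(axis=0)
all_exceed = (stacked >= x).all(axis=0)
```

If the simulation can write its output directly into such an array, the copy is avoided; otherwise the list-based `reduce` in the answer is probably the simpler choice.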
Thanks for the explanation! As I was recently reading a Python book on "functional programming", I tried to change your solution a little by using the `map` function: `np.logical_and.reduce(map(lambda prices: prices >= x, dictPrices.values()))`. Currently it's returning a memory address (``). Do you know by any chance whether this would work, and would you expect an improvement in performance? Thanks so much for your help with this!! – AiRiFiEd Dec 14 '16 at 05:48
@AiRiFiEd: Don't bother. `reduce()` expects a concrete sequence, not a lazy iterator like the one `map()` gives you in Python 3. You won't gain anything by heading down this path. But if you insist, you can do `np.logical_and.reduce(list(map(...)))`. – John Zwinck Dec 14 '16 at 07:27
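
A short sketch of the `list(map(...))` variant from this comment, checked against the list comprehension. The data and threshold are made up for illustration:

```python
import numpy as np

# Hypothetical data: four simulated price sets with identical shapes.
rng = np.random.default_rng(0)
dictPrices = {k: rng.normal(100.0, 10.0, size=(7, 1000)) for k in range(1, 5)}
x = 105.0  # assumed threshold

# Materialize the lazy map object into a list before handing it to reduce.
with_map = np.logical_and.reduce(
    list(map(lambda prices: prices >= x, dictPrices.values()))
)

# The list comprehension from the answer builds the same list of masks.
with_listcomp = np.logical_and.reduce([prices >= x for prices in dictPrices.values()])

same_result = np.array_equal(with_map, with_listcomp)
```

Both forms do the same array work, so no meaningful speed difference should be expected between them.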
Apologies for the late reply. You are right: I tried both methods and the runtimes were about the same, so it is definitely not worth the effort. Thanks again for your guidance on these matters! – AiRiFiEd Dec 14 '16 at 14:52
