9

In numpy if I want to compare two arrays, say for example I want to test if all elements in A are less than values in B, I use if (A < B).all():. But in practice this requires allocation and evaluation of complete array C = A < B and then calling C.all() on it. This is a bit of waste. Is there any way to 'shortcut' the comparison, i.e. directly evaluate A < B element by element (without allocation and calculation of temporary C) and stop and return False when first invalid element comparison is found?

Cœur
  • 37,241
  • 25
  • 195
  • 267
  • 1
    Does this 'waste' noticeably impact performance? – mfitzp Jan 18 '16 at 10:29
  • In my case not much. But in general it could have impact - I bet that allocating extra 1000x1000x1000 array could harm your performance. I was asking out of mere curiosity, I believe this shortcut evaluation would be very handy. There are function `allclose()` which I believe stop when first noncompliant element is found. So I was surprised that I did not find `allequal(a,b)`, `allless(a,b)` and `allgreater(a,b)` which all would be very popular in their use I believe, because `if (A – HiFile.app - best file manager Jan 18 '16 at 10:36
  • Having looked at the numpy source while I don't think there is support for shortcutting the A – mfitzp Jan 18 '16 at 10:39
  • There is `np.array_equal` (see here: http://stackoverflow.com/questions/10580676/comparing-two-numpy-arrays-for-equality-element-wise) but no corresponding less. However, that function does not shortcut either (it's a wrapper around `(A==B).all()` with additional checks). – mfitzp Jan 18 '16 at 10:40
  • http://docs.scipy.org/doc/numpy-1.10.1/user/c-info.how-to-extend.html go ahead ;-) – Dima Tisnek Jan 18 '16 at 10:40
  • As I have just found by looking at the `numpy` source, we have `allequal(a,b)` function (to partly correct my previous comment) but it does not shortcut the evaluation, which is pity. – HiFile.app - best file manager Jan 18 '16 at 11:12
  • You may be able to wring your own function using cython \ numba for better performance. But you'll probably have to make them type specific for the speed gain. So you would be sacrificing flexibility for speed. – M4rtini Jan 18 '16 at 11:52
  • So you are asking for compiled functions that implement `allless`, `anyless`, `allless_or_equal`, `any_not_equal`, `any_gt`, `all_ge`, etc? This is a big step away from the building-block model that makes `numpy` so convenient (and fast enough for general problems). – hpaulj Jan 18 '16 at 17:36
  • @hpaulj: I do not think that we need all the combinations, only a few of them would do. Consider if we had `all_lt(a,b)`. Then we would have for free the following three `all_gt`, `any_le`, `any_ge` because of the equivalences: `not all_lt(a,b) <==> any_ge(a,b)` and `all_lt(b, a) <==> all_gt(a,b)` and `not all_lt(b,a) <==> any_le(a,b)`. If then we add `any_lt(a,b)` we would get another three for free. And then we would need only `allequal(a,b)` which is already in `numpy` but is not using shortcut evaluation. So only two more functions (e.g. `all_lt` and `any_lt`) would cover all our needs. – HiFile.app - best file manager Jan 19 '16 at 17:05
  • Come on people, trying to persuade one into statement "well, I do not actually need it" is too cheap trick. I do not need my car either, of course, I have two legs and can walk. :) – HiFile.app - best file manager Jan 19 '16 at 17:16
  • What about `a – hpaulj Jan 19 '16 at 17:19

2 Answers2

1

How large are you arrays? I would imagine they are very large, e.g. A.shape = (1000000) or larger before performance becomes an issue. Would you consider using numpy views?

Instead of comparing (A < B).all() or (A < B).any() you can try defining a view, such as (A[:10] < B[:10]).all(). Here's a simple loop that might work:

k = 0
while( (A[k*10: (k+1)*10] < B[k*10: (k+1)*10] ).all() ):
    k += 1

Instead of 10 you can use 100 or 10**3 segment size you wish. Obviously if your segment size is 1, you are saying:

k = 0
while ( A[k] < B[k] ):
    k+= 1

Sometimes, comparing the entire array can become memory intensive. If A and B have length of 10000 and I need to compare each pair of elements, I am going to run out of space.

john mangual
  • 7,718
  • 13
  • 56
  • 95
1

Plain Python and and or use shortcut evaluation, but numpy does not.

(A < B).all()

uses numpy building blocks, the broadcasting, the element by element comparison with < and the all reduction. The < works just other binary operations, plus, times, and, or, gt, le, etc. And all is like other reduction methods, any, max, sum, mean, and can operate on the whole array or by rows or by columns.

It is possible to write a function that combines the all and < into one iteration, but it would be difficult to get the generality that I just described.

But if you must implement an iterative solution, with a shortcut action, and do it fast, I'd suggest developing the idea with nditer, and then compile it with cython.

http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html is a good tutorial on using nditer, and it takes you through using it in cython. nditer takes care of broadcasting and iteration, letting you concentrate on the comparison and any shortcutting.

Here's a sketch of an iterator that could be cast into cython:

import numpy as np

a = np.arange(4)[:,None]
b = np.arange(2,5)[None,:]
c = np.array(True)
it = np.nditer([a, b, c], flags=['reduce_ok'],
    op_flags = [['readonly'], ['readonly'],['readwrite']])
for x, y, z in it:
    z[...] = x<y
    if not z:
        print('>',x,y)
        break
    else:
        print(x,y)
print(z)

with a sample run:

1420:~/mypy$ python stack34852272.py 
(array(0), array(2))
(array(0), array(3))
(array(0), array(4))
(array(1), array(2))
(array(1), array(3))
(array(1), array(4))
('>', array(2), array(2))
False

Start with a default False, and a different break condition and you get a shortcutting any. Generalizing the test to handle <, <=, etc will be more work.

Get something like this working in Python, and then try it in Cython. If you have trouble with that step, come back with a new question. SO has a good base of Cython users.

hpaulj
  • 221,503
  • 14
  • 230
  • 353