I have a 3D-array consisting of several numbers within each band. Is there a function that returns the index positions where the array meets MULTIPLE conditions?

I tried the following:

index_pos = numpy.where(
    array[:,:,0]==10 and array[:,:,1]==15 and array[:,:,2]==30)

It returns the error:

ValueError: The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all()
ely
MoTSCHIGGE

3 Answers

You actually have a special case where it would be simpler and more efficient to do the following:

Create the data:

>>> arr
array([[[ 6,  9,  4],
        [ 5,  2,  1],
        [10, 15, 30]],

       [[ 9,  0,  1],
        [ 4,  6,  4],
        [ 8,  3,  9]],

       [[ 6,  7,  4],
        [ 0,  1,  6],
        [ 4,  0,  1]]])

The expected value:

>>> index_pos = np.where((arr[:,:,0]==10) & (arr[:,:,1]==15) & (arr[:,:,2]==30))
>>> index_pos
(array([0]), array([2]))

Use broadcasting to do this simultaneously:

>>> arr == np.array([10,15,30])
array([[[False, False, False],
        [False, False, False],
        [ True,  True,  True]],

       [[False, False, False],
        [False, False, False],
        [False, False, False]],

       [[False, False, False],
        [False, False, False],
        [False, False, False]]], dtype=bool)

>>> np.where( np.all(arr == np.array([10,15,30]), axis=-1) )
(array([0]), array([2]))
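
For anyone who wants to reproduce the session above, here is a minimal self-contained sketch. The original only shows the repr of arr, so the array is rebuilt by hand here; the names target, mask and pairs are illustrative, not part of the original answer. It checks that the chained comparison and the broadcast comparison agree.

```python
import numpy as np

# Rebuild the small example array shown above (3 rows, 3 columns, 3 bands).
arr = np.array([[[6, 9, 4], [5, 2, 1], [10, 15, 30]],
                [[9, 0, 1], [4, 6, 4], [8, 3, 9]],
                [[6, 7, 4], [0, 1, 6], [4, 0, 1]]])

# Chained per-band comparison, combined elementwise with &.
chained = np.where((arr[:, :, 0] == 10) & (arr[:, :, 1] == 15) & (arr[:, :, 2] == 30))

# Broadcast comparison against all three band values at once,
# then require every band to match along the last axis.
target = np.array([10, 15, 30])
mask = np.all(arr == target, axis=-1)
rows, cols = np.where(mask)

# np.argwhere returns the same positions as (row, col) pairs.
pairs = np.argwhere(mask)
print(rows, cols, pairs)
```

np.argwhere can be handier than np.where when the positions are needed as coordinate pairs rather than as separate index arrays.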

If the indices you want are not contiguous you can do something like this:

ind_vals = np.array([0,2])   # bands to test
values = np.array([10,30])   # example target values, one per selected band
where_mask = (arr[:,:,ind_vals] == values)
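
A complete version of the non-contiguous case might look like the sketch below. The target values [10, 30] are an assumption chosen for illustration (the snippet above leaves values undefined), and the array is the same small example rebuilt by hand.

```python
import numpy as np

arr = np.array([[[6, 9, 4], [5, 2, 1], [10, 15, 30]],
                [[9, 0, 1], [4, 6, 4], [8, 3, 9]],
                [[6, 7, 4], [0, 1, 6], [4, 0, 1]]])

ind_vals = np.array([0, 2])    # only test bands 0 and 2
values = np.array([10, 30])    # hypothetical targets, one per selected band

# Fancy indexing on the last axis gives shape (3, 3, 2);
# the comparison broadcasts values across every (row, col) position.
where_mask = arr[:, :, ind_vals] == values

# A position matches only if every selected band matches.
index_pos = np.where(np.all(where_mask, axis=-1))
print(index_pos)
```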

Broadcast when you can.

Spurred by @Jaime's comment, some interesting things to consider:

arr = np.random.randint(0,100,(5000,5000,3))

%timeit np.all(arr == np.array([10,15,30]), axis=-1)
1 loops, best of 3: 614 ms per loop

%timeit ((arr[:,:,0]==10) & (arr[:,:,1]==15) & (arr[:,:,2]==30))
1 loops, best of 3: 217 ms per loop

%timeit tmp = (arr == np.array([10,15,30])); (tmp[:,:,0] & tmp[:,:,1] & tmp[:,:,2])
1 loops, best of 3: 368 ms per loop

The question becomes: why is this?

First, examine:

%timeit (arr[:,:,0]==10)
10 loops, best of 3: 51.2 ms per loop

%timeit (arr == np.array([10,15,30]))
1 loops, best of 3: 300 ms per loop

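One would expect arr == np.array([10,15,30]) to take, at worst, about three times as long as arr[:,:,0]==10, since it performs three times as many comparisons. Here it takes roughly six times as long (300 ms vs 51.2 ms). Does anyone have an idea why this is the case?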

Then, there are several ways to combine the results along the final axis:

tmp = (arr == np.array([10,15,30]))

method1 = np.all(tmp,axis=-1)
method2 = (tmp[:,:,0] & tmp[:,:,1] & tmp[:,:,2])
method3 = np.einsum('ij,ij,ij->ij',tmp[:,:,0] , tmp[:,:,1] , tmp[:,:,2])

np.allclose(method1,method2)
True
np.allclose(method1,method3)
True

%timeit np.all(tmp,axis=-1)
1 loops, best of 3: 318 ms per loop

%timeit (tmp[:,:,0] & tmp[:,:,1] & tmp[:,:,2])
10 loops, best of 3: 68.2 ms per loop

%timeit np.einsum('ij,ij,ij->ij',tmp[:,:,0] , tmp[:,:,1] , tmp[:,:,2])
10 loops, best of 3: 38 ms per loop

The einsum speed-up is well documented elsewhere, but it seems odd to me that there is such a difference between np.all and consecutive & operations.
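
The %timeit lines above need IPython. As a rough sketch for rerunning the comparison as a plain script, the standard-library timeit module can be used instead. The array size, the seed and the inclusion of np.logical_and.reduce are choices made here for illustration, and timings will vary by machine and NumPy version, so no numbers are quoted.

```python
import timeit
import numpy as np

# Smaller array than in the answer so the script finishes quickly.
rng = np.random.default_rng(0)
arr = rng.integers(0, 100, size=(500, 500, 3))
arr[0, 0] = [10, 15, 30]  # guarantee at least one matching position

tmp = arr == np.array([10, 15, 30])

# Three ways of and-ing the boolean result along the last axis.
candidates = {
    "np.all": lambda: np.all(tmp, axis=-1),
    "chained &": lambda: tmp[:, :, 0] & tmp[:, :, 1] & tmp[:, :, 2],
    "logical_and.reduce": lambda: np.logical_and.reduce(tmp, axis=-1),
}

# Compute each result once so they can be checked against each other.
results = {name: fn() for name, fn in candidates.items()}

# Best of 3 repeats, 5 calls per repeat.
timings = {name: min(timeit.repeat(fn, number=5, repeat=3))
           for name, fn in candidates.items()}

for name, seconds in timings.items():
    print(f"{name:>20}: {seconds / 5 * 1e3:.3f} ms per call")
```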

Daniel
  • With very large arrays it may be slower to create your single boolean array, rather than three of one third the size, but it definitely looks much cleaner. You can, by the way, let the ufunc do the conversion of the list to an array: `arr == [10, 15, 30]` works just the same. – Jaime Nov 04 '13 at 16:16
  • @Jaime Please see my edits; your comment is very interesting, and I do not quite understand why it is so much slower. The `np.array(...)` comes from habit: there are a few places where the ufunc does not call `np.asarray`, and it makes me nervous. – Daniel Nov 04 '13 at 17:38
  • I think the speed in building the boolean arrays is a memory cache thing. The 3 to 1 speed ratio you mention only holds for tiny arrays like `(10, 10, 3)`; I'm guessing that here they both fit in cache. It goes up to 10 to 1 for intermediate arrays like `(100, 100, 3)`, probably because one fits in cache and the other doesn't. And it apparently starts going down again for larger arrays like your `(5000, 5000, 3)`, probably since both now require out-of-cache reads. – Jaime Nov 04 '13 at 18:29
  • As for the very poor performance of `np.all`, I don't really have an answer, but I can confirm it. If I have tracked the source code right, the implementation is [here](https://github.com/numpy/numpy/blob/v1.8.0/numpy/core/_methods.py#L35), so it ends up being a call to `np.logical_and.reduce`. This requires an extra and'ing (of a `True` with the first item), but that hardly justifies it running 5x slower. A pity that the implementation doesn't break out of loops early; it seems like a very straightforward thing to do in C that could speed things up ridiculously. – Jaime Nov 04 '13 at 18:33
  • As per askewchan's [comment below](http://stackoverflow.com/questions/19770361/find-index-positions-where-3d-array-meets-multiple-conditions/19770458#comment29387038_19770483) all of the timing for the bitwise and is irrelevant. It is not logically the operation that the OP wants to perform, and *really* shouldn't be used for this purpose. – ely Nov 04 '13 at 18:57
  • @Jaime I am not sure I understand your caching argument for large arrays. By that statement you are essentially saying that broadcasting is slower than individual operations if the data does not fit into cache memory. – Daniel Nov 04 '13 at 19:01
  • @EMS The OP is showing anding of boolean arrays, for which bitwise operations are perfectly fine. – Daniel Nov 04 '13 at 19:02
  • That's just coincidental. The goal is to combine conditions that evaluate to `True` across arrays with a logical and operation. See the comment I linked for an example where the bitwise approach would fail but to which the OP's question could easily apply. Reading the OP's question too literally (as in, just because `array > 5` (for example) will be a Boolean array in NumPy) is unhelpful. – ely Nov 04 '13 at 19:16
  • @EMS I agree that it is a comment that is worth making, and a good point (it is why I up voted the comment). However, saying that the bitwise portion is irrelevant strikes me as a bit heavy handed. Especially when bitwise operations are explicitly useful in the example provided. – Daniel Nov 04 '13 at 19:25
  • I guess I see it as the reverse of your comment. The bitwise approach would be fine as a one-off comment that's like "hey, you can also do it with this bitwise symbol, but be careful of some pitfalls." The bitwise approach is misleading here if it's treated as the main or preferred answer. The bitwise approach is useful, but not the right tool for the job. – ely Nov 04 '13 at 19:36

The and operator won't work in this case.

index_pos = numpy.where(array[:,:,0]==10 and array[:,:,1]==15 and array[:,:,2]==30)

Give this a try:

index_pos = numpy.where((array[:,:,0]==10) & (array[:,:,1]==15) & (array[:,:,2]==30))
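
To see why and fails while & works, here is a small sketch on two toy boolean arrays (a and b are made up for illustration). Python's and has to reduce its left operand to a single True or False, which NumPy refuses to do for arrays with more than one element, whereas & is applied element by element.

```python
import numpy as np

a = np.array([True, False])
b = np.array([True, True])

# `and` calls bool(a), which is ambiguous for a multi-element array.
try:
    a and b
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)

# `&` combines the arrays element by element instead.
elementwise = a & b
print(raised, elementwise)
```

The parentheses in the corrected line matter as well: & binds more tightly than ==, so each comparison has to be wrapped before the results are combined.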
  • Note that using `&` is equivalent to using `np.bitwise_and` and it will do what you expect for boolean arrays: an elementwise *and*, as in the example, because bools are single bits. However, it still is *bitwise*, so `np.array([1,2]) & np.array([0,1])` won't give the same as `np.logical_and(np.array([1,2]), np.array([0,1]))`. – askewchan Nov 04 '13 at 16:32
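
The difference described in this comment can be checked directly. The sketch below uses the two integer arrays from the comment; the final line, which compares first and then applies &, is an added illustration of why & is safe on the results of comparisons.

```python
import numpy as np

x = np.array([1, 2])
y = np.array([0, 1])

# Bitwise and on integers: 1 & 0 == 0 and 2 & 1 == 0 (binary 10 & 01).
bitwise = x & y

# Logical and works on truth values: every nonzero entry counts as True.
logical = np.logical_and(x, y)

# On boolean arrays, such as the results of comparisons, the two agree.
on_bools = (x > 0) & (y > 0)
print(bitwise, logical, on_bools)
```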

The problem is the use of the native Python and keyword, which doesn't behave the way you'd like on arrays.

Instead, try using the numpy.logical_and function.

cond1 = np.logical_and(array[:,:,0]==10, array[:,:,1]==15)
cond2 = np.logical_and(cond1, array[:,:,2]==30)
index_pos = np.where(cond2)

You might even create your own version of logical_and that accepts an arbitrary number of conditions:

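from functools import reduce  # reduce is not a builtin in Python 3

def my_logical_and(*args):
    # Fold np.logical_and pairwise over all the condition arrays.
    return reduce(np.logical_and, args)

condition_locs_and_vals = [(0, 10), (1, 15), (2, 30)]
conditions = [array[:,:,x] == y for x,y in condition_locs_and_vals]
my_logical_and(*conditions)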

Using bitwise-and (&) works, but only by coincidence. The bitwise-and is meant for comparing bits or bool types. Using it to compare the truth values of numeric arrays is not robust (for instance, if you later need to combine arrays whose entries merely evaluate to True, without first converting them to bool arrays). logical_and really should be used instead of & (even if it comes with a speed penalty).

Also, chaining together arbitrary lists of conditions with & can be painful both to read and type. And for re-usability of the code, so that later programmers don't have to change around a bunch of the subordinate clauses to the & operator, it might be better to store the individual conditions separately, and then use a function like the one above to combine them.

ely
  • Nice answer. Incidentally, `np.logical_and` has a `reduce` method, which is identical to your `my_logical_and` as far as I can tell: `np.logical_and.reduce(conditions)`. You could also do this with `np.all(conditions, axis=0)`. – askewchan Nov 04 '13 at 21:21
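
The alternatives mentioned in this comment can be compared side by side. In this sketch the array is the small example from the first answer, rebuilt by hand, and the variable names are illustrative. Note that np.logical_and.reduce takes the list of conditions as a single argument; it is converted to one array and reduced along axis 0.

```python
from functools import reduce

import numpy as np

array = np.array([[[6, 9, 4], [5, 2, 1], [10, 15, 30]],
                  [[9, 0, 1], [4, 6, 4], [8, 3, 9]],
                  [[6, 7, 4], [0, 1, 6], [4, 0, 1]]])

# One boolean (rows, cols) array per (band, value) condition.
condition_locs_and_vals = [(0, 10), (1, 15), (2, 30)]
conditions = [array[:, :, band] == value
              for band, value in condition_locs_and_vals]

# Pairwise fold with functools.reduce, as in the answer.
via_functools = reduce(np.logical_and, conditions)

# The ufunc's own reduce: the list becomes a (3, rows, cols) array
# and is reduced along axis 0 by default.
via_ufunc = np.logical_and.reduce(conditions)

# np.all along the same axis gives the same mask.
via_all = np.all(conditions, axis=0)

index_pos = np.where(via_ufunc)
print(index_pos)
```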