3

It seems I still struggle with the "in" operator in numpy. Here's the situation:

>>> a = np.random.randint(1, 10, (2, 2, 3))
>>> a
array([[[9, 8, 8],
        [4, 9, 1]],

       [[6, 6, 3],
        [9, 3, 5]]])

I would like to get the indexes of those triplets whose second element is in (6, 8). The way I intuitively tried is:

>>> a[:, :, 1] in (6, 8)
ValueError: The truth value of an array with more than one element...

My ultimate goal is to replace the value at those positions with the preceding element multiplied by two. Using the example above, a should become:

array([[[9, 18, 8],   # 8 at position 2 --> replaced by 2 * 9 (the value at position 1)
        [4, 9, 1]],

       [[6, 12, 3],   # 6 at position 2 --> replaced by 2 * 6 (the value at position 1)
        [9, 3, 5]]])

Thank you in advance for your advice and time!

mac

4 Answers

2

Here's a method that will work for an arbitrary length tuple. It uses the numpy.in1d function.

import numpy as np
np.random.seed(1)

a = np.random.randint(1, 10, (2, 2, 3))
print(a)

check_tuple = (6, 9, 1)

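# in1d flattens its input, so bool_array is 1-D with one entry per triplet
bool_array = np.in1d(a[:,:,1], check_tuple)
ind = np.where(bool_array)[0]
# flatten the first and second columns the same way; for this contiguous
# array the reshapes are views, so writing to a1 modifies a in place
a0 = a[:,:,0].reshape((len(bool_array), ))
a1 = a[:,:,1].reshape((len(bool_array), ))
a1[ind] = a0[ind] * 2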

print(a)

And the output:

[[[6 9 6]
  [1 1 2]]

 [[8 7 3]
  [5 6 3]]]

[[[ 6 12  6]
  [ 1  2  2]]

 [[ 8  7  3]
  [ 5 10  3]]]
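
The same idea can be written without the flatten-and-reshape step. The following is a minimal sketch, assuming NumPy 1.13 or later, where np.isin is available; unlike np.in1d it returns a mask with the same shape as its input. It uses the array from the question rather than the seeded random one, and the name check_values is only illustrative.

```python
import numpy as np

# Array from the question
a = np.array([[[9, 8, 8],
               [4, 9, 1]],
              [[6, 6, 3],
               [9, 3, 5]]])
check_values = (6, 8)  # illustrative name, any length works

# Boolean mask of shape (2, 2): True where the second element is in check_values
mask = np.isin(a[:, :, 1], check_values)

# Positions of the matching triplets as (i, j) pairs
positions = np.argwhere(mask)

# Replace the second element with twice the first, only where the mask is True
a[mask, 1] = a[mask, 0] * 2
print(positions)
print(a)
```

Because the mask keeps the 2-D shape, it can index the first two axes of a directly, so no reshaping is needed.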
joshayers
  • Interesting to discover the use of `in1d`. It's a bit verbose (in terms of the number of operations performed) but worth experimenting with! +1 – mac Nov 06 '11 at 01:02
1

There is another method, based on a lookup table, which I learned from one of the developers of Cellprofiler. First you create a lookup table (LUT) with one entry for every value from 0 up to the largest number in your array. For each possible array value, the LUT holds either True or False. Note that this only works for arrays of non-negative integers. Example:

import numpy as np

# create a large volume image with random numbers
a = np.random.randint(1, 1000, (50, 1000, 1000))
labels_to_find = np.unique(np.random.randint(1, 1000, 500))

# create filter mask LUT
def find_mask_LUT(inputarr, obs):
    # one boolean slot for every value from 0 to the array maximum
    keep = np.zeros(np.max(inputarr)+1, bool)
    # mark the wanted labels (each must be <= np.max(inputarr))
    keep[np.array(obs)] = True
    # index the LUT with the array itself to get the mask
    return keep[inputarr]

# This will return a mask that is the
# same shape as a, with True where a is one of the
# labels we look for, False otherwise
find_mask_LUT(a, labels_to_find)

This is really fast (much faster than np.in1d), and the speed does not depend on the number of objects.
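
Applied to the small array from the question, the lookup-table idea might look like the sketch below. The names wanted and size are illustrative, and sizing the table to cover the largest wanted value as well is an added precaution, not part of the function above.

```python
import numpy as np

# Array from the question
a = np.array([[[9, 8, 8],
               [4, 9, 1]],
              [[6, 6, 3],
               [9, 3, 5]]])
wanted = (6, 8)  # illustrative name

# Table must cover every value that is used as an index into it
size = max(a.max(), max(wanted)) + 1
keep = np.zeros(size, dtype=bool)
keep[list(wanted)] = True

# Look up the second element of every triplet; the mask has shape (2, 2)
mask = keep[a[:, :, 1]]

# Replace the second element with twice the first where the mask is True
a[mask, 1] = a[mask, 0] * 2
print(a)
```

The table has one entry per possible value, so memory grows with the largest value in the data rather than with the number of values searched for.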

VolkerH
1
import numpy as np
a = np.array([[[9, 8, 8],
               [4, 9, 1]],

              [[6, 6, 3],
               [9, 3, 5]]])

ind=(a[:,:,1]<=8) & (a[:,:,1]>=6)
a[ind,1]=a[ind,0]*2
print(a)

yields

[[[ 9 18  8]
  [ 4  9  1]]

 [[ 6 12  3]
  [ 9  3  5]]]
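
Note that the range test above would also match a 7 in the second position, since 7 lies between 6 and 8. If only the exact values 6 and 8 should match, the two comparisons can be spelled out, as in this sketch. The array b is a made-up variant of the question's array that contains a 7.

```python
import numpy as np

# Made-up variant of the question's array with a 7 in a second position
b = np.array([[[9, 8, 8],
               [4, 7, 1]],
              [[6, 6, 3],
               [9, 3, 5]]])
col = b[:, :, 1]

# Range test: also True for the 7
range_mask = (col >= 6) & (col <= 8)

# Exact test: True only for 6 and 8
exact_mask = (col == 6) | (col == 8)

b[exact_mask, 1] = b[exact_mask, 0] * 2
print(range_mask)
print(exact_mask)
print(b)
```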

If you wish to check for membership in a set which is not a simple range, then I like both mac's idea of using a Python loop and bellamyj's idea of using np.in1d. Which is faster depends on the size of check_tuple:

test.py:

import numpy as np
np.random.seed(1)

N = 10
a = np.random.randint(1, 1000, (2, 2, 3))
check_tuple = np.random.randint(1, 1000, N)

def using_in1d(a):
    idx = np.in1d(a[:,:,1], check_tuple)
    idx=idx.reshape(a[:,:,1].shape)
    a[idx,1] = a[idx,0] * 2
    return a

def using_in(a):
    idx = np.zeros(a[:,:,0].shape,dtype=bool)
    for n in check_tuple:
        idx |= a[:,:,1]==n
    a[idx,1] = a[idx,0]*2
    return a

# both functions modify their argument in place, so compare on separate copies
assert np.allclose(using_in1d(a.copy()), using_in(a.copy()))

When N = 10, using_in is slightly faster:

% python -m timeit -s'import test' 'test.using_in1d(test.a)'
10000 loops, best of 3: 156 usec per loop
% python -m timeit -s'import test' 'test.using_in(test.a)'
10000 loops, best of 3: 143 usec per loop

When N = 100, using_in1d is much faster:

% python -m timeit -s'import test' 'test.using_in1d(test.a)'
10000 loops, best of 3: 171 usec per loop
% python -m timeit -s'import test' 'test.using_in(test.a)'
1000 loops, best of 3: 1.15 msec per loop
unutbu
  • Is there a badge for Official numpy-tutoring? If there were, you would have got it. :) Do you have any pointers to some good study material on numpy? The official [tentative] tutorial and the reference material are structured in a way that makes it difficult for me to find the info I need... :-/ – mac Nov 06 '11 at 00:45
  • BTW: this is a neat workaround, but if it were possible to use the `in` operator I would have preferred that, as in my "real case" I have a pool of roughly 10 values, not only `(6, 8)`. – mac Nov 06 '11 at 00:48
  • The trick is not in what you read, but how you read it. For each function, spend a few minutes asking, what is a simple example that demonstrates how this function behaves? When might I use this? Start building an examples file that exercises/demonstrates each function/concept. Learning comes as a side-effect of building the examples file. The tentative tutorial and official docs are great resources. You might also try holing up some weekend with the [numpy book](http://www.tramy.us/) and [user guide](http://docs.scipy.org/doc/numpy/numpy-user.pdf). Good luck! – unutbu Nov 06 '11 at 07:11
0

Inspired by unutbu's answer, I came up with this possible solution:

>>> l = (8, 6)
>>> idx = np.zeros((2, 2), dtype=bool)
>>> for n in l:
...     idx |= a[:,:,1] == n
>>> idx
array([[ True, False],
       [ True, False]], dtype=bool)
>>> a[idx]
array([[9, 8, 8],
       [6, 6, 3]])

It requires knowing the dimensions of the array beforehand, though.
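
The shape does not have to be hard-coded: it can be taken from the slice itself, much as unutbu's test script does with a[:,:,0].shape. A minimal sketch, where the use of np.zeros_like and the names values and col are illustrative choices:

```python
import numpy as np

# Array from the question
a = np.array([[[9, 8, 8],
               [4, 9, 1]],
              [[6, 6, 3],
               [9, 3, 5]]])
values = (8, 6)  # illustrative name

# Second element of every triplet, shape (2, 2)
col = a[:, :, 1]

# All-False mask with the same shape as col, whatever that shape is
idx = np.zeros_like(col, dtype=bool)
for n in values:
    idx |= col == n

print(idx)
print(a[idx])
```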

mac