I am new to vectorization and generators. So far I have created the following function:

import numpy as np

def ismember(a,b):
    for i in a:
        if len(np.where(b==i)[0]) == 0:
            lv_var = 0
        else:
            lv_var = np.int(np.where(b==i)[0])
        yield lv_var

vect = np.vectorize(ismember)

A = np.array(xrange(700000))
B = np.array(xrange(700000))

lv_result = vect(A,B)

When I try to cast lv_result to a list or loop through the resulting numpy array, I get a list of generator objects. I need to somehow get the actual result. How do I print the actual results from this function? Calling .next() on the generator doesn't seem to do the job.

Could someone tell me what I am doing wrong, or how I could reconfigure the code to achieve the end goal?

---------------------------------------------------

OK so I understand the vectorize part now (thanks Viet Nguyen for the example).
I was also able to print the generator object results. The code has been modified. Please see below.

For the generator part:

What I am trying to do is mimic a MATLAB function called ismember (the one that is formatted as [Lia,Locb] = ismember(A,B)). I am only trying to get the Locb part.

From MATLAB: Locb contains the lowest index in B for each value in A that is a member of B. The output array, Locb, contains 0 wherever A is not a member of B.

One of the main problems is that I need to perform this operation as efficiently as possible. For testing I have two arrays of 700k elements. Creating a generator and iterating through its values doesn't seem to improve performance.

To print the generator's values, I have created the function f() below.

import numpy as np

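def ismember(a,b):
    # For each value in a, yield the lowest index of that value in b (0 if absent).
    for i in a:
        index = np.where(b==i)[0]
        if len(index) == 0:
            yield 0
        else:
            # index holds every match; keep only the first (lowest) one
            yield index[0]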


def f(A, gen_obj):
    my_array = np.arange(len(A))
    for i in my_array:
        my_array[i] = gen_obj.next()
    return my_array


A = np.arange(700000)
B = np.arange(700000)

gen_obj = ismember(A,B)

f(A, gen_obj)

print 'done'

Note: if we were to try the above code with smaller arrays, let's say:

A = np.array([3,4,4,3,6])

B = np.array([2,5,2,6,3])

The result will be the array [4 0 0 4 3].

Just like MATLAB's function, the goal is to get the lowest index in B for each value in A that is a member of B. The output array, Locb, contains 0 wherever A is not a member of B.

Numpy's intersection function doesn't help me achieve the goal. Also, the returned array needs to be the same size as array A.

So far this process is taking forever (for arrays of 700k elements). Unfortunately I haven't been able to find the best solution yet. Any input on how I could reconfigure the code to achieve the end goal, with the best performance, would be much appreciated.


Optimization problem solved in: http://stackoverflow.com/questions/15864082/python-run-generator-using-multiple-cores-for-optimization

zd5151
  • Can you explain what exactly you're trying to do? For example, if A = [0, 1, 2] and B = [0, 1, 2], then what do you expect lv_result to be? – user650654 Apr 07 '13 at 03:28
  • Why are you making your function a generator at all? The way `vectorize` works is you write a function that accepts a single value from the input vector (or several single values, one from each input vector) and then `vectorize` applies it elementwise to the argument vectors. You seem to be trying to write a function that makes use of both input vectors in their entirety, which means you can't vectorize it. – BrenBarn Apr 07 '13 at 03:32
  • What is the purpose of `np.int(np.where(b==i)[0])`? It should raise an exception since `int()` argument cannot be a tuple. – A. Rodas Apr 07 '13 at 03:42
  • More information has been added. – zd5151 Apr 07 '13 at 13:26
  • Optimization problem solved in: http://stackoverflow.com/questions/15864082/python-run-generator-using-multiple-cores-for-optimization – zd5151 Apr 07 '13 at 16:23

1 Answer

I believe you've misunderstood the inputs to a numpy.vectorize function. The "vectorized" function operates on the arrays on a per-element basis (see numpy.vectorize reference). Your function ismember seems to presume that the inputs a and b are arrays. Instead, think of the function as something you would use with built-in map().

> import numpy as np
> def mask(a, b):
>   return 1 if a == b else 0
> a = np.array([1, 2, 3, 4])
> b = np.array([1, 3, 4, 5])
> maskv = np.vectorize(mask)
> maskv(a, b)
  array([1, 0, 0, 0])
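
To make the map() comparison concrete, here is a small sketch that reuses the mask function from above. The variable names via_vectorize, via_map and via_comparison are only for illustration. It shows that the vectorized call produces the same values as mapping the scalar function over both arrays, and that for this particular function a plain elementwise comparison gives the same result as well.

```python
import numpy as np

def mask(a, b):
    # Scalar function: receives ONE element from each input at a time
    return 1 if a == b else 0

a = np.array([1, 2, 3, 4])
b = np.array([1, 3, 4, 5])

# np.vectorize calls mask once per pair of elements
via_vectorize = np.vectorize(mask)(a, b)

# Roughly the same thing spelled out with the built-in map()
via_map = np.array(list(map(mask, a, b)))

# For this simple case NumPy's own elementwise comparison is enough
via_comparison = (a == b).astype(int)

print(via_vectorize)   # [1 0 0 0]
print(via_map)         # [1 0 0 0]
print(via_comparison)  # [1 0 0 0]
```

Note that np.vectorize is mainly a convenience: it still loops over the elements in Python, so it should not be expected to speed up the 700k-element case by itself.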

Also, if I'm understanding your intention correctly, NumPy comes with an intersection function, numpy.intersect1d.
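
If the intersection is not enough and you need the Locb output described in the question, one possible approach is to combine np.unique (with return_index=True) and np.searchsorted so that no Python-level loop is needed. This is only a sketch under assumptions, not the method from the question or from the linked solution: the function name ismember_locb and the missing parameter are made up for illustration, and indices are 0-based as in the question's example.

```python
import numpy as np

def ismember_locb(a, b, missing=0):
    """For each value in a, return the lowest 0-based index of that value
    in b, or `missing` where the value does not occur in b."""
    a = np.asarray(a)
    b = np.asarray(b)
    if b.size == 0:
        # Nothing can be a member of an empty array
        return np.full(a.shape, missing, dtype=np.intp)
    # Sorted unique values of b, plus the index of each value's first occurrence
    uniq, first_idx = np.unique(b, return_index=True)
    # Where each element of a would be inserted into the sorted unique values
    pos = np.searchsorted(uniq, a)
    # Values larger than uniq.max() would point one past the end, so clip
    pos = np.clip(pos, 0, len(uniq) - 1)
    # A value is a member only if the unique value at that position equals it
    found = uniq[pos] == a
    return np.where(found, first_idx[pos], missing)

A = np.array([3, 4, 4, 3, 6])
B = np.array([2, 5, 2, 6, 3])
print(ismember_locb(A, B))  # [4 0 0 4 3]
```

With 0-based indices, a result of 0 is ambiguous: it can mean "found at position 0" or "not found". Passing missing=-1 (or adding 1 to the found indices, as MATLAB does) avoids that. The work here is dominated by one sort of b and one binary search per element of a, instead of a full scan of b for every element of a.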

  • And it also comes with `np.arange` (instead of `np.array(xrange(...))`). – jorgeca Apr 07 '13 at 09:27
  • The `mask()` function is not such a useful example: `a == b` (or `(a==b).astype(int)`) directly gives the same result as the more complicated function + vectorize form. – Eric O. Lebigot Apr 07 '13 at 14:25
  • @z5151: The code is in my previous comment: it works if `a` and `b` are *arrays*—like in your main code. – Eric O. Lebigot Apr 08 '13 at 01:28
  • @EOL Problem solved. See: http://stackoverflow.com/questions/15864082/python-run-generator-using-multiple-cores-for-optimization – zd5151 Apr 08 '13 at 02:32