I am trying to figure out the best way to parallelize the execution of a single operation on each cell of a 2D numpy array.
In particular, I need to do a bitwise operation for each cell in the array.
This is what I currently do, using nested for loops:
    for x in range(M):
        for y in range(N):
            v[x][y] = (v[x][y] >> 7) & 255
I found a way to do the same thing using numpy.vectorize:
    def f(x):
        return (x >> 7) & 255

    f = numpy.vectorize(f)
    v = f(v)
However, using vectorize doesn't seem to improve performance.
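That is probably expected: the NumPy documentation describes vectorize as a convenience whose implementation is essentially a for loop. For comparison, here is a minimal sketch of the two versions above next to a version that applies the shift and mask to the whole array at once. The sizes, the random data, and the int64 dtype are made up for illustration; my real array may differ.

```python
import numpy as np

# Hypothetical sizes and data, chosen only for illustration.
M, N = 4, 5
rng = np.random.default_rng(0)
v = rng.integers(0, 2**16, size=(M, N), dtype=np.int64)

# Nested-loop version, writing into a copy so v stays unchanged.
looped = v.copy()
for x in range(M):
    for y in range(N):
        looped[x, y] = (looped[x, y] >> 7) & 255

# numpy.vectorize version: still calls the Python function once per element.
vectorized = np.vectorize(lambda a: (a >> 7) & 255)(v)

# Whole-array version: NumPy applies the operators elementwise in compiled code.
direct = (v >> 7) & 255

# Check that all three approaches agree.
print(np.array_equal(looped, direct), np.array_equal(vectorized, direct))
```

All three produce the same values on this small example, so the whole-array expression would be my single-core baseline. It is not parallel or GPU-based, which is what I am really after.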
I read about numexpr in this answer on StackOverflow, where Theano and Cython are also mentioned. Theano in particular seems like a good solution, but I cannot find examples that fit my case.
So my question is: what is the best way to improve the above code, using parallelization and possibly GPU computation? Could someone post some sample code to do this?