EDIT I have kept the more complicated problem I am facing below, but my problems with np.take
can be summarized better as follows. Say you have an array img
of shape (planes, rows)
, and another array lut
of shape (planes, 256)
, and you want to use them to create a new array out
of shape (planes, rows)
, where out[p,j] = lut[p, img[p, j]]
. This can be achieved with fancy indexing as follows:
In [4]: %timeit lut[np.arange(planes).reshape(-1, 1), img]
1000 loops, best of 3: 471 us per loop
But if, instead of fancy indexing, you use take and a python loop over the planes
things can be sped up tremendously:
In [6]: %timeit for _ in (lut[j].take(img[j]) for j in xrange(planes)) : pass
10000 loops, best of 3: 59 us per loop
Can lut
and img
be in someway rearranged, so as to have the whole operation happen without python loops, but using numpy.take
(or an alternative method) instead of conventional fancy indexing to keep the speed advantage?
ORIGINAL QUESTION
I have a set of look-up tables (LUTs) that I want to use on an image. The array holding the LUTs is of shape (planes, 256, n)
, and the image has shape (planes, rows, cols)
. Both are of dtype = 'uint8'
, matching the 256
axis of the LUT. The idea is to run the p
-th plane of the image through each of the n
LUTs from the p
-th plane of the LUT.
If my lut
and img
are the following:
planes, rows, cols, n = 3, 4000, 4000, 4
lut = np.random.randint(-2**31, 2**31 - 1,
size=(planes * 256 * n // 4,)).view('uint8')
lut = lut.reshape(planes, 256, n)
img = np.random.randint(-2**31, 2**31 - 1,
size=(planes * rows * cols // 4,)).view('uint8')
img = img.reshape(planes, rows, cols)
I can achieve what I am after using fancy indexing like this
out = lut[np.arange(planes).reshape(-1, 1, 1), img]
which gives me an array of shape (planes, rows, cols, n)
, where out[i, :, :, j]
holds the i
-th plane of img
run through the j
-th LUT of the i
-th plane of the LUT...
All is good, except for this:
In [2]: %timeit lut[np.arange(planes).reshape(-1, 1, 1), img]
1 loops, best of 3: 5.65 s per loop
which is completely unacceptable, especially since I have all of the following not so nice looking alternatives using np.take
than run much faster:
A single LUT on a single plane runs about x70 faster:
In [2]: %timeit np.take(lut[0, :, 0], img[0]) 10 loops, best of 3: 78.5 ms per loop
A python loop running through all the desired combinations finishes almost x6 faster:
In [2]: %timeit for _ in (np.take(lut[j, :, k], img[j]) for j in xrange(planes) for k in xrange(n)) : pass 1 loops, best of 3: 947 ms per loop
Even running all combinations of planes in the LUT and image and then discarding the
planes**2 - planes
unwanted ones is faster than fancy indexing:In [2]: %timeit np.take(lut, img, axis=1)[np.arange(planes), np.arange(planes)] 1 loops, best of 3: 3.79 s per loop
And the fastest combination I have been able to come up with has a python loop iterating over the planes and finishes x13 faster:
In [2]: %timeit for _ in (np.take(lut[j], img[j], axis=0) for j in xrange(planes)) : pass 1 loops, best of 3: 434 ms per loop
The question of course is if there is no way of doing this with np.take
without any python loop? Ideally whatever reshaping or resizing is needed should happen on the LUT, not the image, but I am open to whatever you people can come up with...