Is 2-dimensional numpy.take fast?

Question

numpy.take can be applied in 2 dimensions with

np.take(np.take(T,ix,axis=0), iy,axis=1 )

I tested the stencil of the discret 2-dimensional Laplacian

ΔT = T[ix-1,iy] + T[ix+1, iy] + T[ix,iy-1] + T[ix,iy+1] - 4 * T[ix,iy]

with 2 take-schemes and the usual numpy.array scheme. The functions p and q are introduced for a leaner code writing and adress the axis 0 and 1 in different order. This is the code:

nx = 300; ny= 300
T  = np.arange(nx*ny).reshape(nx, ny)
ix = np.linspace(1,nx-2,nx-2,dtype=int) 
iy = np.linspace(1,ny-2,ny-2,dtype=int)
#------------------------------------------------------------
def p(Φ,kx,ky):
    return np.take(np.take(Φ,ky,axis=1), kx,axis=0 )
#------------------------------------------------------------
def q(Φ,kx,ky):
    return np.take(np.take(Φ,kx,axis=0), ky,axis=1 )
#------------------------------------------------------------
%timeit ΔT_n = T[0:nx-2,1:ny-1] + T[2:nx,1:ny-1] + T[1:nx-1,0:ny-2]  + T[1:nx-1,2:ny] - 4.0 * T[1:nx-1,1:ny-1] 
%timeit ΔT_t = p(T,ix-1,iy)  + p(T,ix+1,iy)  + p(T,ix,iy-1)  + p(T,ix,iy+1)  - 4.0 * p(T,ix,iy)
%timeit ΔT_t = q(T,ix-1,iy)  + q(T,ix+1,iy)  + q(T,ix,iy-1)  + q(T,ix,iy+1)  - 4.0 * q(T,ix,iy)
.
1000 loops, best of 3: 944 µs per loop
100 loops, best of 3: 3.11 ms per loop
100 loops, best of 3: 2.02 ms per loop

The results seem to be obvious:

usual numpy index arithmeitk is fastest
take-scheme q takes 100% longer (= C-ordering ?)
take-scheme p takes 200% longer (= Fortran-ordering ?)

Not even the 1-dimensional example of the scipy manual indicates that numpy.take is fast:

a = np.array([4, 3, 5, 7, 6, 8])
indices = [0, 1, 4]
%timeit np.take(a, indices)
%timeit a[indices]
.
The slowest run took 6.58 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.32 µs per loop
The slowest run took 7.34 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.87 µs per loop

Does anybody has experiences how to make numpy.take fast ? It would be an flexible and attractive way for lean code writing that is fast in coding and
is told to be fast in execution as well. Thank your for some hints to improve my approach !

My memory from past tests is the `np.take` is a little faster than the indexing notation. But the advantage is small enough that wrapping it in a function call as you do could destroy it. https://stackoverflow.com/questions/44487889/why-is-np-compress-faster-than-boolean-indexing — hpaulj, Jul 24 '17 at 21:15
@Divakar: yes, I tried `np.ix_` too (but omited it for shortness in my question): In my tests `np.ix_` was slower than the better np.take — pyano, Jul 24 '17 at 21:19
Would `ix` and `iy` always follow such a pattern of constant stepsize within their indices? — Divakar, Jul 24 '17 at 21:22
@hpailj: you are right: I should try without the function-wrapping too. But I would like to write a rather complex CFD (computation fluid dynamics) code. So lean writing is essential, resp. un-lean code is very error prone. — pyano, Jul 24 '17 at 21:26
`T[0:nx-2,1:ny-1]` is a view (basic indexing with 2 slices). `p(T,ix-1,iy)` is a copy, indexing with 2 arrays. The sums will, of course, be a copy. `take` could be faster than direct 'fancy' indexing, `T[ix-1,:][:,iy]` — hpaulj, Jul 24 '17 at 21:27
@Divakar: I you might try `np.ix_` by introducing the following in my code above: ```from numpy import ix_ as q; ΔT_ix_ = T[q(ix-1,iy)] + T[q(ix+1,iy)] + T[q(ix,iy-1)] + T[q(ix,iy+1)] - 4.0 * T[q(ix,iy)]``` — pyano, Jul 24 '17 at 21:33
@hpaulj: direct fancy indexing does not work in 2 dimensions - I tryed. So `T[ix-1,iy]` does not give the desired result. — pyano, Jul 24 '17 at 21:39
@Divakar: sorry, I misinterpreted your question: yes, in CFD (computational fluid dynamics) with structured grids the patterns of the indices `ix` and `iy` are usually not too wild. But there are large arrays. — pyano, Jul 24 '17 at 21:44
Notice I used double indexing. Alternatively `T[(ix-1)[:,None], iy]` or let `ix_` do that for you. — hpaulj, Jul 24 '17 at 22:11
@pyano I guess in that case, you can use `slicing` : T[ix_start:ix_end:ix_stepsize,..]`. — Divakar, Jul 24 '17 at 22:19

score 2 · Accepted Answer · answered Jul 24 '17 at 23:32

The indexed version might be cleaned up with slice objects like this:

T[0:nx-2,1:ny-1] + T[2:nx,1:ny-1] + T[1:nx-1,0:ny-2]  + T[1:nx-1,2:ny] - 4.0 * T[1:nx-1,1:ny-1]

sy1 = slice(1,ny-1)
sx1 = slice(1,nx-1)
sy2 = slice(2,ny)
sy_2 = slice(0,ny-2)
T[0:nx-2,sy1] + T[2:nx,sy1] + T[sx1,xy_2]  + T[sx1,sy2] - 4.0 * T[sx1,sy1]

pyano · Answer 2 · 2017-07-25T07:41:14.220

Thanks @Divakar and @hpaulj ! Yes, working with slice is viable too. Comparing all 4 approaches gives:

fastest ex aequo: t(usual np) and t(slice)
t(take) = 2 * t(slice)
t(ix_) = 3 * t(slice)

Here the code and the results:

import numpy as np
from numpy import ix_ as r
nx = 500;    ny = 500
T = np.arange(nx*ny).reshape(nx, ny)

ix = np.arange(1,nx-1); 
iy = np.arange(1,ny-1);

jx = slice(1,nx-1); jxm = slice(0,nx-2); jxp = slice(2,nx)
jy = slice(1,ny-1); jym = slice(0,ny-2); jyp = slice(2,ny)

#------------------------------------------------------------
def p(U,kx,ky):
    return np.take(np.take(U,kx, axis=0), ky,axis=1)
#------------------------------------------------------------

%timeit ΔT_slice= -T[jxm,jy]     + T[jxp,jy]     - T[jx,jym]     + T[jx,jyp]     - 0.0 * T[jx,jy]
%timeit ΔT_npy  = -T[0:nx-2,1:ny-1] + T[2:nx,1:ny-1] - T[1:nx-1,0:ny-2]  + T[1:nx-1,2:ny] - 0.0 * T[1:nx-1,1:ny-1]
%timeit ΔT_take = -p(T,ix-1,iy)  + p(T,ix+1,iy)  - p(T,ix,iy-1)  + p(T,ix,iy+1)  - 0.0 * p(T,ix,iy)
%timeit ΔT_ix_  = -T[r(ix-1,iy)] + T[r(ix+1,iy)] - T[r(ix,iy-1)] + T[r(ix,iy+1)] - 0.0 * T[r(ix,iy)]
.
100 loops, best of 3: 3.14 ms per loop
100 loops, best of 3: 3.13 ms per loop
100 loops, best of 3: 7.03 ms per loop
100 loops, best of 3: 9.58 ms per loop

Concerning the discussion about view and copy the following might be instructive:

print("if False --> a view ;   if True --> a copy"  )
print("_slice_ :", T[jx,jy].base is None)
print("_npy_   :", T[1:nx-1,1:ny-1].base is None)
print("_take_  :", p(T,ix,iy).base is None)
print("_ix_    :", T[r(ix,iy)].base is None)
.
if False --> a view ;   if True --> a copy
_slice_ : False
_npy_   : False
_take_  : True
_ix_    : True

Is 2-dimensional numpy.take fast?

2 Answers2

Linked