Here are various ways of doing this, with accompanying benchmarks:
import numpy as np

a = np.zeros([100,200,300,5])
b = np.zeros([100,200,300,1])
%timeit c=np.concatenate([a,b],-1)
#1 loops, best of 3: 241 ms per loop
%timeit c=np.vstack([a.T,b.T]).T
#1 loops, best of 3: 309 ms per loop
%timeit c=np.empty([100,200,300,6]); c[...,:5]=a; c[...,5:]=b
#1 loops, best of 3: 311 ms per loop
# Assuming c was already allocated:
%timeit c[...,:5]=a; c[...,5:]=b
#10 loops, best of 3: 113 ms per loop
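As a quick sanity check (not part of the timings above, and reusing a and b from the snippet), all of these approaches should produce the same (100, 200, 300, 6) result:
c1 = np.concatenate([a, b], -1)
c2 = np.vstack([a.T, b.T]).T
c3 = np.empty([100, 200, 300, 6]); c3[..., :5] = a; c3[..., 5:] = b
assert c1.shape == (100, 200, 300, 6)
assert np.array_equal(c1, c2) and np.array_equal(c1, c3)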
These times are all quite comparable, and all quite slow. If all the arrays were instead laid out in the transposed order, so that the concatenation happens along the first (slowest-varying) axis and each input can be copied as one contiguous block, we could do a bit better:
va = np.zeros([5,300,200,100])
vb = np.zeros([1,300,200,100])
%timeit vc=np.concatenate([va,vb],0)
#1 loops, best of 3: 191 ms per loop
%timeit vc=np.vstack([va,vb])
#1 loops, best of 3: 284 ms per loop
%timeit vc=np.empty([6,300,200,100]); vc[:5]=va; vc[5:]=vb
#1 loops, best of 3: 281 ms per loop
# Assuming vc is already allocated. This case is somehow
# much faster than the others!
%timeit vc[:5]=va; vc[5:]=vb
#10 loops, best of 3: 26.4 ms per loop
# Somehow the time for allocating vc and the time for copying the
# values do not add up to the combined timing above. I guess this has
# to do with caching working better when the same buffer is reused
%timeit vc=np.empty([6,300,200,100])
#100000 loops, best of 3: 7.73 µs per loop
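That fast preallocated case is the one to aim for if you build such arrays repeatedly. As a rough sketch (the loop and the range are just for illustration, not from the benchmark), you would allocate the output once and refill it on every pass:
vc = np.empty([6, 300, 200, 100])   # allocate once, outside the loop
for _ in range(10):                 # e.g. one pass per batch/frame
    vc[:5] = va                     # copy the 5-channel block
    vc[5:] = vb                     # copy the extra channel
    # ... consume vc here before it is overwritten in the next pass ...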
Implementing the same operation in Fortran and calling it via f2py produced times of about 55 ms just for the assignment in the untransposed case, so none of these options is horribly inefficient. I would recommend np.concatenate: it is general, and for some reason slightly faster than the equivalent *stack call. That is, unless you can preallocate and reuse the output array, in which case filling it with slice assignment is faster by at least a factor of 2.
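As a side note, newer NumPy versions (1.14+, if I remember correctly) also let np.concatenate write directly into a preallocated array via its out= argument, which combines the convenience of concatenate with the reuse of a buffer; a minimal sketch:
c = np.empty([100, 200, 300, 6])          # preallocated output buffer
np.concatenate([a, b], axis=-1, out=c)    # fills c in place, no new allocation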