
I am reading a vendor-provided large binary file into a 2D numpy array tempfid of shape (M, N):

# load data
data = numpy.fromfile(file=dirname+'/fid', dtype=numpy.dtype('i4'))

# convert interleaved (real, imag) pairs to complex data
fid = data[::2] + 1j*data[1::2]

# M = I*J*K rows of N samples each
tempfid = fid.reshape(I*J*K, N)

and then I need to reshape it into a 4D array newfid(N, I, J, K) using non-trivial mappings for the indices. I do this with a for loop along the following lines:

for idx in range(M):
    i=f1(idx) # f1, f2, and f3 are functions involving / and % as well as some lookups
    j=f2(idx)
    k=f3(idx)
    newfid[:,i,j,k] = tempfid[idx,:] #SLOW! CAN WE IMPROVE THIS?

Converting to complex takes 33% of the time, while copying these M slices takes the remaining 66%. Calculating the indices is fast, irrespective of whether I do it one by one in a loop as shown or by numpy.vectorize-ing the operation and applying it to an arange(M).

Is there a way to speed this up? Any help on more efficient slicing, copying (or not) etc appreciated.

EDIT: As learned in the answer to the question "What's the fastest way to convert an interleaved NumPy integer array to complex64?", the conversion to complex can be sped up by a factor of 6 if a view is used instead:

fid = data.astype(numpy.float32).view(numpy.complex64)
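A quick sanity check on made-up interleaved data (the values here are hypothetical) confirms the two conversions agree:

```python
import numpy as np

# hypothetical interleaved int32 (real, imag) pairs, as fromfile would return
data = np.array([1, 2, 3, 4, 5, 6], dtype=np.int32)

# original approach: strided slicing plus complex arithmetic
fid_slow = data[::2] + 1j*data[1::2]

# view-based approach: cast to float32, then reinterpret each
# adjacent (real, imag) pair of floats as one complex64 number
fid_fast = data.astype(np.float32).view(np.complex64)

assert np.array_equal(fid_slow.astype(np.complex64), fid_fast)  # [1+2j, 3+4j, 5+6j]
```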
DrSAR
  • have you tried vectorizing the calculation of i,j,k and then using the resulting array to make the copy in a single line? – Winston Ewert Mar 24 '11 at 16:26
  • @Winston Ewert: This is where I might have failed. I was able to vectorize the calculation of i,j,k and create vec_f1=numpy.vectorize(lambda x: f1(x)) and get i_idx=vec_f1(idx) etc. However, I could not come up with a one-line operation for the array: vec_assign=vectorize(lambda idx:newfid[ *** ]=tempfid[***]) gives an error since 'lambda cannot contain assignment' – DrSAR Mar 24 '11 at 16:36
  • also if you are using Python 2.x, and M is large, you should consider using `xrange` instead of `range` if you are going to be looping, just as a general rule. – JoshAdel Mar 24 '11 at 16:59

2 Answers


How about this: set up your indices using vectorized versions of f1, f2, f3 (not necessarily using np.vectorize, but perhaps just writing a function that takes an array and returns an array), then use np.ix_:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.ix_.html

to get the index arrays. Then reshape tempfid to the same shape as newfid and use the results of np.ix_ to set the values. For example:

tempfid = np.arange(10)
i = np.array([4, 3, 2, 1, 0])  # i = f1(idx)
j = np.array([1, 0])           # j = f2(idx)
ii = np.ix_(i, j)
newfid = tempfid.reshape((5, 2))[ii]

This maps the elements of tempfid onto a new shape with a different ordering.
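To make the sketch concrete, here is a runnable version with hypothetical index arrays filled in, showing the open mesh that np.ix_ builds:

```python
import numpy as np

tempfid = np.arange(10)
i = np.array([4, 3, 2, 1, 0])  # hypothetical result of a vectorized f1
j = np.array([1, 0])           # hypothetical result of a vectorized f2

# np.ix_ returns an open mesh: a (5, 1) column and a (1, 2) row that
# broadcast together to select the full 5 x 2 cross product of (i, j)
ii = np.ix_(i, j)
newfid = tempfid.reshape((5, 2))[ii]
# rows reversed per i, columns swapped per j:
# [[9, 8], [7, 6], [5, 4], [3, 2], [1, 0]]
```

Note that np.ix_ selects the full cross product of the index arrays, so it only applies when the target indices form a complete grid, not when each row carries its own (i, j, k) triple.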

JoshAdel
  • @JoshAdel: this looks promising, however I get a 'ValueError: broadcast dimensions too large.' Is this an indication of me screwing up the np.ix_ business or are there limits? I am dealing with a 128 x 64 x 1 x 1200 array of complex numbers – DrSAR Mar 24 '11 at 17:40
  • @DrSAR: I can pretty easily create that sized empty complex array on my machine. And then if I do `h = np.empty((128,64,1,1200),dtype=complex); a = np.arange(h.size); a = a+1j*a; ii = np.ix_(range(128),range(64),range(1),range(1200)); h = a.reshape(h.shape)[ii]` everything works (sorry for jamming this in a continuous line). You may be making a mistake with `np.ix_` – JoshAdel Mar 24 '11 at 17:53
  • @JoshAdel: I probably am. Your version in the comment works for me. However, when I let my implementation of ix_ loose on a case of 128 x 64 x 1 x 1 it works without ValueError but it is surprisingly slow compared to the straightforward loop. In fact, about a factor of 3000. I also notice your operation takes about 1.2s on my machine (no ValueError, I agree) with most of it spent in the array assignment using the index. Is ix_ supposed to be convenient or fast? Or maybe both? – DrSAR Mar 24 '11 at 18:05
  • @DrSAR: Python for loops are generally slow (in reference to your comment on Winston's post), and sometimes indexing tricks can be much faster. But obviously in this case there is a large overhead incurred by a reshape or something else going on behind the scenes. If these indexing tricks aren't giving you better performance than a for loop, you might consider using Cython to do the conversion. – JoshAdel Mar 24 '11 at 18:48
  • @JoshAdel: Thanks very much for the continued help - I have the feeling I have now tried & exhausted most options in plain python/numpy and am at the performance limit that I can get without resorting to externalizing this critical bit. And in the process I've learned a lot; Thanks. – DrSAR Mar 24 '11 at 18:57
  • @DrSAR: no problem and sorry I couldn't be of more help. I'm a little embarrassed that my method was 3000x slower; usually I offer up 2-5x slower solutions :) – JoshAdel Mar 24 '11 at 19:23
idx = numpy.arange(M)
i = numpy.vectorize(f1)(idx)
j = numpy.vectorize(f2)(idx)
k = numpy.vectorize(f3)(idx)

# you can index arrays with other arrays
# that lets you specify this operation in one line.    
newfid[:, i,j,k] = tempfid.T

I've never used numpy's vectorize. Vectorize just means that numpy will call your Python function multiple times. In order to get speed, you need to use array operations like the one I showed here and the one you used to get the complex numbers.

EDIT

The problem is that the dimension of size 128 was first in newfid, but last in tempfid. This is easily fixed by using .T, which takes the transpose.
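A minimal runnable sketch of this approach, with made-up shapes and stand-in index functions (the real f1, f2, f3 involve vendor-specific lookups):

```python
import numpy as np

I, J, K, N = 4, 3, 2, 5
M = I * J * K

# hypothetical index functions standing in for the vendor-specific ones;
# plain array arithmetic like this needs no np.vectorize at all
def f1(idx): return idx // (J * K)
def f2(idx): return (idx // K) % J
def f3(idx): return idx % K

tempfid = np.arange(M * N, dtype=np.complex64).reshape(M, N)

idx = np.arange(M)
i, j, k = f1(idx), f2(idx), f3(idx)

newfid = np.empty((N, I, J, K), dtype=tempfid.dtype)
# fancy indexing assigns all M slices at once; the selection
# newfid[:, i, j, k] has shape (N, M), hence the transpose of tempfid
newfid[:, i, j, k] = tempfid.T

# spot check against the original element-by-element loop
check = np.empty_like(newfid)
for m in range(M):
    check[:, f1(m), f2(m), f3(m)] = tempfid[m, :]
assert np.array_equal(newfid, check)
```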

Winston Ewert
  • I don't think that last line will work. For example `i = [1,0]; j = [0,1]; b = np.zeros((2,2)); a = np.arange(4); b[i,j] = a[a]` gives you a broadcasting error. – JoshAdel Mar 24 '11 at 17:10
  • @Winston Ewert: Using this I get a 'ValueError: array is not broadcastable to correct shape'. Note that idx,i,j,k all have the same 1D length and (to my eyes) properly address their respective dimensions. – DrSAR Mar 24 '11 at 17:12
  • @JoshAdel, that fails because len(i) != len(a) – Winston Ewert Mar 24 '11 at 17:22
  • @DrSAR, print array.shape for all of the arrays in question and tell me what they are – Winston Ewert Mar 24 '11 at 17:23
  • @Winston Ewert: The shapes of the four index arrays are 1D and all have the same length: i=(76800,), j=(76800,), k=(76800,), idx=(76800,). The max in each dimension doesn't exceed what it shouldn't: MAX i=63, j=0, k=1199, idx=76799. The shapes of the arrays we are working on are newfid=(128, 64, 1, 1200) and tempfid=(76800, 128) – DrSAR Mar 24 '11 at 17:32
  • @Winston Ewert: Thanks - it now works. Disappointingly it is just as fast/slow as the manual for loop. Could it be that Python is just quite efficient about for loops and it's hard to improve on it? – DrSAR Mar 24 '11 at 18:13
  • I show a slight improvement using this over the manual loop. Since each entry of your original loop already copies 128 elements the overhead in python isn't actually that big. Unless there is some cleverness that can be done by looking at your f1, f2, f3 functions, you probably can't improve the speed much. – Winston Ewert Mar 24 '11 at 18:24
  • @Winston Ewert: Thanks for your help. I can now sleep better knowing that I am not making severe speed blunders. Since f1, f2, f3 are not predictable (they are partly looked up from vendor header files), I doubt I can come up with some cleverness. What I have now is probably as good as it gets. In summary, numpy.ix_ (a la Josh Adel in the other answer) is neat and makes for compact looking code. array operations using arrays as indexes (a la Winston Ewert) also give you intuitive looking code. Neither, however, in my typical use case beat the for loop but are very close. Thanks for your help. – DrSAR Mar 24 '11 at 19:04
  • @DrSar, to be clear, if you use a for loop on individual elements, then using the array operations will be much faster. – Winston Ewert Mar 24 '11 at 19:05
  • @Winston Ewert My apologies: Earlier when I tried the method of vectorizing and using array-based indices to access arrays I must have made a timing error. I now get a 33% overall improvement over the for loop. Thanks - I will accept as correct. – DrSAR Mar 30 '11 at 23:35