
I have a large vector field, where the grid is large (e.g. 512^3, though not necessarily cubic) and the vectors are either 2D or 3D (e.g. shapes are [512, 512, 512, 2] or [512, 512, 512, 3]).

What is the fastest way to compute a scalar field of the squared-magnitude of the vectors?

I could just loop over each direction, i.e.

import numpy as np
shp = [256,256,256,3]                       # Shape of vector field
vf = np.arange(3*(256**3)).reshape(shp)     # Create vector field
sf = np.zeros(shp[:3])                      # Create scalar field for result

for ii in range(shp[0]):
    for jj in range(shp[1]):
        for kk in range(shp[2]):
            sf[ii,jj,kk] = np.dot(vf[ii,jj,kk,:], vf[ii,jj,kk,:])

but that is fairly slow. Is there anything faster?

DilithiumMatrix
  • Is your problem similar to this? http://stackoverflow.com/questions/6824122/mapping-a-numpy-array-in-place – Garth5689 Nov 08 '13 at 16:32

1 Answer


The fastest is probably going to be np.einsum:

np.einsum('...j,...j->...', vf, vf)

The above code tells numpy to take its two inputs and reduce the last dimension of each by multiplying corresponding values and summing them. With your dataset there is an overflow problem, since the squared magnitudes will not fit in a 32-bit integer, which is the default dtype np.arange returns here. You can solve that by specifying the output dtype, as either np.int64 or np.double:

>>> np.einsum('...j,...j->...', vf,vf)[-1, -1, -1]
-603979762
>>> np.einsum('...j,...j->...', vf,vf).dtype
dtype('int32')

>>> np.einsum('...j,...j->...', vf,vf, dtype=np.int64)[-1, -1, -1]
7599823767207950
>>> np.einsum('...j,...j->...', vf,vf, dtype=np.double)[-1, -1, -1]
7599823767207950.0
Jaime
  • 65,696
  • 17
  • 124
  • 159
  • Thanks @Jaime, can you include an explanation or comment on what that means and/or why it is optimal? – DilithiumMatrix Nov 08 '13 at 16:37
  • Wow. `einsum` just made my week. – DilithiumMatrix Nov 08 '13 at 16:58
  • 2
    An alternate - perhaps more explicit - way to do this is using `np.sum(vf * vf, axis=-1)`. However, a quick test (for the given array size) shows that the `einsum` version is over twice as fast (I'm guessing because it avoids using a temporary `vf * vf` array). – bogatron Nov 08 '13 at 17:14
  • @bogatron Einsum can use SSE2 while numpy ufuncs do not in 1.7, see [here](http://stackoverflow.com/questions/18365073/why-is-numpys-einsum-faster-than-numpys-built-in-functions). – Daniel Nov 08 '13 at 23:55
  • @Ophion Thanks for the link. That is very informative. Please correct me if you interpret this differently but the results you show in the link suggest to me that my comment above is still valid since the question posed here is closest to your last example, for which einsum is still much faster than the ufunc solution (even with SSE2). – bogatron Nov 10 '13 at 03:43
  • @bogatron Einsum does avoid most intermediates which makes it nice on memory, but the speed comes from SSE. Does this answer your question? – Daniel Nov 10 '13 at 04:20
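To see the speed difference discussed in the comments on your own machine, a quick benchmark sketch with `timeit` (relative timings will vary with NumPy version and build; the array here is deliberately smaller than 256^3 to keep the run short):

```python
import timeit
import numpy as np

# A modest vector field; scale up to match your real data if desired.
vf = np.random.rand(128, 128, 128, 3)

t_einsum = timeit.timeit(lambda: np.einsum('...j,...j->...', vf, vf), number=10)
t_sum = timeit.timeit(lambda: np.sum(vf * vf, axis=-1), number=10)

print(f"einsum: {t_einsum:.3f}s  sum: {t_sum:.3f}s")
```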