
I wish to optimise some Python code consisting of two nested loops. I am not so familiar with numpy, but I understand it should enable me to improve the efficiency of such a task. Below is a test code I wrote that reflects what happens in the actual code. Currently, using the numpy ranges and iterators is slower than the usual Python ones. What am I doing wrong? What is the best solution to this problem?

Thanks for your help!

import numpy
import time

# set up a problem analogous to that in the real code
npoints_per_plane = 1000
nplanes = 64
naxis = 1000
npoints3d = naxis + npoints_per_plane * nplanes
npoints = naxis + npoints_per_plane
specres = 1000

# this is where the data is being mapped to
sol = dict()
sol["ems"] = numpy.zeros(npoints3d)
sol["abs"] = numpy.zeros(npoints3d)

# this would normally be non-random input data
data = dict()
data["ems"] = numpy.zeros((npoints,specres))
data["abs"] = numpy.zeros((npoints,specres))
for ip in range(npoints):
    data["ems"][ip,:] = numpy.random.random(specres)[:]
    data["abs"][ip,:] = numpy.random.random(specres)[:]
ems_mod = numpy.random.random(1)[0]
abs_mod = numpy.random.random(1)[0]
ispec = numpy.random.randint(specres)

# this is the code I want to optimize

t0 = time.time()

# usual python range and iterator
for ip in range(npoints_per_plane):
    jp = naxis + ip
    for ipl in range(nplanes):
        ip3d = jp + npoints_per_plane * ipl
        sol["ems"][ip3d] = data["ems"][jp,ispec] * ems_mod
        sol["abs"][ip3d] = data["abs"][jp,ispec] * abs_mod

t1 = time.time()

# numpy ranges and iterator
ip_vals = numpy.arange(npoints_per_plane)
ipl_vals = numpy.arange(nplanes)
for ip in numpy.nditer(ip_vals):
    jp = naxis + ip
    for ipl in numpy.nditer(ipl_vals):
        ip3d = jp + npoints_per_plane * ipl
        sol["ems"][ip3d] = data["ems"][jp,ispec] * ems_mod
        sol["abs"][ip3d] = data["abs"][jp,ispec] * abs_mod


t2 = time.time()

print "plain python: %0.3f seconds" % ( t1 - t0 )
print "numpy: %0.3f seconds" % ( t2 - t1 )

Edit: put "jp = naxis + ip" in the first (outer) for loop only

Additional note:

I worked out how to get numpy to quickly do the inner loop, but not the outer loop:

# numpy vectorization
for ip in xrange(npoints_per_plane):
    jp = naxis + ip
    sol["ems"][jp:jp+npoints_per_plane*nplanes:npoints_per_plane] = data["ems"][jp,ispec] * ems_mod
    sol["abs"][jp:jp+npoints_per_plane*nplanes:npoints_per_plane] = data["abs"][jp,ispec] * abs_mod

Joe's solution below shows how to do both together, thanks!

user1334640
  • I'm not acquainted with `numpy.range`, but using Python's `range` is the same as creating a list of `n` elements, whereas `xrange` is a generator of numbers, avoiding storage of an entire list. – Rubens Jul 18 '13 at 14:54
  • Thanks. You are right, but when I use xrange instead of range there is no change in the runtime. – user1334640 Jul 18 '13 at 15:12
  • The key to writing fast numpy code is *avoiding* loops by vectorizing operations (or, rather, pushing the loops down to the fast C level), not that iterating over numpy objects is faster. IOW, idiomatic numpy code doesn't have many `for` loops, but acts on vectors and arrays as a whole. – DSM Jul 18 '13 at 15:14
  • 2 questions. 1) Is the size of the arrays the same as for your true application? 2) How fast do you need the code to run? – lightalchemist Jul 18 '13 at 15:17
  • 1) for a small problem, yes. For the largest problem, it would be more like npoints_per_plane = 50000, nplanes = 64, naxis = 1000, specres = 1000. 2) I want the code to run in a similar order of magnitude to C. I have just spent a week converting a swig interface to pure python but now it takes twice as long for a small problem, and this is the bottleneck. – user1334640 Jul 18 '13 at 15:22
  • It seems like a simple transform with a constant multiplication for all values. It might be faster to simply write a smart indexer unless you absolutely need the data in a linear array i.e. make 'sol' a class with access operators that access 'data' underneath – user1827356 Jul 18 '13 at 15:37
  • sol is actually a variable in a netcdf file that I am interacting with via netCDF4, so I am currently stuck with a linear array. – user1334640 Jul 18 '13 at 15:44

1 Answer


The best way to write loops in numpy is not to write loops at all, but to use vectorized operations instead. For example:

c = 0
for i in range(len(a)):
    c += a[i] + b[i]

becomes

c = np.sum(a + b, axis=0)

For a and b with a shape of (100000, 100) this takes 0.344 seconds in the first variant, and 0.062 seconds in the second.
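For reference, here is a minimal, self-contained version of that comparison (the array shapes match the figures quoted above; exact timings will of course vary by machine):

import numpy as np
import time

a = np.random.random((100000, 100))
b = np.random.random((100000, 100))

# explicit Python loop
t0 = time.time()
c_loop = 0
for i in range(len(a)):
    c_loop = c_loop + a[i] + b[i]
t1 = time.time()

# vectorized equivalent: the loop runs at C level inside numpy
c_vec = np.sum(a + b, axis=0)
t2 = time.time()

print("loop:       %0.3f seconds" % (t1 - t0))
print("vectorized: %0.3f seconds" % (t2 - t1))
print("results match: %s" % np.allclose(c_loop, c_vec))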

In the case presented in your question the following does what you want:

sol['ems'][naxis:] = numpy.ravel(
    numpy.repeat(
        data['ems'][naxis:,ispec,numpy.newaxis] * ems_mod,
        nplanes,
        axis=1
    ),
    order='F'
)

This could be further optimized with some tricks, but that would reduce clarity and is probably premature optimization because:

plain python: 0.064 seconds

numpy: 0.002 seconds

The solution works as follows:

Your original version computes jp = naxis + ip, which merely skips the first naxis elements; the slice [naxis:] selects all but the first naxis elements in one step.

Your inner loop repeats the value of data[jp,ispec] nplanes times and writes it to the locations ip3d = jp + npoints_per_plane * ipl, which is equivalent to filling a flattened 2D array offset by naxis.

Therefore a second dimension is added to the (previously 1D) data['ems'][naxis:, ispec] via numpy.newaxis, and the values are repeated nplanes times along this new dimension via numpy.repeat.

The resulting 2D array is then flattened again via numpy.ravel in Fortran order (i.e. with the first axis, the one indexed by ip, varying fastest) and written to the appropriate subarray of sol['ems'].

If the target array were actually 2D, the repeat could be skipped entirely by using automatic array broadcasting.
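To make the last two points concrete, here is a sketch (assuming the arrays set up in the question, and assuming sol["ems"] is an in-memory numpy array rather than the netCDF4 variable mentioned in the comments) that achieves the same mapping by viewing the target slice as 2D and letting broadcasting do the repeat, then checks the result against the original double loop:

# view the target slice as an (nplanes, npoints_per_plane) block;
# a basic slice plus reshape of a contiguous array returns a view,
# so writing into `target` writes into sol["ems"]
target = sol["ems"][naxis:].reshape(nplanes, npoints_per_plane)
target[:] = data["ems"][naxis:, ispec] * ems_mod  # broadcasts over the nplanes rows

# verify against the original nested loops
expected = numpy.zeros(npoints3d)
for ip in range(npoints_per_plane):
    jp = naxis + ip
    for ipl in range(nplanes):
        expected[jp + npoints_per_plane * ipl] = data["ems"][jp, ispec] * ems_mod
assert numpy.allclose(sol["ems"][naxis:], expected[naxis:])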

If you run into a situation where you cannot avoid using loops, you could use Cython (which supports efficient buffer views on numpy arrays).

Joe
  • thanks, yeah I am reading http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html to try and work out if I can vectorize my loop. The problem is it is not a 1:1 mapping so perhaps I need the loop still. And can I put cython code in a python script? Not sure how that works... – user1334640 Jul 18 '13 at 15:32
  • Actually the code does look like a one to one mapping, except that it is [`reshape`d](http://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html). Cython code can be used inline via [`cython.inline`](http://docs.cython.org/src/reference/compilation.html#compiling-with-cython-inline). – Joe Jul 18 '13 at 15:40
  • how does one vectorize "for i in range(len(a)): c += a[i] + b[2*i]" for example? this would help me understand what to do. – user1334640 Jul 18 '13 at 15:47
  • `c = numpy.sum(a + b[::2])` – Joe Jul 18 '13 at 15:49
  • okay, I see the syntax now ([i:j:k] where i=start, j=stop, k=increment), but I cannot work out how to apply this to my loop. how does one vectorize "for ipl in xrange(nplanes): sol["ems"][jp + npoints_per_plane * ipl] = data["ems"][jp,ispec] * ems_mod" for example? – user1334640 Jul 18 '13 at 16:07
  • thanks joe, your solution is both neat and fast (and correct, determined via assert) - exactly why I want to use python! now I just need to work out how your solution works! – user1334640 Jul 19 '13 at 10:59
  • @user1334640 added some more explanation, hope it helps – Joe Jul 19 '13 at 11:21
  • @user1334640 Is more clarification required? If not, it would be appropriate to accept the answer. – Joe Jul 22 '13 at 14:51