6

Consider the following code using numpy arrays which is very slow :

# Intersection of an octree and a trajectory
def intersection(octree, trajectory):
    # Initialize numpy arrays
    ox = octree.get("x")
    oy = octree.get("y")
    oz = octree.get("z")
    oe = octree.get("extent")/2
    tx = trajectory.get("x")
    ty = trajectory.get("y")
    tz = trajectory.get("z")
    result = np.zeros(np.size(ox))
    # Loop over elements
    for i in range(0, np.size(tx)):
        for j in range(0, np.size(ox)):
            if (tx[i] > ox[j]-oe[j] and 
                tx[i] < ox[j]+oe[j] and 
                ty[i] > oy[j]-oe[j] and 
                ty[i] < oy[j]+oe[j] and 
                tz[i] > oz[j]-oe[j] and 
                tz[i] < oz[j]+oe[j]):
                result[j] += 1
    # Finalize
    return result

How to rewrite the function to speed up the calculation ? (np.size(tx) == 10000 and np.size(ox) == 100000)

Ali
  • 56,466
  • 29
  • 168
  • 265
Vincent
  • 57,703
  • 61
  • 205
  • 388
  • Do you also consider using OpenCL? –  Jun 05 '14 at 23:00
  • I do not need full performance, I just want a raw speed up. – Vincent Jun 05 '14 at 23:03
  • 1
    Build a `scipy.spatial.KDTree` from the points tx, ty, tz and then use nearest-neighbour look-up in the infinity norm for each point in ox, oy, oz to see whether there is any point close enough. – Sven Marnach Jun 05 '14 at 23:07
  • Have you considered using Cython? I have read that it gives large speedups without much pain. http://stackoverflow.com/questions/7799977/numpy-vs-cython-speed – steveha Jun 05 '14 at 23:14

3 Answers3

6

You are allocating 10000 lists of size 100000. The first thing to do would be to stop using range for the nested j loop and use the generator version xrange instead. This will save you time and space allocating all those lists.

The next one would be to use vectorized operations:

for i in xrange(0, np.size(tx)):
    index = (ox-oe < tx[i]) & (ox+oe > tx[i]) & (oy-oe < ty[i]) & (oy+oe > ty[i]) & (oz-oe < tz[i]) & (oz+oe > tz[i])
    result[index] += 1  
Oleg Sklyar
  • 9,834
  • 6
  • 39
  • 62
  • Give me a sec, there is the next one coming :) – Oleg Sklyar Jun 05 '14 at 23:24
  • Ok, the vectorized form is in (may need checking of conditions), but as you modified your question in response to my original answer that part of the answer became irrelevant even though it would still be a good initial step to do in case of non-bumpy data :) – Oleg Sklyar Jun 05 '14 at 23:31
0

You're likely to get good results by running this code under PyPy: http://pypy.org/ (instructions for our NumPy integration at https://bitbucket.org/pypy/numpy)

Alex Gaynor
  • 14,353
  • 9
  • 63
  • 113
0

I think this will give the same result for the double loop and be faster:

for j in xrange(np.size(ox)):
    result[j] += sum( abs(tx-ox[j])<oe[j] & abs(ty-oy[j])<oe[j] & abs(tz-oz[j])<oe[j] )

To get this: 1) reorder the loops (ie, swap them) which is valid since nothing changes within the loops; 2) pull result[j] outside the i loop; 3) convert all the t>ox-oe and t<ox+oe to abs(t-ox)<oe (though this may not be a huge speedup, it's easier to read).

Since you don't have runnable code, and I didn't want to build a test for this, I'm not 100% sure this is correct.

tom10
  • 67,082
  • 10
  • 127
  • 137