
cArr is an array of the form:

cArr=np.array([[0,x0,y0,z0,1],[1,x1,y1,z1,1]])

The middle three numbers of each row are the 3D coordinates of a point (points 0 and 1 respectively, for reference). The first and last values in each row are used by other functions.

cEDA is a 4D array of shape (100,100,100,4), filled with essentially random numbers.

Around each of points 0 and 1 there is a neighbourhood of points whose coordinates differ by exactly one in the 6 cardinal directions, plus the point itself:

x0,y0,z0
x0+1,y0,z0
x0,y0+1,z0
x0,y0,z0+1
x0-1,y0,z0
x0,y0-1,z0
x0,y0,z0-1

For each of the 7 points in a neighbourhood, the distance to the other point is calculated (so for the neighbourhood around point 0, the distance from each of those 7 points to point 1, and the other way round for point 1's neighbourhood). That distance is combined with the corresponding value in the -2th (100,100,100) slice of cEDA, and the result is written in place into the ith (100,100,100) slice of cEDA, where i is 0 or 1 respectively.
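
In formula form, this is what the code below does (0.4799 is just a constant in my calculation):

    cEDA[x,y,z,0] = cEDA[x,y,z,-2] - 0.4799 / dist((x,y,z), point 1)    # for the 7 cells (x,y,z) around point 0
    cEDA[x,y,z,1] = cEDA[x,y,z,-2] + 0.4799 / dist((x,y,z), point 0)    # for the 7 cells (x,y,z) around point 1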

I am new to Python, coming from a C background, and am struggling to get out of the loop mindset. I have tried precalculating distances and summing whole arrays, but that was much slower; the quickest way so far is going through each individual value and assigning it, as in the function below. The function is run MANY times and is a bottleneck in the code, and I am sure there is a better way to do it.

I understand NumPy conceptually and can use it, but I cannot seem to apply it in this case - any ideas on how to speed this function up, if NumPy is even the correct way? I've tried my best to explain what the function does; sorry, I know it's confusing!

def cF2(cArr, cEDA):
    # the coordinates of points 0 and 1 for readability
    x0 = cArr[0,1]
    y0 = cArr[0,2]
    z0 = cArr[0,3]
    x1 = cArr[1,1]
    y1 = cArr[1,2]
    z1 = cArr[1,3]

    # for each point around point 0 and the point itself, calculate the distance between this point and point 1
    # use this value and the corresponding value in cEDA(x,y,z,-2) and place result in  cEDA(x,y,z,0)
    cEDA[x0,y0,z0,0] = cEDA[x0,y0,z0,-2]-0.4799/(np.linalg.norm([x0,y0,z0]-cArr[1,1:4]))
    cEDA[x0-1,y0,z0,0] = cEDA[x0-1,y0,z0,-2]-0.4799/(np.linalg.norm([x0-1,y0,z0]-cArr[1,1:4]))
    cEDA[x0+1,y0,z0,0] = cEDA[x0+1,y0,z0,-2]-0.4799/(np.linalg.norm([x0+1,y0,z0]-cArr[1,1:4]))
    cEDA[x0,y0-1,z0,0] = cEDA[x0,y0-1,z0,-2]-0.4799/(np.linalg.norm([x0,y0-1,z0]-cArr[1,1:4]))
    cEDA[x0,y0+1,z0,0] = cEDA[x0,y0+1,z0,-2]-0.4799/(np.linalg.norm([x0,y0+1,z0]-cArr[1,1:4]))
    cEDA[x0,y0,z0-1,0] = cEDA[x0,y0,z0-1,-2]-0.4799/(np.linalg.norm([x0,y0,z0-1]-cArr[1,1:4]))
    cEDA[x0,y0,z0+1,0] = cEDA[x0,y0,z0+1,-2]-0.4799/(np.linalg.norm([x0,y0,z0+1]-cArr[1,1:4]))

    cEDA[x1,y1,z1,1] = cEDA[x1,y1,z1,-2]+0.4799/(np.linalg.norm([x1,y1,z1]-cArr[0,1:4]))
    cEDA[x1-1,y1,z1,1] = cEDA[x1-1,y1,z1,-2]+0.4799/(np.linalg.norm([x1-1,y1,z1]-cArr[0,1:4]))
    cEDA[x1+1,y1,z1,1] = cEDA[x1+1,y1,z1,-2]+0.4799/(np.linalg.norm([x1+1,y1,z1]-cArr[0,1:4]))
    cEDA[x1,y1-1,z1,1] = cEDA[x1,y1-1,z1,-2]+0.4799/(np.linalg.norm([x1,y1-1,z1]-cArr[0,1:4]))
    cEDA[x1,y1+1,z1,1] = cEDA[x1,y1+1,z1,-2]+0.4799/(np.linalg.norm([x1,y1+1,z1]-cArr[0,1:4]))
    cEDA[x1,y1,z1-1,1] = cEDA[x1,y1,z1-1,-2]+0.4799/(np.linalg.norm([x1,y1,z1-1]-cArr[0,1:4]))
    cEDA[x1,y1,z1+1,1] = cEDA[x1,y1,z1+1,-2]+0.4799/(np.linalg.norm([x1,y1,z1+1]-cArr[0,1:4]))
    return cEDA
  • One of the things I would do: if you compute (1.60217657e-19)*2995850595.79/(something*1e-9), actually have that number as a single precomputed constant instead of 3 parts (faster, though the difference is negligible, but it also improves the readability of the code). – usethedeathstar Feb 04 '14 at 10:25
  • Yeah, you are of course correct there! It's just me working on this and I'm still at a very developmental stage - will update it now for readability. –  Feb 04 '14 at 10:29
  • (fixed indentation) Another thing you can change, to perhaps make it more readable: instead of writing cArr[0,1:4], write x0,y0,z0. Also, calculate the norm in one calculation for all of those coordinate combinations instead of calling linalg.norm 14 times - call it once (with some broadcasting it should be possible, or with the axis keyword or so). – usethedeathstar Feb 04 '14 at 10:49
  • Getting the most out of NumPy may require looking at the overall calculation in a different way. Instead of optimizing this one function (while that might be possible), there might be a bigger gain by replacing the thousands of calls to `cF2` (in a Python loop!) with some larger array-based NumPy equation. – unutbu Feb 04 '14 at 10:51
  • It's a Monte Carlo process, so there are many thousands of iterations and each one depends on the outcome of the previous iteration (for interest, this represents charges moving through a lattice). –  Feb 04 '14 at 10:55
  • I did try calling linalg.norm once for the series of points but couldn't get it to work right, will give it another go... –  Feb 04 '14 at 10:57
  • NumPy works best when it can perform relatively few operations on large arrays. It doesn't do as well when you perform lots of operations on small arrays. For example, calling `linalg.norm` on (3,1) arrays millions of times will kill performance. Even if you make it (3,7) the performance will be bad. If your calculation is inherently sequential (depending on the previous outcome), then it is better to keep the calculation in C and [call the C function from Python](http://docs.python.org/2/extending/extending.html) or use [Cython](http://cython.org/) to write faster loops with Python-like code. – unutbu Feb 04 '14 at 11:16
  • Reassigning values to a few points in a large array (as one would naturally do in C) is also a NumPy performance killer. – unutbu Feb 04 '14 at 11:19
  • It would be quite involved writing the entire loop out in C/Cython - might as well have not used Python! But would writing this single function in Cython be worth it, do you think? As in, the speedup wouldn't be hurt by overhead or something? –  Feb 04 '14 at 11:24
  • Cython would be a definite improvement, but by how much I'm not sure. I think that might be your best option since it would not require much change to your current Python code. I would suggest [replacing the calls to linalg.norm](http://stackoverflow.com/q/7741878/190597) with explicit multiplication, sums and square roots, and saving the results in temporary variables (you are currently calculating each of them twice). Also, for scalars, `math.sqrt` is significantly faster than `np.sqrt`. – unutbu Feb 04 '14 at 11:38
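
A rough sketch of what the last comment suggests (explicit arithmetic and math.sqrt instead of np.linalg.norm, and reusing a distance that two of the updates share; names follow cF2 above, and this is only an illustration, not tested code):

import math

# Illustration only: the distance from (x0,y0,z0) to point 1 equals the
# distance from (x1,y1,z1) to point 0, so compute it once and reuse it.
dx, dy, dz = x0 - x1, y0 - y1, z0 - z1
dist = math.sqrt(dx*dx + dy*dy + dz*dz)
delta = 0.4799 / dist
cEDA[x0, y0, z0, 0] = cEDA[x0, y0, z0, -2] - delta
cEDA[x1, y1, z1, 1] = cEDA[x1, y1, z1, -2] + delta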

2 Answers


If you want to call this many times, you need to convert the whole for loop into a C loop with numba or Cython. However, there are some ways to increase the speed of your function. Here is your original function:

def cF2(cArr, cEDA):
    # the coordinates of points 0 and 1 for readability
    x0 = cArr[0,1]
    y0 = cArr[0,2]
    z0 = cArr[0,3]
    x1 = cArr[1,1]
    y1 = cArr[1,2]
    z1 = cArr[1,3]

    # for each point around point 0 and the point itself, calculate the distance between this point and point 1
    # use this value and the corresponding value in cEDA(x,y,z,-2) and place result in  cEDA(x,y,z,0)
    cEDA[x0,y0,z0,0] = cEDA[x0,y0,z0,-2]-0.4799/(np.linalg.norm([x0,y0,z0]-cArr[1,1:4]))
    cEDA[x0-1,y0,z0,0] = cEDA[x0-1,y0,z0,-2]-0.4799/(np.linalg.norm([x0-1,y0,z0]-cArr[1,1:4]))
    cEDA[x0+1,y0,z0,0] = cEDA[x0+1,y0,z0,-2]-0.4799/(np.linalg.norm([x0+1,y0,z0]-cArr[1,1:4]))
    cEDA[x0,y0-1,z0,0] = cEDA[x0,y0-1,z0,-2]-0.4799/(np.linalg.norm([x0,y0-1,z0]-cArr[1,1:4]))
    cEDA[x0,y0+1,z0,0] = cEDA[x0,y0+1,z0,-2]-0.4799/(np.linalg.norm([x0,y0+1,z0]-cArr[1,1:4]))
    cEDA[x0,y0,z0-1,0] = cEDA[x0,y0,z0-1,-2]-0.4799/(np.linalg.norm([x0,y0,z0-1]-cArr[1,1:4]))
    cEDA[x0,y0,z0+1,0] = cEDA[x0,y0,z0+1,-2]-0.4799/(np.linalg.norm([x0,y0,z0+1]-cArr[1,1:4]))

    cEDA[x1,y1,z1,1] = cEDA[x1,y1,z1,-2]+0.4799/(np.linalg.norm([x1,y1,z1]-cArr[0,1:4]))
    cEDA[x1-1,y1,z1,1] = cEDA[x1-1,y1,z1,-2]+0.4799/(np.linalg.norm([x1-1,y1,z1]-cArr[0,1:4]))
    cEDA[x1+1,y1,z1,1] = cEDA[x1+1,y1,z1,-2]+0.4799/(np.linalg.norm([x1+1,y1,z1]-cArr[0,1:4]))
    cEDA[x1,y1-1,z1,1] = cEDA[x1,y1-1,z1,-2]+0.4799/(np.linalg.norm([x1,y1-1,z1]-cArr[0,1:4]))
    cEDA[x1,y1+1,z1,1] = cEDA[x1,y1+1,z1,-2]+0.4799/(np.linalg.norm([x1,y1+1,z1]-cArr[0,1:4]))
    cEDA[x1,y1,z1-1,1] = cEDA[x1,y1,z1-1,-2]+0.4799/(np.linalg.norm([x1,y1,z1-1]-cArr[0,1:4]))
    cEDA[x1,y1,z1+1,1] = cEDA[x1,y1,z1+1,-2]+0.4799/(np.linalg.norm([x1,y1,z1+1]-cArr[0,1:4]))
    return cEDA

np.random.seed(42)
cArr = np.random.randint(0, 100, (2, 5))
cEDA = np.random.rand(100, 100, 100, 4)
r1 = cF2(cArr, cEDA)

Here is the optimized function:

# the 6 cardinal unit offsets plus the zero offset, shape (7, 1, 3)
tmp = np.eye(3)
offsets = np.concatenate((tmp, -tmp, [[0, 0, 0]]), 0).astype(int)[:, None, :]

def fast_cF2(cArr, cEDA):
    cArr = cArr[:, 1:4]                 # (2, 3): coordinates of points 0 and 1
    t = cArr[None,:,:] + offsets        # (7, 2, 3): the 7 neighbourhood cells of each point
    X0, Y0, Z0 = t[:, 0, :].T           # coordinate arrays of the 7 cells around point 0
    X1, Y1, Z1 = t[:, 1, :].T           # coordinate arrays of the 7 cells around point 1

    # all 7 distances in a single norm call per point (axis=1 gives one norm per row)
    d1 = 0.4799/np.linalg.norm(t[:, 0, :] - cArr[1], axis=1)
    d0 = 0.4799/np.linalg.norm(t[:, 1, :] - cArr[0], axis=1)

    # fancy indexing updates all 7 cells of each neighbourhood at once
    cEDA[X0, Y0, Z0, 0] = cEDA[X0, Y0, Z0, -2] - d1
    cEDA[X1, Y1, Z1, 1] = cEDA[X1, Y1, Z1, -2] + d0
    return cEDA

np.random.seed(42)
cArr = np.random.randint(0, 100, (2, 5))
cEDA = np.random.rand(100, 100, 100, 4)
r2 = fast_cF2(cArr, cEDA)
print np.allclose(r1, r2)

The timeit results:

%timeit cF2(cArr, cEDA)
%timeit fast_cF2(cArr, cEDA)

output:

1000 loops, best of 3: 320 µs per loop
10000 loops, best of 3: 91.6 µs per loop
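
To see what the broadcasting and fancy indexing are doing, the intermediate arrays can be inspected like this (illustrative only, reusing cArr and offsets from above):

t = cArr[:, 1:4][None, :, :] + offsets
print offsets.shape   # (7, 1, 3): the 6 cardinal unit steps plus the zero offset
print t.shape         # (7, 2, 3): the 7 neighbourhood cells for point 0 and point 1
print t[:, 0, :]      # the same 7 (x, y, z) triples that cF2 indexes one at a time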
HYRY
  • Just... wow... Thank you so much - I will continue to work on Cythonizing the loop but this optimisation is incredible! I will try to work my way through how it functions, though it may be beyond me! –  Feb 04 '14 at 14:29

I think that scipy.ndimage.generic_filter would do it...

import numpy as np
import scipy.ndimage as nd

def neighborly_function(in_arr1d):
    # put your equation below - you will have to figure out the details
    # below is just an example
    # has to return a scalar
    return sum(in_arr1d)

huge_arr3d = [....]

# Since you want the cardinal directions you have to use a 'footprint'
footprint = np.array([[[False, False, False],
                       [False,  True, False],
                       [False, False, False]],

                      [[False,  True, False],
                       [ True, False,  True],
                       [False,  True, False]],

                      [[False, False, False],
                       [False,  True, False],
                       [False, False, False]]])

out = nd.generic_filter(huge_arr3d, neighborly_function, footprint=footprint)

Check out the ndimage docs at http://docs.scipy.org/doc/scipy/reference/tutorial/ndimage.html and generic_filter at http://docs.scipy.org/doc/scipy/reference/tutorial/ndimage.html#generic-filter-functions
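
For reference, here is a minimal self-contained run on a small random array (the array and the summing callback are just placeholders; generate_binary_structure is only a shortcut to build the same 6-neighbour footprint as the literal above):

import numpy as np
import scipy.ndimage as nd

# centre + 6 face neighbours, then drop the centre to match the footprint above
fp = nd.generate_binary_structure(3, 1)
fp[1, 1, 1] = False

small = np.random.rand(5, 5, 5)
# sum of the 6 cardinal neighbours of every cell (edges padded with 0.0)
out = nd.generic_filter(small, np.sum, footprint=fp, mode='constant', cval=0.0)
print out.shape   # (5, 5, 5): one filtered value per cell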

TimCera