TLDR; I'm performing an array operation (no mathematics) and I've found Cython to be significantly faster. Is there a way I can speed this up in NumPy; or Cython?
Context
I'm writing a function that is meant to take a subset of an NxN
array from index
onward in both directions (whose top corner is along the diagonal) and shift it one place upwards along the diagonal. Secondly, I need to shift the top row from index
onward one place to the left. Lastly, I need to set the last column in the array to zero after the operation.
The array is a strictly upper triangular matrix meaning that everything from the diagonal downwards is set to 0. This is my attempt at an elegant way to store historical collision data between pairs of objects (whose indices are represented by indices in the matrix). This would be similar to making a nested list of size n!/(2(n-2)!)
which represents the ordered pairs of a list of indices of length n
. In this algorithm, I hope to "remove" an object from the collision pairing matrix.
The advantage I find in this implementation is that "removing a collision pair" from the matrix is much less computationally intensive than removing pairs from a nested list and shifting the indices in pairs past the "index to remove" point.
The overall project centers around the automated "packing" of 3D models into a build volume for powder bed fusion additive manufacturing. The algorithm uses simulated annealing, so the ability to prune a collision set, store historical information, add/remove geometry are of upmost importance and need to be well optimized.
Example
Lets say our array takes this form (not representative of actual data).
arr =
[[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
[0. 0. 2. 3. 4. 5. 6. 7. 8. 9.]
[0. 0. 0. 3. 4. 5. 6. 7. 8. 9.]
[0. 0. 0. 0. 4. 5. 6. 7. 8. 9.]
[0. 0. 0. 0. 0. 5. 6. 7. 8. 9.]
[0. 0. 0. 0. 0. 0. 6. 7. 8. 9.]
[0. 0. 0. 0. 0. 0. 0. 7. 8. 9.]
[0. 0. 0. 0. 0. 0. 0. 0. 8. 9.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 9.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
Then using index = 3
we should take everything in the subset index+1:n, index+1:n
and set it equal to index:n-1, index:n-1
. Then shift the top row to the left; again from index
onward. Then set the last column to 0.
fun(3, arr)
[[0. 1. 2. 4. 5. 6. 7. 8. 9. 0.]
[0. 0. 2. 3. 4. 5. 6. 7. 8. 0.]
[0. 0. 0. 3. 4. 5. 6. 7. 8. 0.]
[0. 0. 0. 0. 5. 6. 7. 8. 9. 0.]
[0. 0. 0. 0. 0. 6. 7. 8. 9. 0.]
[0. 0. 0. 0. 0. 0. 7. 8. 9. 0.]
[0. 0. 0. 0. 0. 0. 0. 8. 9. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 9. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
Implementation 1: Pure NumPy
Again assuming arr
is an NxN
matrix.
def fun(index, n, arr):
arr[index:-1, index:-1] = arr[index + 1:, index + 1:]
arr[0, index:-1] = arr[0, index + 1:]
arr[:, n-1:] = 0
return arr
Implementation 2: Cython
Bear with me as this is my very first time implementing Cython.
@cython.boundscheck(False)
def remove_from_collision_array(int index, int n, double[:,:] arr):
cdef int i, j, x_shape, y_shape
x_shape = arr.shape[0]
for i in range(index, x_shape):
for j in range(index, x_shape):
if j <= i:
# We are below the diagonal, do nothing
continue
elif i >= n-1 or j >= n-1:
arr[i, j] = 0
else:
arr[i, j] = arr[i+1, j+1]
arr[0, index:-1] = arr[0, index+1:]
arr[:, n-1:] = 0
return np.asarray(arr)
Discussion
Before anybody gets upset, yes I don't know what I'm doing in Cython. I disabled bounds_checking
because it really really speeds things up. And I'm performing a bounds check in the loop with one of my elif
statements.
I initially thought there would be no way that performing this operation in a loop would be faster than NumPy. I pre-allocate a NumPy array of size 5000x5000
to avoid needing to append, etc on the fly. I even tested the Cython implementation using the same 3 lines as the Numpy one, but it also performs poorly.
You can see that using index=0
will require the most computation. So I use that as a benchmark. While testing this in a loop, I've found that the Cython implementation is about 50%+ faster than the Numpy version. Perhaps this is because I am not adequately using the tools NumPy has to offer?
I am by no means a computer scientist, nor do I know if this is the best route. I'm a designer prototyping a system. If anybody has any insight on how to make this scream even faster, please let me know!
Update on the answer
Thanks to Jerome for teaching me something today! This will be instrumental in making this package run at lightning speed. I've added his insights to my code, resulting in a massive performance boost for two reasons that I can see:
- I've cut the number of loop iterations by
n*(n-1)/2
by starting thej
-loop above the diagonal. - I've removed all conditional statements.
Here is the updated Cython:
@cython.boundscheck(False)
@cython.wraparound(False)
def remove_from_collision_arrayV2(int index, int n, double[:,:] arr):
cdef int i, j
# Shift the diagonal matrix
for i in range(index, n-1):
for j in range(i, n-1):
arr[i, j] = arr[i+1, j+1]
# Shift the rop row
for j in range(index, n-1):
arr[0, j] = arr[0, j+1]
# Set Column column n-1 to zero
for i in range(n):
arr[i, n-1] = 0
return np.asarray(arr)
For benchmarking purposes. Performing this iteration 500 times using index=0
on a 500x500
matrix:
Original NumPy Code: 52.8s
Original Cython Code: 16.47s
- 3.2x Speedup
Updated Cython Code: 0.014s
- 3550x Speedup