I have some board numpy arrays like this:
```
array([[0, 0, 0, 1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 1, 0, 1],
       [0, 0, 0, 0, 0, 0, 0, 1],
       [0, 1, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 1, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 1, 0],
       [1, 0, 0, 0, 0, 1, 0, 0]])
```
And I'm using the following code to find the sum of the elements on each i-th diagonal (i from -7 to 7) of the board and of its mirrored version:
```
n = 8
rate = [b.diagonal(i).sum()
        for b in (board, board[::-1])
        for i in range(-n + 1, n)]
```
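For concreteness, a self-contained version of that snippet (the near-empty board here is just a stand-in for the arrays above):

```
import numpy as np

board = np.zeros((8, 8), dtype=int)  # stand-in; use the boards shown above
board[0, 3] = board[1, 0] = board[1, 5] = 1

n = 8
rate = [b.diagonal(i).sum()
        for b in (board, board[::-1])
        for i in range(-n + 1, n)]
# rate[:15] holds the diagonal sums of board (offsets -7..7); rate[15:]
# holds the diagonal sums of board[::-1], i.e. the anti-diagonals of board.
assert len(rate) == 2 * (2 * n - 1)  # 30 sums in total
```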
After some profiling, this operation is taking about 2/3 of the overall running time, and it seems to come down to two factors:

- The `.diagonal` method builds a new array instead of returning a view (it looks like numpy 1.7 will have a new `diag` method to solve that).
- The iteration is done in Python, inside the list comprehension.
So, are there any methods to compute these sums faster (possibly in the C layer of numpy)?
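One direction I've been considering (a sketch, not profiled; the `diag_idx`, `anti_idx`, and `diagonal_sums` names are just illustrative) is to label each cell with the index of the diagonal it belongs to, precompute those labels once, and let `np.bincount` do all the summing in C:

```
import numpy as np

n = 8
rows, cols = np.indices((n, n))
# Cell (r, c) lies on the diagonal with offset c - r; shifting by n - 1
# makes the bin indices run 0..2n-2, matching offsets -(n-1)..n-1 in order.
diag_idx = (cols - rows + n - 1).ravel()
# Diagonals of board[::-1] are the anti-diagonals of board: offset i of the
# flipped board corresponds to constant r + c = i + n - 1 in the original.
anti_idx = (rows + cols).ravel()

def diagonal_sums(board):
    flat = board.ravel()
    return np.concatenate([
        np.bincount(diag_idx, weights=flat, minlength=2 * n - 1),
        np.bincount(anti_idx, weights=flat, minlength=2 * n - 1),
    ])
```

This should produce the same 30 values, in the same order, as the list comprehension (though `bincount` returns floats when given weights, while the comprehension yields ints).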
After some more tests, I managed to cut the total time by 7.5x by caching the results of this operation... Maybe I was looking at the wrong bottleneck?
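The caching I mean looks roughly like this (a sketch; the `tobytes` key and the `diagonal_rates` name are just for illustration):

```
_cache = {}

def diagonal_rates(board, n=8):
    key = board.tobytes()  # board contents as a hashable cache key
    if key not in _cache:
        _cache[key] = [b.diagonal(i).sum()
                       for b in (board, board[::-1])
                       for i in range(-n + 1, n)]
    return _cache[key]  # repeated boards skip the 30 diagonal sums entirely
```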
One more thing: I just found the `.trace` method, which can replace the `diagonal(i).sum()` call, and... there wasn't much of an improvement in performance (about 2 to 4%).
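That is, something like:

```
# b.trace(i) sums the diagonal at offset i without materializing it first
rate = [b.trace(i)
        for b in (board, board[::-1])
        for i in range(-n + 1, n)]
```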
So the problem must be the comprehension itself. Any ideas?