Situation:
I'm filling an ndarray A of shape (2N, 2N), where N is close to 8000, with values returned by a function. Nested for loops call that function on each subarray of shape (2,) taken from the last dimension of an array B of shape (N, N, 2).
This is obviously costly, and while I was unsuccessful in vectorizing this part of the code (any help in this direction is also very welcome), I know that B has many repeated subarrays in the last dimension. So what I want is to find the unique subarrays and where each one occurs. Filling A would then be sped up by iterating over the unique subarrays and writing, at every position where a given subarray occurs, the value returned by the function, which would be calculated just once.
What I have done is described below, but it seems to be neither the most straightforward way to proceed nor the most NumPy way to do it.
The code I have been using to fill the matrix is the following:
translat_avg_disloc_matrix = np.zeros([2*n, 2*n])
for i in range(n):
    for alpha in range(2):
        for j in range(n):
            for beta in range(2):
                translat_avg_disloc_matrix[2*i+alpha,2*j+beta] = average_lat_pos(alpha, beta, b_matrix[i][j])
While I can find the unique subarrays by doing something like what is done in "Efficiently count the number of occurrences of unique subarrays in NumPy?", I have had problems finding the indices where each one occurs.
What I have tried is doing something like:
1) Calculate the (squared) norm of the subarrays in the last dimension of B with norm = (B*B).sum(axis=2), and the same quantity for B-1 with norm_ = ((B-1)*(B-1)).sum(axis=2)
2) Reshape the arrays holding these two norms into columns with norm.reshape((norm.size,1)), and likewise for norm_
3) Create tile matrices as tile_norm = np.tile(norm.T, (len(norm),1)), and likewise tile_norm_
4) Compute np.unique(np.nonzero(np.abs(tile_norm - norm)+np.abs(tile_norm_-norm_) == 0), axis=0), which gives something like array([[0, 0, 0, 4], [4, 4, 4, 0]]), where in each row zeros indicate that these indices correspond to the same (2,) vector in the B matrix.
In words, I'm finding (2,) arrays whose norms agree as they are, and whose norms also agree when 1 is subtracted from each component: two equations, two variables.
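For reference, here is a minimal sketch of steps 1) to 3) and the comparison inside step 4) on a toy B. The values in B are made up for illustration, and tile_norm_ is assumed to be built the same way as tile_norm. It only computes the boolean pairwise matrix, not the final np.unique call.

```python
import numpy as np

# Toy (2, 2, 2) array; the values are made up for illustration.
B = np.array([[[0, 1], [2, 3]],
              [[0, 1], [1, 0]]])
flat = B.reshape(-1, 2)  # the four (2,) subarrays, in row-major order

# Steps 1 and 2: sums of squares of B and of B - 1, reshaped to columns.
norm = (B * B).sum(axis=2).reshape(-1, 1)
norm_ = ((B - 1) * (B - 1)).sum(axis=2).reshape(-1, 1)

# Step 3: tile matrices (every row repeats the flattened norms).
tile_norm = np.tile(norm.T, (len(norm), 1))
tile_norm_ = np.tile(norm_.T, (len(norm_), 1))

# Comparison from step 4: same[i, j] is True when both sums agree
# for subarrays i and j.
same = (np.abs(tile_norm - norm) + np.abs(tile_norm_ - norm_)) == 0
print(same[0])
```

Two things are worth checking with this approach. The pairwise matrix has (N*N) x (N*N) entries, so it is unlikely to fit in memory for N near 8000. Also, both sums are symmetric in the two components, so [0, 1] and [1, 0] are reported as equal in this toy example even though they are different subarrays.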
What I'm looking for is a way to find where each of the unique subarrays occurs in B, so that some fancy indexing will allow me to fill the matrix with no repeated calls to the function average_lat_pos (repeated here means calling it for the same (alpha, beta, (2,) array) triple).
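To make the goal concrete, here is a hedged sketch of one possible way to get there with np.unique and its return_inverse option. The body of average_lat_pos below is a hypothetical stand-in (the real function is not shown in this post), and the small random b_matrix is made up for illustration.

```python
import numpy as np

def average_lat_pos(alpha, beta, vec):
    # Hypothetical stand-in for the real function, which is not shown here.
    return alpha + 2 * beta + 10 * vec[0] + vec[1]

n = 4
rng = np.random.default_rng(0)
b_matrix = rng.integers(0, 2, size=(n, n, 2))  # toy data with many repeats

# Unique (2,) subarrays, plus for every position the index of its unique row.
uniq, inverse = np.unique(b_matrix.reshape(-1, 2), axis=0, return_inverse=True)
inverse = inverse.reshape(n, n)  # inverse[i, j] indexes into uniq

result = np.zeros((2 * n, 2 * n))
for alpha in range(2):
    for beta in range(2):
        # One call per (alpha, beta, unique subarray) combination.
        vals = np.array([average_lat_pos(alpha, beta, u) for u in uniq])
        # result[2*i + alpha, 2*j + beta] = vals[inverse[i, j]]
        result[alpha::2, beta::2] = vals[inverse]
```

With this layout, the positions where the k-th unique subarray occurs can be read off with np.nonzero(inverse == k), and the function is called 4 * len(uniq) times instead of 4 * n * n times.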