
Situation:

I'm filling an array of shape (2N, 2N), call it A, where N is close to 8000. Using nested for loops, each entry is computed by a function that takes as argument a subarray of shape (2,) from the last dimension of an array of shape (N, N, 2), call it B.

This is obviously costly. While I was unsuccessful in vectorizing this part of the code (any help in that direction is also very welcome), I know that B has many repeated subarrays along its last dimension. So what I want is to find the unique subarrays and where each occurs. Filling A could then be sped up by iterating over the unique subarrays and writing the function's value, computed just once, into all the positions where that subarray occurs.

What I have done is the following, but it doesn't seem to be either the most straightforward way to proceed or the most NumPy-like way to do it.

The code I have been using to fill the matrix is the following:

translat_avg_disloc_matrix = np.zeros((2 * n, 2 * n))

for i in range(n):
    for alpha in range(2):
        for j in range(n):
            for beta in range(2):
                translat_avg_disloc_matrix[2*i + alpha, 2*j + beta] = \
                    average_lat_pos(alpha, beta, b_matrix[i, j])

While I can find the unique subarrays by doing something like what is done in Efficiently count the number of occurrences of unique subarrays in NumPy?, I have had problems finding the indices where each occurs.

What I have tried is doing something like:

1) Calculate the norm of the subarrays in the last dimension of B by norm = (B*B).sum(axis=2), and the norm of the subarrays in the last dimension of B-1 by norm_ = ((B-1)*(B-1)).sum(axis=2)

2) Reshape the arrays of these two norms with norm.reshape((norm.size, 1))

3) Creating tile matrices as tile_norm = np.tile(norm.T, (len(norm),1))

4) Then do np.unique(np.nonzero(np.abs(tile_norm - norm) + np.abs(tile_norm_ - norm_) == 0), axis=0), which gives something like array([[0, 0, 0, 4], [4, 4, 4, 0]]), where in each row the zeros indicate indices that correspond to the same (2,) vector in the B matrix.

In words, I'm finding (2,) arrays whose norms agree as is, and that also agree when 1 is subtracted from them - two equations, two variables (though note both norms are symmetric in the two components, so this cannot distinguish (x, y) from (y, x)).

What I'm looking for is a way to find where each of the unique subarrays occur in B so that using some fancy indexing will allow me to fill the matrix with no repeated calls to the function average_lat_pos (repeated here means calling for the same (alpha, beta, (2,) array) ordered pair).
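To make the goal concrete, here is a sketch of the kind of fancy-indexed fill I'm after, with a made-up average_lat_pos and a small N (values are coarsened so that repeats occur); whether the np.unique overhead pays off at N close to 8000 is part of the question:

```python
import numpy as np

def average_lat_pos(alpha, beta, v):       # made-up stand-in for the real function
    return alpha * v[0] + 2 * beta * v[1]

N = 20
B = np.floor(np.random.rand(N, N, 2) * 4)  # coarse values, so many repeats

# one row per (2,) subarray; inv labels each (i, j) with its unique row
uniq, inv = np.unique(B.reshape(-1, 2), axis=0, return_inverse=True)
inv = inv.reshape(N, N)

A = np.empty((2 * N, 2 * N))
for u, v in enumerate(uniq):
    rows, cols = np.nonzero(inv == u)      # every place this subarray occurs
    for alpha in range(2):
        for beta in range(2):
            # one function call, fancy-indexed into all matching slots
            A[2 * rows + alpha, 2 * cols + beta] = average_lat_pos(alpha, beta, v)
```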


2 Answers


Vectorizing is often better. A slight rearrangement of your function facilitates the job:

import numpy as np

def average_lat_pos(a, b, x, y):  # all arguments are scalars
    return a*x + 2*b*y            # as an example

n = 1000
B = np.random.rand(n, n, 2)


def loops():
    A = np.empty((2*n, 2*n))
    for i in range(n):
        for alpha in range(2):
            for j in range(n):
                for beta in range(2):
                    A[2*i + alpha, 2*j + beta] = average_lat_pos(
                        alpha, beta, B[i, j, 0], B[i, j, 1])
    return A

To vectorize, just turn each loop level into a corresponding array dimension:

Z = average_lat_pos(np.arange(2).reshape(1, 2, 1, 1),
                    np.arange(2).reshape(1, 1, 1, 2),
                    B[:, :, 0].reshape(n, 1, n, 1),
                    B[:, :, 1].reshape(n, 1, n, 1)).reshape(2*n, 2*n)

Tests :

np.allclose(loops(),Z)
Out[105]: True

%timeit loops()
3.73 s ± 9.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit average_lat_pos(np.arange(2).reshape(1,2,1,1),np.arange(2).reshape(1,1,1,2),
   B[:,:,0].reshape(n,1,n,1),B[:,:,1].reshape(n,1,n,1)).reshape(2*n,2*n)
38.7 ms ± 211 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit np.unique(B,axis=2)
1.31 s ± 6.12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

np.unique implies an O(n log n) sort, which will kill any potential gain.
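As an aside, deduplicating the (2,) subarrays themselves would reshape them to rows first (np.unique(B, axis=2) above deduplicates along the length-2 axis instead); the cost argument is unchanged:

```python
import numpy as np

n = 200
B = np.floor(np.random.rand(n, n, 2) * 5)  # values in {0..4}: at most 25 pairs

# unique (2,) subarrays: view them as rows, then deduplicate the rows
uniq = np.unique(B.reshape(-1, 2), axis=0)
```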

Better yet, use Numba, decorating the two functions:

import numba

@numba.njit
def average_lat_pos(a, b, x, y):
    ....

@numba.njit
def loops(B):
    ....

then

%timeit loops(B)
3.04 ms ± 34.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


It's now roughly 1000x faster than the pure-Python loops.

One straightforward trick is to store average_lat_pos results in a dictionary keyed by the arguments. If the key is already in the dictionary, return its value without recomputing average_lat_pos for duplicate arguments. Since the real average_lat_pos function is not available, I use a dummy function:

def average_lat_pos(alpha, beta, td):
    # dummy stand-in for the real computation
    result = 0
    for i in range(1000):
        result += 1
    return result

and the final code:

import numpy as np

def average_lat_pos_1(alpha, beta, td):
    # memoized version: one real computation per distinct key
    key = (alpha, beta, td[0], td[1])
    cached = dictionary.get(key)
    if cached is None:
        result = 0
        for i in range(1000):
            result += 1
        dictionary[key] = result
        return result
    return cached

def average_lat_pos_2(alpha, beta, td):
    # uncached version, for comparison
    result = 0
    for i in range(1000):
        result += 1
    return result


N = 500
B = np.random.rand(N * N * 2) * 100
B = np.floor(B)                 # coarse values, so many repeats
B = B.reshape(N, N, 2)

dictionary = dict()
A = np.empty((2 * N, 2 * N))
for i in range(2 * N):
    for j in range(2 * N):
        A[i, j] = average_lat_pos_1(i % 2, j % 2, B[i // 2, j // 2])

For N = 500, average_lat_pos_1 runs roughly 10x faster than average_lat_pos_2.
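The same memoization can also be written with functools.lru_cache, assuming the two components of the (2,) subarray are passed as separate scalars (NumPy arrays are not hashable); the body is the same dummy computation as above:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def average_lat_pos_cached(alpha, beta, x, y):
    # same dummy computation as above, run once per distinct key
    result = 0
    for i in range(1000):
        result += 1
    return result

# called as average_lat_pos_cached(i % 2, j % 2, td[0], td[1])
```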
