2

I've been trying to optimize the construction of matrix C (see below) by using NumPy. How could my code be further optimized so as to make the building of matrix C faster?

Given the following matrixes:

Q:   array([[78.66  , 47.196 , 31.464 ],
           [40.3875, 24.2325, 16.155 ],
           [40.4775, 24.2865, 16.191 ],
           ...,
           [55.62  , 33.372 , 22.248 ],
           [76.7475, 46.0485, 30.699 ],
           [77.3325, 46.3995, 30.933 ]])

S:  [[[1,2,3],[],[],[1,...,1125]],
     [[],[1,...,200],[300,301][]],
     ...,
     [[1,1125],[],[12],[345,453]]]

gamma:   array([[0. , 1.4, 2.5, 3. , 3. ],
               [0. , 1.6, 3. , 3.7, 4. ],
               [0. , 1.8, 3.5, 4.4, 5. ]])

I have the following code to build the three dimensional matrix C

# # Matrix C_ijk 
C = np.zeros((n,o,p))
for i in range(n):
    for j in range(o):
        for k in range(p):
            for u in range(m-1):
                if np.isin(i,S[j][u]):
                    C[i,j,k] = Q[j,k] * gamma[k,u+1]

Edit: m,n,o and p are integers which define the dimensional length of the matrixes. These are 5, 1126, 797 and 3 in the full model respectively.

Q is size (o,p) ; S is size (o,m-1) ; gamma is size (p,m-1) ; C is size (n,o,p) ;

Here is a small example output:

>>> Q
array([[10., 10.],
       [20., 20.],
       [30., 30.],
       [30., 30.]])
>>> S
[[[0, 1], [], [], [2]], [[2], [0], [1], []], [[], [1], [0, 2], []], [[], [2], [], [0, 1]]]
>>> gamma
array([[0.   , 0.575, 1.2  , 1.75 , 2.   ],
       [0.   , 0.625, 1.4  , 2.25 , 3.   ]])
>>> C
array([[[ 5.75,  6.25],
        [24.  , 28.  ],
        [52.5 , 67.5 ],
        [60.  , 90.  ]],

       [[ 5.75,  6.25],
        [35.  , 45.  ],
        [36.  , 42.  ],
        [60.  , 90.  ]],

       [[20.  , 30.  ],
        [11.5 , 12.5 ],
        [52.5 , 67.5 ],
        [36.  , 42.  ]]])

As suggested by @AhmedMohamedAEK would the implementation of numba in the following way be correct?

from numba import njit, prange

@njit(parallel=True)
def matrix_C(Q,S,gamma,n,o,p,m):
    C = np.zeros((n,o,p))
    for i in prange(n):
        for j in prange(o):
            for u in prange(m-1):
                if np.isin(i,S[j][u]):
                    C[i,j,:] = Q[j,:] * gamma[:,u+1]
    return C
C = matrix_C(Q,S,gamma,n,o,p,m)
fpolloa
  • 99
  • 4
  • 13
  • 1
    yes there is some ways. what are n, o and p means? what's m? – Ulises Bussi Nov 28 '21 at 13:12
  • could you explain what are the values `n,o,p,m` and provide some example toy code with complete `Q`, `S` and `gama`? – Ulises Bussi Nov 28 '21 at 13:17
  • @UlisesBussi , I've added more info on the values `n,o,p,m` and an example. – fpolloa Nov 28 '21 at 13:27
  • make sure all the arguments like Q,gamma,n,o,p are passed as arguments to the function so it won't compile them as constants, ie: you can dynamically change these values in your code by changing the arguments. – Ahmed AEK Nov 28 '21 at 15:18
  • also remove instances of parallel and prange from the numba code, the overhead of multithreading will kill your execution time. – Ahmed AEK Nov 28 '21 at 15:19

3 Answers3

2

you can remove the loop in k since it's used by all of the arrays as follows:

# # Matrix C_ijk 
C = np.zeros((n,o,p))
for i in range(n):
    for j in range(o):
        for u in range(m-1):
            if np.isin(i,S[j][u]):
                C[i,j,:] = Q[j,:] * gamma[:,u+1]

however using nested loops in python is very discouraged, and looping should be moved to the C side using an external module, which can be created using cython or numba.

edit: for the numba implementation above, if the array is very huge (a few MBs of size) then you could use a parallel implementation as follows:

from numba import njit, prange

@njit(parallel=True)
def matrix_C(Q,S,gamma,n,o,p,m):
    C = np.zeros((n,o,p))
    for i in prange(n):
        for j in range(o):
            for k in range(p):
                for u in range(m-1):
                    if np.isin(i,S[j][u]):
                        C[i,j,k] = Q[j,k] * gamma[k,u+1]
    return C
C = matrix_C(Q,S,gamma,n,o,p,m)

however if the array is relatively smaller then i'd just remove parallel and prange and just use the following:

@njit()
def matrix_C(Q,S,gamma,n,o,p,m):
    C = np.zeros((n,o,p))
    for i in range(n):
        for j in range(o):
            for k in range(p):
                for u in range(m-1):
                    if np.isin(i,S[j][u]):
                        C[i,j,k] = Q[j,k] * gamma[k,u+1]
    return C
C = matrix_C(Q,S,gamma,n,o,p,m)

and remember the first time you call the function is when it will be compiled, so it will be a little slow, but further calls will be fast.

Ahmed AEK
  • 8,584
  • 2
  • 7
  • 23
  • Thanks a lot for your answer. Indeed I was looking into avoiding nested loops altogether by using numpy but I haven't found a way of doing so given the uneven shape and nature of matrix S. Any ideas on how to do so? – fpolloa Nov 28 '21 at 13:50
  • from the kind of operations done inside the loops, it's sort of hard to remove all of the loops altogether, however if loops are a must, make sure you don't do it in python, the linked modules (cython/numba) can move the looping towards the machine code side instead of python so you won't pay the price of looping in python, which is very slow. – Ahmed AEK Nov 28 '21 at 13:56
  • I have edited my answer included the numba implementation. Have I implemented correctly? – fpolloa Nov 28 '21 at 14:34
  • 1
    you sort of need to pass Q,gamma,n,o,p as arguments to it so it will work correctly. also note that the first time you call it it's going to be slow as it will be compiling it, but later calls will be very fast, also there's caching in numba so you probably just need to compile it only once in your life if it works. – Ahmed AEK Nov 28 '21 at 15:14
  • 1
    also remove all instances of parallel and prange and just use normal range, multithreading is wrong for small sized arrays, and should only be done (if ever) on one of the loops (the outermost one) – Ahmed AEK Nov 28 '21 at 15:19
  • 1
    Could you please edit your answer showcasing the code you are explaining? I find it hard to understand where and where not to put the prange. – fpolloa Nov 28 '21 at 15:20
2

What's particularly hurting you - aside from the Python loops - is that linear time np.isin lookup in your innermost loop. This is easy to remedy. We can create only those indices i, j, u where i is in S[j][u], so we do not need to search for them later on.

This is achieved by the following code, creating the index generator R. The expression is somewhat long, but not difficult to understand.

R = ((i, j, u)
     for j in range(o)
     for u in range(m - 1)
     for i in S[j][u]
     if i < n) 

This index generator simplifies the computation of C a lot:

for i, j, u in R:
    C[i, j] = Q[j] * gamma[:, u + 1]

Since a lot of the work is now avoided completely, this should be considerably faster than your initial implementation.

Full code:

def matrix_C(Q, S, gamma, n, o, p, m):
    C = np.zeros((n, o, p))
    R = ((i, j, u)
         for j in range(o)
         for u in range(m - 1)
         for i in S[j][u]
         if i < n) 

    for i, j, u in R:
        C[i, j] = Q[j] * gamma[:, u + 1]

    return C
Additional thoughts
  1. This implementation can be sped up further. Only the last/largest value of u is used to create C[i, j] - earlier entries due to lower values of u are overwritten. Can you think of a way in which you can determine this largest u immediately as you build R?

  2. It might be worthwhile to compute Q[j] * gamma[:, u + 1] ahead of time for each j and u, and only perform a lookup when setting the values of C[i, j].

Nelewout
  • 6,281
  • 3
  • 29
  • 39
1

you can use np.fromfunction

C = np.fromfunction(function=f, shape=(n, o, p))

where f is

def f(i, j, k):
    ...
    # your logic here
    ...
    return value

python for loops are slow (see here), so this is much more efficient ( at least for the outer 3 loops)

good luck

Jonathan
  • 486
  • 3
  • 12