I would like to do addition and multiplication on top of array transposes.
Given A
is an array.
sum += p(A) * factor(p)
where p
is a permutation/transpose, factor(p)
is a list of prefactors. For example, A
is a 2 dimensional array, the target is
0.1*A + 0.1*transpose(A,(1,0))
I found in my Python
code, when the higher dimensional permutations are applied, the time for array addition is far larger than transpose. Perhaps numpy.transpose
uses pointer in C
. I want to know, is there any method to optimize the timing for the array addition part? numpy.add
does not help much. Should I somehow, only sum the part of transpose which affects the array, the rest part use multiplication? For example, a permutation (0,1,3,2)
, the (0,1)
part multiplies a common factor on top of the previous array. Or use cpython
to improve the performance?
Here is my Python
code
import numpy as np
import time
import itertools as it
ref_list = [0, 1, 2, 3, 4]
p = it.permutations(ref_list)
transpose_list = tuple(p)
print(type(transpose_list),type(transpose_list[0]),transpose_list[0])
n_loop = 2
na = nb = nc = nd = ne = 20
A = np.random.random((na,nb,nc,nd,ne))
sum_A = np.zeros((na,nb,nc,nd,ne))
factor_list = [i*0.1 for i in range(120)]
time_transpose = 0
time_add = 0
time_multiply = 0
for n in range(n_loop):
for m, t in enumerate(transpose_list):
start = time.time()
B = np.transpose(A, transpose_list[m] )
finish = time.time()
time_transpose += finish - start
start = time.time()
B_p = B * factor_list[m]
finish = time.time()
time_multiply += finish - start
start = time.time()
sum_A += B_p
finish = time.time()
time_add += finish - start
print(time_transpose, time_multiply, time_add, time_multiply/time_transpose, time_add/time_transpose)
The output is
0.004961967468261719 1.1218750476837158 3.7830252647399902 226.09480107630213 762.404285988852
The addition time is about ~700 folds larger than the transpose.
I tried to use numba's transpose in How to avoid huge overhead of single-threaded NumPy's transpose?
by adding
import numba as nb
@nb.njit('void(float64[:,::1], float64[:,::1])', parallel=True)
def transpose(mat, out):
blockSize, tileSize = 256, 32 # To be tuned
n, m = mat.shape
assert blockSize % tileSize == 0
for tmp in nb.prange((m+blockSize-1)//blockSize):
i = tmp * blockSize
for j in range(0, n, blockSize):
tiMin, tiMax = i, min(i+blockSize, m)
tjMin, tjMax = j, min(j+blockSize, n)
for ti in range(tiMin, tiMax, tileSize):
for tj in range(tjMin, tjMax, tileSize):
out[ti:ti+tileSize, tj:tj+tileSize] = mat[tj:tj+tileSize, ti:ti+tileSize].T
and use
B = transpose(A, transpose_list[m] )
received
Traceback (most recent call last):
File "transpose_test_v2.py", line 46, in <module>
B = transpose(A, transpose_list[m] )
File "/home/.../lib/python3.8/site-packages/numba/core/dispatcher.py", line 717, in _explain_matching_error
raise TypeError(msg)
TypeError: No matching definition for argument type(s) array(float64, 6d, C), UniTuple(int64 x 6)
or using
B = nb.transpose(A, transpose_list[m] )
and got
B = nb.transpose(A, transpose_list[m] )
AttributeError: 'int' object has no attribute 'transpose'