I'm working on a simulation where the bottleneck is performing a large number of complex double precision matrix exponentials, and am finding that while Fortran (Expokit) works well for small matrices, for larger ones it performs worse than Matlab or Python.
I have included a model program showing similar behavior below, though it requires larger matrices to show the performance difference. Looking at the profiler and the source code it seems that Expokit spends most of its time calling zgemm(), and so my only thought is an issue with my BLAS installation. Otherwise I don't understand why Fortran would be performing worse than Matlab or Python. I would greatly appreciate any insight into improving the performance of the Fortran matrix exponential code.
Results for 10000 matrices (4x4, 8x8, 30x30, 60x60, 80x80):
Matlab (s): 0.91, 0.97, 2.36, 5.45, 8.69
Python (s): 2.59, 2.89, 9.70, 35.4, 72.7
Fortran, Expokit (s): 0.037, 0.12, 4.14, 30.6, 74.9
Fortran, Expokit, OpenMP with 8 cores (s): 0.0039, 0.016, 0.52, 3.87, 9.53
Fortran code:
subroutine expokit_test()
use omp_lib
use iso_fortran_env
implicit none
integer, parameter :: wp = selected_real_kind(15, 307), size=80
complex(wp), parameter :: i = (0, 1._wp)
integer :: count, a, b
real(wp) :: wtime
complex(wp) :: mat_exp(size, size), mat(size, size), val
val = 1E-8_wp
mat = 0._wp
do a = 1, size
do b = 1, size
mat(a, b) = a * b
end do
end do
call omp_set_num_threads(8)
wtime = omp_get_wtime()
!$omp parallel do default(private) &
!$omp& shared(mat, val)
do count = 1, int(1E4)
mat_exp = expm_complex(-i * mat * val)
end do
!$omp end parallel do
wtime = omp_get_wtime () - wtime
write(6, *) 'expm_complex', sngl(wtime)
end subroutine expokit_test
function expm_complex(A) result(B)
! Calculate matrix exponential of complex matrix A using Expokit
use iso_fortran_env
implicit none
integer, parameter :: wp = selected_real_kind(15, 307)
complex(wp), dimension(:, :), intent(in) :: A
complex(wp), dimension(size(A, 1), size(A, 2)) :: B
integer, parameter :: ideg = 2 ! Pade approximation, 6 is reccomended but 2 appears to be stable
complex(wp) :: t = 1._wp
complex(wp), dimension(4 * size(A, 1) * size(A, 2) + ideg + 1) :: wsp
integer, dimension(size(A, 1)) :: iwsp
integer :: iexp, ns, iflag, n
n = size(A, 1)
call ZGPADM(ideg, n, t, A, n, wsp, size(wsp, 1), iwsp, iexp, ns, iflag)
B = reshape(wsp(iexp : iexp + n * n - 1), [n, n])
end function expm_complex
Matlab code:
size = 80;
for a=1:size
for b=1:size
A(a, b) = 1E-9 + (a * b);
end
end
tic
for test=1:1E4
t=expm(-1i*A*1E-8);
end
toc
Python code:
size = 80
mat = np.ones((size, size))
for a in range(0, size):
for b in range(0, size):
mat[a, b] = ((a+1) * (b+1))
mat = mat + 1E-9
start = time.time()
for loop in range(0, int(1E4)):
test = la.expm(-1j * mat * 1E-8)
end = time.time() - start
print('time taken', end)