I have a question about the relative speed of NumPy/Python and Fortran. I rewrote a working Python program in Fortran, and it runs fine. But I noticed that as the arrays get larger, the Fortran program slows down much more than the NumPy version does.
Here are some numbers. With a coarse grid (few steps), Fortran (with the Intel Fortran compiler) takes 0.2 s and Python takes 5 s. When I first saw this, I was delighted. But when I reduced the step size, the Fortran program took 770 s while the Python program took 1450 s. Fortran's lead shrank from a factor of 25 to less than 2, a loss of roughly a factor of 10. I suppose that if I reduce the step size further, Python will end up faster. That sucks.
I timed almost every step individually. With a ten times smaller step size, the Fortran loops over arrays become about 10 times slower, which is to be expected. But the NumPy array operations become only 2-3 times slower.
Does anyone know what these NumPy functions do so that their run time does not grow linearly with the array size? Is there anything comparable I can do in Fortran?
Here is a short example; the whole code has more than 1000 lines, so nobody would ever read it. psi is a complex array (called chi in the Python line), and r is a real/double array whose length depends on dr. First, the Python code.
phi0= 4* pi * np.cumsum(np.cumsum(r * np.abs(chi)**2) * dr) * dr / r
phi0 += - phi0[-1] - N/r[-1]
With dr=0.1 it takes 0.00006 s, with dr=0.01 it takes 0.00008 s, and with dr=0.001 it takes 0.0002 s.
Here is the Fortran code:
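For reference, a minimal sketch of how the NumPy lines could be timed for the three step sizes. The grid (from dr up to a hypothetical r_max = 10), the random chi, and N = 1.0 are assumptions for illustration only, not the values from the real program, so the absolute times will differ.

```python
import timeit

import numpy as np


def potential(r, chi, dr, N):
    """Same expression as the two NumPy lines above, wrapped in a function."""
    phi0 = 4 * np.pi * np.cumsum(np.cumsum(r * np.abs(chi) ** 2) * dr) * dr / r
    phi0 += -phi0[-1] - N / r[-1]
    return phi0


def make_inputs(dr, r_max=10.0, seed=0):
    """Hypothetical test data: grid from dr to r_max and a random complex chi."""
    n = int(round(r_max / dr))
    r = dr * np.arange(1, n + 1)
    rng = np.random.default_rng(seed)
    chi = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    return r, chi


if __name__ == "__main__":
    calls = 100
    for dr in (0.1, 0.01, 0.001):
        r, chi = make_inputs(dr)
        # Take the best of several repeats to reduce noise from other processes.
        best = min(timeit.repeat(lambda: potential(r, chi, dr, 1.0),
                                 number=calls, repeat=5))
        print(f"dr={dr}: n={r.size}, {best / calls:.2e} s per call")
```

Averaging over many calls matters here because a single call on a 100-element array takes only microseconds, which is close to the resolution of most timers.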
integer :: i, j, m
! the d0 suffix keeps the literal in double precision
double precision :: sum2_j, pi=3.14159265359d0, N, dr, sum1_i
double precision, dimension(:), allocatable :: sum1_array, phi, step1, r
complex(8), dimension(:), allocatable :: psi
!double precision :: start, finish

! psi, r, N and dr are assumed to be set elsewhere in the program
m=size(psi)
allocate (phi(m))
allocate (sum1_array(m))
allocate (step1(m))

!call cpu_time(start)
! first running sum (the inner cumsum in the Python version)
sum1_i=0
step1=r*abs(psi)**2
do i=1,size(psi)
   sum1_i=sum1_i+step1(i)
   sum1_array(i)=sum1_i*dr
end do
! second running sum (the outer cumsum)
sum2_j=0
do j=1,size(phi)
   sum2_j=sum2_j + sum1_array(j)
   phi(j)=4*pi*sum2_j*dr/r(j)
end do
! shift so that the last element equals -N/r(m)
phi=phi - phi(size(phi))-N/r(size(r))
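To check that both versions do the same work, the two Fortran loops can be translated one-to-one into Python and compared with the nested cumsum expression. This is only a sketch: the function names, the test grid, the random psi, and N = 1.0 are hypothetical and chosen for illustration.

```python
import numpy as np


def phi_loops(r, psi, dr, N):
    """Direct translation of the two Fortran loops (with 0-based indices)."""
    m = psi.size
    step1 = r * np.abs(psi) ** 2
    sum1_array = np.empty(m)
    phi = np.empty(m)

    sum1_i = 0.0
    for i in range(m):          # first running sum
        sum1_i += step1[i]
        sum1_array[i] = sum1_i * dr

    sum2_j = 0.0
    for j in range(m):          # second running sum
        sum2_j += sum1_array[j]
        phi[j] = 4 * np.pi * sum2_j * dr / r[j]

    return phi - phi[-1] - N / r[-1]


def phi_cumsum(r, psi, dr, N):
    """The NumPy version from the question."""
    phi0 = 4 * np.pi * np.cumsum(np.cumsum(r * np.abs(psi) ** 2) * dr) * dr / r
    phi0 += -phi0[-1] - N / r[-1]
    return phi0


if __name__ == "__main__":
    dr = 0.01
    r = dr * np.arange(1, 1001)                      # assumed grid
    rng = np.random.default_rng(1)
    psi = rng.standard_normal(r.size) + 1j * rng.standard_normal(r.size)
    same = np.allclose(phi_loops(r, psi, dr, 1.0), phi_cumsum(r, psi, dr, 1.0))
    print("loops and cumsum agree:", same)
```

Both versions make two passes over the array and perform one addition per element in each pass, so the arithmetic itself is linear in the array length in either language.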
The run times with Eclipse/Photran (Intel Fortran is about 2 times faster): dr=0.1: 0.0000008 s, dr=0.01: 0.00006 s, dr=0.001: 0.00045 s
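The scaling can be read off the reported timings directly. The short sketch below uses only the numbers quoted above; the dictionary and function names are made up for illustration.

```python
# Timings in seconds as reported above, keyed by dr.
python_s = {0.1: 0.00006, 0.01: 0.00008, 0.001: 0.0002}
fortran_s = {0.1: 0.0000008, 0.01: 0.00006, 0.001: 0.00045}


def growth_per_refinement(timings):
    """Factor by which the run time grows each time dr shrinks by 10x."""
    drs = sorted(timings, reverse=True)  # coarse to fine
    return [timings[fine] / timings[coarse] for coarse, fine in zip(drs, drs[1:])]


def python_over_fortran(py, ft):
    """Ratio of Python time to Fortran time for each dr (> 1: Fortran is faster)."""
    return {dr: py[dr] / ft[dr] for dr in py}


if __name__ == "__main__":
    print("Python growth per 10x refinement: ", growth_per_refinement(python_s))
    print("Fortran growth per 10x refinement:", growth_per_refinement(fortran_s))
    print("Python / Fortran time ratio:      ", python_over_fortran(python_s, fortran_s))
```

Note that the smallest Fortran timing is below one microsecond, so it may be limited by timer resolution rather than reflect the real cost.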
As you can see, Python is much slower for the coarse grid (dr=0.1) but actually faster for the fine grid (dr=0.001). The issue lies in the two loops in the Fortran code, but it is not specific to this code: it occurs in all loops, and this is just an example. I have not tried anything yet because I do not understand why this is happening.