Apologies if this is a duplicate.
Here is the problem.
NumPy (1.18.2) on Python 3.8.2 computes a matrix product about 3 times faster than GNU Fortran (9.2.0, MinGW.org GCC Build-20200227-1) on Windows. I compiled with the command `gfortran.exe test.f` without any additional options.
Does anyone know what causes this, and is it possible to speed up the Fortran version?
Here is the Fortran code:
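      program product_test
      ! Fixed-form source (test.f): statements start in column 7.
      implicit none
      INTEGER :: N, N_count, nc
      REAL*8 :: t2, time_begin, time_end
      REAL*8, dimension(:,:), allocatable :: a, b, c
      N = 200
      N_count = 10
      allocate ( a(N,N) )
      allocate ( b(N,N) )
      allocate ( c(N,N) )
      ! Fill a and b with uniform random numbers in [0, 1)
      call RANDOM_NUMBER(a)
      call RANDOM_NUMBER(b)
      print *, 'Matrix Multiplication: C = A * B for size (',N,',',N,')'
      ! CPU_TIME reports processor time, not wall-clock time
      call CPU_TIME ( time_begin )
      do nc = 1, N_count
        c = MATMUL(a,b)
      end do
      call CPU_TIME ( time_end )
      ! Average time per product
      t2 = (time_end - time_begin)/N_count
      print *, 'Time of operation was ', t2, ' seconds'
      end program product_test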
Here is the output:
Matrix Multiplication: C = A * B for size ( 200 , 200 )
Time of operation was 9.3749E-003 seconds
Here is the Python 3 code:
import numpy as np
import time

N = 200
N_count = 10
# Random N x N operands with entries in [0, 1)
a = np.random.rand(N,N)
b = np.random.rand(N,N)
c = np.zeros([N,N], dtype = float)
print('Matrix product in python (using numpy): c= a*b for size (',N,',',N,')')
start_time = time.time()
for nc in range(N_count):
    c = a@b  # matrix product, not element-wise multiplication
# Average wall-clock time per product
t2 = (time.time() - start_time)/N_count
print('Elapsed time = ',t2,'s')
Here is the output:
Matrix product in python (using numpy): c= a*b for size ( 200 , 200 )
Elapsed time = 0.0031252 s
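Both reported times, multiplied by the 10 repetitions, come out close to whole multiples of 15.6 ms (about 0.0312 s and 0.0937 s). That is a common clock granularity on Windows, so each measurement may span only a handful of timer ticks. Below is a minimal sketch of a steadier measurement, assuming `time.perf_counter` as the clock and a few warm-up calls before timing. The helper name `time_matmul` and the repetition counts are illustrative choices, not part of the original test.

```python
import time
import numpy as np


def time_matmul(n=200, repeats=200, warmup=10, seed=0):
    """Return the average wall-clock time in seconds of one n x n product."""
    rng = np.random.default_rng(seed)
    a = rng.random((n, n))
    b = rng.random((n, n))
    # Warm-up calls so that one-time costs (library start-up, cold caches)
    # are not counted in the measurement.
    for _ in range(warmup):
        a @ b
    # perf_counter is a high-resolution monotonic clock.
    start = time.perf_counter()
    for _ in range(repeats):
        a @ b
    elapsed = time.perf_counter() - start
    return elapsed / repeats


if __name__ == "__main__":
    print('Average time per product =', time_matmul(), 's')
```

With more repetitions the total elapsed time is much longer than one clock tick, so the average should depend less on where the tick boundaries happen to fall.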
**Additional tests.** Following the comments from "roygvib" and "Vladimir F", I repeated the test with BLAS/LAPACK:
gfortran test.f -lopenblas -o test.exe
or gfortran test.f -ffast-math -o test.exe
or gfortran test.f -lblas -o test.exe
or gfortran test.f -llapack -o test.exe
Each of these gives a computation time of 0.0063 s for multiplying square matrices of size ( 200 x 200 ).
Unfortunately, I had deleted the previous version of MinGW, so the new tests were performed with GNU Fortran (x86_64-posix-seh-rev0, Built by MinGW-W64 project 8.1.0). Maybe I did something wrong, because there is no difference between `-llapack`, `-lblas`, and `-lopenblas`. For timing, I used `SYSTEM_CLOCK`, as suggested by "Vladimir F".
Now it is better, but NumPy is still faster than Fortran (two times rather than three). Following the last comment from "Vladimir F", I found that, unlike Python, Fortran mainly uses one logical core (my PC has an Intel i3 CPU with 4 logical cores). Thus, this is a problem of an improperly configured MinGW installation on my PC (Windows 8.1).
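To compare like with like, one option is to restrict NumPy's BLAS backend to a single thread and time the product again. This is only a sketch under stated assumptions: the environment variables below are the ones read by OpenMP runtimes, OpenBLAS, and MKL respectively, they only take effect if set before NumPy is imported, and which one applies depends on the BLAS library the NumPy build is linked against (`np.show_config()` prints that information). The repetition count is an arbitrary choice.

```python
import os

# Assumption: the BLAS backend reads its thread count from one of these
# variables when it is loaded, so they must be set before importing numpy.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "1"

import time
import numpy as np

N = 200
N_count = 200

a = np.random.rand(N, N)
b = np.random.rand(N, N)

a @ b  # warm-up call, excluded from the timing

start = time.perf_counter()
for _ in range(N_count):
    c = a @ b
t_single = (time.perf_counter() - start) / N_count

print('Single-threaded time per product =', t_single, 's')
np.show_config()  # lists the BLAS/LAPACK libraries numpy was built with
```

If the single-threaded NumPy time lands near the Fortran figure, the remaining gap is probably due to threading rather than to the compiler itself.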