
I'm running a script that contains one loop to calculate two arrays (alpha_x and alpha_y) using input arrays (X, Y, M_list and zeta_list) from Python. It works fine when I run the Fortran program without OpenMP: the array values are the same as when I do the calculation in Python alone. However, when I add OpenMP support and use multiple cores, the outputs for alpha_x and alpha_y are different each time I run the program! The code is below:

PROGRAM LensingTest1

IMPLICIT NONE 

DOUBLE PRECISION,DIMENSION(1,1000)::M_list
DOUBLE PRECISION,DIMENSION(1000,1000)::X,Y,z_m_z_x,z_m_z_y,dist_z_m_z, alpha_x, alpha_y
DOUBLE PRECISION,DIMENSION(2,1000)::zeta_list
REAL::start_time,stop_time
INTEGER::i,j


open(10,file='/home/Desktop/Coding/Fortran/Test_Programs/Lensing_Test/M_list.dat')
open(9,file='/home/Desktop/Coding/Fortran/Test_Programs/Lensing_Test/zeta_list.dat')
open(8,file='/home/Desktop/Coding/Fortran/Test_Programs/Lensing_Test/X.dat')
open(7,file='/home/Desktop/Coding/Fortran/Test_Programs/Lensing_Test/Y.dat')

read(10,*)M_list
read(9,*)zeta_list
read(8,*)X
read(7,*)Y


call cpu_time(start_time)

!$OMP PARALLEL DO 
do i=1,size(M_list,2),1 
    z_m_z_x = X - zeta_list(1,i)
    z_m_z_y = Y - zeta_list(2,i)
    dist_z_m_z = z_m_z_x**2 + z_m_z_y**2
    alpha_x = alpha_x + (M_list(1,i)* z_m_z_x / dist_z_m_z)
    alpha_y = alpha_y + (M_list(1,i)* z_m_z_y / dist_z_m_z)
end do 
!$OMP END PARALLEL DO 

call cpu_time(stop_time)

print *, "Setup time:", &
      stop_time - start_time, "seconds"


open(6,file='/home/Desktop/Coding/Fortran/Test_Programs/Lensing_Test/alpha_x.dat')
do i =1,1000
    write(6,'(1000F14.7)')(alpha_x(i,j), j=1,1000)
end do 

open(5,file='/home/Desktop/Coding/Fortran/Test_Programs/Lensing_Test/alpha_y.dat')
do i =1,1000
    write(5,'(1000F14.7)')(alpha_y(i,j), j=1,1000)
end do 


stop


END PROGRAM LensingTest1    

The only difference is that I add the `!$OMP PARALLEL DO` and `!$OMP END PARALLEL DO` directives for OpenMP support. I compile with `gfortran -fopenmp script.f90` and then `export OMP_NUM_THREADS=4`.

ThunderFlash
    Try adding `private( z_m_z_x, z_m_z_y, dist_z_m_z) reduction( +: alpha_x, alpha_y )` to your `parallel do` directive. – Gilles Sep 25 '17 at 09:02
  • Getting segmentation fault errors, although my stack size is set to unlimited. Will try compiling with ifort and the `-heap-arrays` flag, but I'm not at the HPC right now. – ThunderFlash Sep 25 '17 at 09:09
  • Gilles is right, you have a race condition. – Vladimir F Героям слава Sep 25 '17 at 09:10
  • BTW, opening files on unit `5` and `6` is also a big no-no. They are normally pre-connected to standard output and input. – Vladimir F Героям слава Sep 25 '17 at 09:10
  • Is there any way to deal with the seg error if I use gfortran to compile then? Do I have to specifically switch my static arrays into allocatable arrays? Didn't know about race conditions till you mentioned them...and right, will use higher unit numbers then – ThunderFlash Sep 25 '17 at 09:12
  • You will probably need to play with the `OMP_STACKSIZE` environment variable. But in any case, I doubt you'll see much performance improvement as your code is likely memory bound, not compute bound. – Gilles Sep 25 '17 at 09:13
  • Unlimited stack size only affects the main thread. With all those arrays private, you need a lot of stack for the OpenMP worker threads. See [here](https://stackoverflow.com/questions/13264274/why-segmentation-fault-is-happening-in-this-openmp-code/13266595#13266595). – Hristo Iliev Sep 25 '17 at 09:14
  • I would just use allocatables, it is much simpler than tweaking stack sizes and stuff. But first fix your bug. – Vladimir F Героям слава Sep 25 '17 at 09:15
  • @agentp when I solve the race condition bug, OpenMP seems to work to parallelise the loop, by which I mean the script ends significantly faster...what do you mean? – ThunderFlash Sep 25 '17 at 11:14
  • Also, after adding in what Gilles said, it seems to print out the same array as in Python, except for a few elements which are extremely large numbers (on the order of e252). For example, in Python I would get `np.amax(alpha_x) = 0.2`, while in Fortran I'm getting `maxval(alpha_x) = 2e+250`... why is this happening? I do stress that 99% of the elements seem to be the correct values. – ThunderFlash Sep 25 '17 at 11:17
  • Oh, never mind re: my prior comment... I see. – agentp Sep 25 '17 at 11:21
  • Try initialising `alpha_x=0` before the loop. – agentp Sep 25 '17 at 11:38
  • Works like a charm! What is the issue here? How can not initialising the array cause such a problem? – ThunderFlash Sep 25 '17 at 11:42
  • You should never assume an uninitialised variable is zero. That would have bitten you sooner or later with a different compiler, even without OpenMP. – agentp Sep 25 '17 at 11:46
  • Don't use `cpu_time` to measure the time when using OpenMP; use `system_clock` instead. – M. Chinoune Sep 25 '17 at 14:38
  • Always wanted to get that fixed, thanks for the tip. – ThunderFlash Sep 25 '17 at 16:27
  • Don't use `cpu_time` or `system_clock` to measure the time when using OpenMP, use `omp_get_wtime` instead. – Hristo Iliev Sep 26 '17 at 10:33
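
Putting the comment suggestions together (private scratch arrays, a reduction over the two accumulators, initialising `alpha_x` and `alpha_y` before the loop, higher unit numbers, and `omp_get_wtime` for wall-clock timing), a minimal sketch of a corrected version might look like the code below. It is untested here; it keeps the original array shapes and file paths, the higher unit numbers are an arbitrary choice, and the per-thread copies of the 1000x1000 arrays are large, so `OMP_STACKSIZE` may still need to be raised to avoid segmentation faults.

PROGRAM LensingTest1

USE omp_lib                         ! provides omp_get_wtime

IMPLICIT NONE

DOUBLE PRECISION,DIMENSION(1,1000)::M_list
DOUBLE PRECISION,DIMENSION(1000,1000)::X,Y,z_m_z_x,z_m_z_y,dist_z_m_z,alpha_x,alpha_y
DOUBLE PRECISION,DIMENSION(2,1000)::zeta_list
DOUBLE PRECISION::start_time,stop_time
INTEGER::i,j

! Units 5 and 6 are pre-connected to standard input/output, so use higher unit numbers.
open(20,file='/home/Desktop/Coding/Fortran/Test_Programs/Lensing_Test/M_list.dat')
open(21,file='/home/Desktop/Coding/Fortran/Test_Programs/Lensing_Test/zeta_list.dat')
open(22,file='/home/Desktop/Coding/Fortran/Test_Programs/Lensing_Test/X.dat')
open(23,file='/home/Desktop/Coding/Fortran/Test_Programs/Lensing_Test/Y.dat')

read(20,*)M_list
read(21,*)zeta_list
read(22,*)X
read(23,*)Y

! Never rely on uninitialised memory being zero.
alpha_x = 0.0d0
alpha_y = 0.0d0

! Wall-clock timing with omp_get_wtime (cpu_time measures CPU time, not elapsed time).
start_time = omp_get_wtime()

! The scratch arrays are private (one copy per thread); the accumulators are
! combined with a reduction, so the threads no longer race on the same elements.
!$OMP PARALLEL DO PRIVATE(z_m_z_x, z_m_z_y, dist_z_m_z) &
!$OMP REDUCTION(+: alpha_x, alpha_y)
do i=1,size(M_list,2),1
    z_m_z_x = X - zeta_list(1,i)
    z_m_z_y = Y - zeta_list(2,i)
    dist_z_m_z = z_m_z_x**2 + z_m_z_y**2
    alpha_x = alpha_x + (M_list(1,i)* z_m_z_x / dist_z_m_z)
    alpha_y = alpha_y + (M_list(1,i)* z_m_z_y / dist_z_m_z)
end do
!$OMP END PARALLEL DO

stop_time = omp_get_wtime()

print *, "Setup time:", stop_time - start_time, "seconds"

open(24,file='/home/Desktop/Coding/Fortran/Test_Programs/Lensing_Test/alpha_x.dat')
do i=1,1000
    write(24,'(1000F14.7)')(alpha_x(i,j), j=1,1000)
end do

open(25,file='/home/Desktop/Coding/Fortran/Test_Programs/Lensing_Test/alpha_y.dat')
do i=1,1000
    write(25,'(1000F14.7)')(alpha_y(i,j), j=1,1000)
end do

END PROGRAM LensingTest1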

0 Answers