Fortran FFTs in parallel using openmp

Question

I would like to speed up the Jacobian calculation in subroutine Advection by using omp instructions. My unsuccesful attempt is shown below. The four calculations labeled section 1 through 4 are independent of each other. Note that I am not trying to use openmp to speed up each fftw call. The program compiles, but if nx and ny are > 128, the program throws 'Segmentation fault (core dumped)'. By my counts the memory required is 32*8*nx*ny (in bytes), and the processor has 16 Gbytes.

I have wrapped Advection in a test program to add clarity to the question which is: How to structure a program into separate sections that can run concurrently?

!! to compile: gfortran -fopenmp -I/usr/include -o Omp_Advection Omp_Advection.f90 -lfftw3

Module FFTW3
  use, intrinsic :: iso_c_binding
  include 'fftw3.f03'
End Module FFTW3

Module NS
  use FFTW3
  ! constants
  real (kind=c_double), parameter :: pi=3.1415926535897932d0
  complex (kind=c_double_complex) :: i=complex(0.d0,1.d0)
  ! Variables and their FT
  integer (kind=c_int), parameter :: nx=128,ny=128 ! even!
  real (kind=c_double) :: omega(ny,nx),psi(ny,nx)
  complex (kind=c_double_complex) :: fftomega(ny,nx),fftpsi(ny,nx)
  complex (kind=c_double_complex) :: fftadvect(ny,nx)
  ! Operators
  complex (kind=c_double_complex) :: Dx(nx,ny),Dy(ny,nx)
  real (kind=c_double) :: laplacian(ny,nx),ilaplacian(ny,nx)
  ! needed for FFTW
  type (c_ptr) :: fwd,bak
  real (kind=c_double) :: norml=1.d0/dfloat(nx*ny)
  contains

    Subroutine Make_Data
      implicit none
      integer(kind=c_int) :: kx,ky
      real (kind=c_double) :: lcdx(nx),d2x(nx),ks(nx/2+1),harvest(ny,nx)
      real (kind=c_double),allocatable :: t(:),x(:),offset(:)
      complex (kind=c_double_complex),allocatable :: work(:,:)
      ! keep the factor i out until Dx and Dy
      ks=(/(dfloat(kx-1),kx=1,nx/2+1)/)
      t=(/(ks(kx),kx=nx/2,2,-1)/)
      lcdx=(/ks,-t/)           
      d2x=-lcdx*lcdx
      Dx=i*spread(lcdx,1,nx)
      Dy=i*spread(lcdx,2,ny)
      laplacian=spread(d2x,2,nx)+spread(d2x,1,ny)
      ilaplacian=1./laplacian
      ilaplacian(1,1)=0.d0
      ! make omega and compute psi, x and y are equal
      x=2.0d0*pi*(/(dfloat(kx-1)/(dfloat(nx)),kx=1,nx+1)/)-pi
      x=x(1:nx)                 ! this is y as well
      do kx=1,nx
         offset=-pi/3.0+2.d-2*cos(12.d0*x)         
         omega(kx,:)=1.d0/dcosh(30.d0*(x(kx)-offset))**2&
              &-1.d0/dcosh(30.d0*(x(kx)+offset))**2
      end do
      work=omega+i*0.d0
      call dfftw_execute_dft(fwd,work,fftomega)
      fftpsi=fftomega*ilaplacian
      call dfftw_execute_dft(bak,fftpsi,work)
      psi=norml*real(work)
    End Subroutine Make_Data

    Subroutine Advection (ftpsi,ftome)
      implicit none
      real(c_double) :: Dx_psi(ny,nx),Dy_psi(ny,nx)
      real(c_double) :: Dx_omega(ny,nx),Dy_omega(ny,nx)
      complex (kind=c_double_complex),intent(in) :: ftome(:,:),ftpsi(:,:)
      complex (kind=c_double_complex) :: D1(ny,nx),D2(ny,nx)
      complex (kind=c_double_complex) :: D3(ny,nx),D4(ny,nx)
      complex (kind=c_double_complex) :: D5(ny,nx),D6(ny,nx)
      complex (kind=c_double_complex) :: D7(ny,nx),D8(ny,nx)
!$omp parallel
!$omp sections
!$omp section
      ! section 1
      D1=Dx*ftpsi
      call dfftw_execute_dft(bak,D1,D2)
      Dx_psi=norml*real(D2)
!$omp section
      ! section 2
      D3=Dy*ftpsi
      call dfftw_execute_dft(bak,D3,D4)
      Dy_psi=norml*real(D4)
!$omp section
      ! section 3
      D5=Dx*ftome
      call dfftw_execute_dft(bak,D5,D6)
      Dx_omega=norml*real(D6)
!$omp section
      ! section 4
      D7=Dy*ftome
      call dfftw_execute_dft(bak,D7,D8)
      Dy_omega=norml*real(D8)
      ! end sections
!$omp end sections
!$omp workshare
      D1 = Dy_psi*Dx_omega - Dx_psi*Dy_omega+0.d0*complex(0.d0,0.d0);
!$omp end workshare
!$omp end parallel
      call dfftw_execute_dft(fwd,D1,fftadvect)

    End Subroutine Advection

End Module NS

Program Main
  use FFTW3
  use NS
  integer :: kx
  ! Prep FFTW
  call dfftw_plan_dft_2d(fwd,ny,nx,fftomega,fftpsi,FFTW_FORWARD,FFTW_MEASURE)
  call dfftw_plan_dft_2d(bak,ny,nx,fftomega,fftpsi,FFTW_BACKWARD,FFTW_MEASURE) 
  call Make_Data
  do kx=1,100
     call Advection(fftpsi,fftomega)
     end do

End Program Main

The crash is due to a stack overflow of your large arrays `nx*ny`. I suggest to make them allocatable. You can also try enlarging the stack as shown in some of the links. — Vladimir F Героям слава, Jun 29 '17 at 21:01
@VladimirF Thanks again. Could you explain what the difference, in the openmp environment, between declaring arrays and allocating as you suggested? What is the stack in this context and how does it differ from the "general use" memory. You correctly marked this as duplicate, so I suppose there will be no way to continue.. — Clinton Winant, Jun 30 '17 at 07:28
There is no way I can explain heap and stack in a comment. There are many questions and answers about that here. It is a topic to study and the connection with OpenMP is a trchnical implementation detail. The message is that allocatable array geta allocated on the heap which is much larger. — Vladimir F Героям слава, Jun 30 '17 at 08:49
@VladimirF, OK, I found that allocating the arrays rather than declaring them only allowed me to increase the size from 128^2 to 512^2. For 1024^2 on up I get the same segmentation fault. Clearly I have more reading to do. — Clinton Winant, Jun 30 '17 at 09:10
There is also `harvest`, make sure you find all relevant arrays. You may also try enlarging the stack, that is shown in the referenced links using `ulimit`. — Vladimir F Героям слава, Jun 30 '17 at 09:25

Fortran FFTs in parallel using openmp

0 Answers0