
I am writing my first MPI code in Fortran, solving the 1D wave equation with Euler time integration and CD2 (second-order central differencing) for the spatial discretization. I have reviewed the code several times and cannot find any memory-related issue, but the error message points at one: the run dies with a segmentation fault.
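For reference, the interior update I am implementing is

u_i^(n+1) = u_i^n - c*dt*(u_(i+1)^n - u_(i-1)^n)/(2*dx)

with one-sided differences at the two ends of the domain. This is the runtime output: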

[ss1986@gadi-login-04 ss1986]$ mpirun -n 8 a.out 

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f451efc4171 in ???
#1  0x7f451efc3313 in ???
#2  0x7f451e21c3ff in ???
#3  0x402899 in ???
#4  0x40313f in ???
#5  0x7f451e208492 in ???
#6  0x400fad in ???
#7  0xffffffffffffffff in ???
#0  0x7f05df4a7171 in ???
#1  0x7f05df4a6313 in ???
#2  0x7f05de6ff3ff in ???
#3  0x402899 in ???
#4  0x40313f in ???
#5  0x7f05de6eb492 in ???
#6  0x400fad in ???
#7  0xffffffffffffffff in ???
#0  0x7f92acbf2171 in ???
#1  0x7f92acbf1313 in ???
#2  0x7f92abe4a3ff in ???
#3  0x402899 in ???
#4  0x40313f in ???
#5  0x7f92abe36492 in ???
#6  0x400fad in ???
#7  0xffffffffffffffff in ???
#0  0x7f913e3c7171 in ???
#1  0x7f913e3c6313 in ???
#2  0x7f913d61f3ff in ???
#3  0x402899 in ???
#4  0x40313f in ???
#5  0x7f913d60b492 in ???
#6  0x400fad in ???
#7  0xffffffffffffffff in ???
#0  0x7f2a53f7e171 in ???
#1  0x7f2a53f7d313 in ???
#2  0x7f2a531d63ff in ???
#3  0x402899 in ???
#4  0x40313f in ???
#5  0x7f2a531c2492 in ???
#6  0x400fad in ???
#7  0xffffffffffffffff in ???
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 7 with PID 0 on node gadi-login-04 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[ss1986@gadi-login-04 ss1986]$ 

Here is my Fortran code:

program conv_parallel

implicit none
include "mpif.h"
integer :: i,j,k,cnt
integer, parameter :: N=800
double precision, dimension(:), allocatable :: x,u_in
double precision, dimension(:), allocatable :: grid
double precision, dimension(:), allocatable :: Transition_wave
double precision, dimension(:), allocatable :: Exact_sol
double precision, dimension(:), allocatable :: Initial_wave
double precision, dimension(:), allocatable :: u,u_ex,temp
double precision :: dx,dt,time
double precision :: left_value, right_value
double precision, parameter :: c=0.4d0, kh=0.5d0,alp=8.0d0
double precision, parameter :: xmax =5.0d0, tmax=50.0d0
integer :: ierr, n_ranks, my_rank, status
integer :: partition
character*11 :: filestr

dx = xmax/real(N-1)

dt = 0.005d0

call MPI_Init(ierr)
call MPI_Comm_rank(MPI_COMM_WORLD, my_rank,ierr)
call MPI_Comm_size(MPI_COMM_WORLD,n_ranks,ierr)

partition = N/n_ranks

! allocate the memory
if (my_rank == 0) then
    allocate (grid(1:N))
    allocate (Transition_wave(1:N))
    allocate (Exact_sol(1:N))
    allocate (Initial_wave(1:N))
end if

allocate (x(1:partition))
allocate (u_in(1:partition))
allocate (u(1:partition))
allocate (temp(1:partition))


! Domain grid calculation
do i = 1,partition
    x(i) = ((my_rank*partition)+i-1) * dx
enddo



! Initial Condition-Gaussian
do i = 1,partition
    u_in(i)=dexp(-alp*(x(i)-1.0d0)**2)*dsin(kh*(x(i))/dx)
end do

! use gather library to send x array to rank zero so that it can write it to a file
call MPI_Gather( x, partition, MPI_DOUBLE_PRECISION, grid, partition, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr )

! use gather library to send Initial wave to rank zero
call MPI_Gather( u_in, partition, MPI_DOUBLE_PRECISION, Initial_wave, partition, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr )
   

! writing data to the file and it is done by rank zero explicitly
if (my_rank==0) then
    open(2,file='initial.dat')
    write(2,*) 'Variables = grid, Initial_wave'
    do i = 1,N
        write(2,*) grid(i), Initial_wave(i)
    enddo
    close(2)
end if


cnt = 0
time = 0.0d0
temp = u_in

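! time-marching loop: forward Euler steps until time exceeds tmax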
do  

! halo exchange between neighbouring ranks

! The temp field has been changed, communicate it to neighbours
! With blocking communication, half the ranks should send first
! and the other half should receive first

if (my_rank == 0) then
    call MPI_Recv( right_value, 1, MPI_DOUBLE_PRECISION, my_rank+1, 1, MPI_COMM_WORLD, status, ierr)
    call MPI_Send( temp(partition), 1, MPI_DOUBLE_PRECISION, my_rank+1, 2, MPI_COMM_WORLD, ierr)
end if


if (mod(my_rank,2) == 1) then
! Ranks with odd number send first
  
    ! Send data down from my_rank to my_rank-1
    call MPI_Send( temp(1), 1, MPI_DOUBLE_PRECISION, my_rank-1, 1, MPI_COMM_WORLD, ierr)
    ! Receive data from my_rank-1
    call MPI_Recv( left_value, 1, MPI_DOUBLE_PRECISION, my_rank-1, 2, MPI_COMM_WORLD, status, ierr)
    
    
    if (my_rank < (n_ranks-1)) then
        ! send right boundary data to my_rank+1
        call MPI_Send( temp(partition), 1, MPI_DOUBLE_PRECISION, my_rank+1, 1, MPI_COMM_WORLD, ierr)
        ! receive left boundary data from my_rank+1
        call MPI_Recv( right_value, 1, MPI_DOUBLE_PRECISION, my_rank+1, 2, MPI_COMM_WORLD, status, ierr)
    endif
  
else
! Ranks with even number receive first
  
    if (my_rank > 0 .and. my_rank < n_ranks-1) then
         call MPI_Recv( right_value, 1, MPI_DOUBLE_PRECISION, my_rank+1, 1, MPI_COMM_WORLD, status, ierr)
         call MPI_Send( temp(partition), 1, MPI_DOUBLE_PRECISION, my_rank+1, 2, MPI_COMM_WORLD, ierr)
         call MPI_Recv( left_value, 1, MPI_DOUBLE_PRECISION, my_rank-1, 1, MPI_COMM_WORLD, status, ierr)
         call MPI_Send( temp(1), 1, MPI_DOUBLE_PRECISION, my_rank-1, 2, MPI_COMM_WORLD, ierr)
    endif
  
endif



! Discretization using forward/backward differences (FD/BD) at the boundaries
if (my_rank ==0) then
     u(1) = temp(1) - c*dt*(temp(2)-temp(1))/dx
     u(partition) = temp(partition) - 0.5d0*c*dt*(right_value-temp(partition-1))/dx
end if

if (my_rank == n_ranks) then
    u(partition) = temp(partition) - c*dt*(temp(partition)-temp(partition-1))/dx
    u(1) = temp(1) - 0.5d0*c*dt*(temp(2)-left_value)/dx
end if

! calculation of wave in internal nodes
do i = 2, partition-1
    u(i) = temp(i) - 0.5d0*c*dt*(temp(i+1)-temp(i-1))/dx
enddo

if (my_rank > 0 .and. my_rank < n_ranks) then
    u(1) = temp(1) - 0.5d0*c*dt*(temp(2)-left_value)/dx
    u(partition) = temp(partition) - 0.5d0*c*dt*(right_value-temp(partition-1))/dx
end if

! exact solution
do j = 1,partition
    u_ex(j) = dexp(-alp*((x(j)-c*time)-1.0d0)**2)*dsin(kh*(x(j)-c*time)/dx)
end do


! use gather library to send message to rank zero

call MPI_Gather( u, partition, MPI_DOUBLE_PRECISION, Transition_wave, partition, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr )
call MPI_Gather( u_ex, partition, MPI_DOUBLE_PRECISION, Exact_sol, partition, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr )

if (my_rank==0) then

    ! build the file name t<6-digit time>.dat from the current time
    filestr(1:1) = 't'
    filestr(2:2) = ACHAR(48+int(time/100000))
    filestr(3:3) = ACHAR(48+int(time/10000)-10*int(time/100000))
    filestr(4:4) = ACHAR(48+int(time/1000)-10*int(time/10000))
    filestr(5:5) = ACHAR(48+int(time/100)-10*int(time/1000))
    filestr(6:6) = ACHAR(48+int(time/10)-10*int(time/100))
    filestr(7:7) = ACHAR(48+int(time)-10*int(time/10))
    filestr(8:11) = '.dat'

    open(1,file = filestr, status = 'replace')
    write(1,*) 'Variables = x, u_num, u_ex'

    do k = 1,N
        write (1,*) grid(k), Transition_wave(k), Exact_sol(k)
    enddo

    close(1)

end if

! call MPI_Barrier(MPI_COMM_WORLD, ierr)

cnt = cnt+1
time = time + dt 
temp = u

if (time>tmax) exit

end do


! Free memory and finalise
deallocate(grid)
deallocate(Initial_wave)
deallocate(Transition_wave)
deallocate(Exact_sol)
deallocate(x)
deallocate(u_in)
deallocate(u)

! Call MPI_Finalize at the end
call MPI_Finalize(ierr)

end program

Can anyone please tell me why this is happening and how to solve it?

@VladimirF I changed include "mpif.h" to use mpi as you suggested, but now I am getting these unexpected errors:

[ss1986@gadi-login-04 ss1986]$ mpif90 -Wall -Wextra -std=f2008 -fcheck=all -O -g conv_parallel2.f90 
conv_parallel2.f90:19:12:

 character*11 :: filestr
            1
Warning: Obsolescent feature: Old-style character length at (1)
conv_parallel2.f90:106:1:

   call MPI_Recv( right_value, 1, MPI_DOUBLE_PRECISION, my_rank+1, 2, MPI_COMM_WORLD, status, ierr)
 1
Warning: Nonconforming tab character at (1) [-Wtabs]
conv_parallel2.f90:88:100:

     call MPI_Recv( right_value, 1, MPI_DOUBLE_PRECISION, my_rank+1, 1, MPI_COMM_WORLD, status, ierr)
                                                                                                    1
Error: There is no specific subroutine for the generic ‘mpi_recv’ at (1)
conv_parallel2.f90:99:99:

     call MPI_Recv( left_value, 1, MPI_DOUBLE_PRECISION, my_rank-1, 2, MPI_COMM_WORLD, status, ierr)
                                                                                                   1
Error: There is no specific subroutine for the generic ‘mpi_recv’ at (1)
conv_parallel2.f90:106:98:

   call MPI_Recv( right_value, 1, MPI_DOUBLE_PRECISION, my_rank+1, 2, MPI_COMM_WORLD, status, ierr)
                                                                                                  1
Error: There is no specific subroutine for the generic ‘mpi_recv’ at (1)
conv_parallel2.f90:113:105:

          call MPI_Recv( right_value, 1, MPI_DOUBLE_PRECISION, my_rank+1, 1, MPI_COMM_WORLD, status, ierr)
                                                                                                         1
Error: There is no specific subroutine for the generic ‘mpi_recv’ at (1)
conv_parallel2.f90:115:104:

          call MPI_Recv( left_value, 1, MPI_DOUBLE_PRECISION, my_rank-1, 1, MPI_COMM_WORLD, status, ierr)
                                                                                                        1
Error: There is no specific subroutine for the generic ‘mpi_recv’ at (1)
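If I am reading these correctly, every call the compiler flags is an MPI_Recv, and the one argument my (accepted) MPI_Send calls do not have is status. I declared it as a plain integer, but I believe the interface wants an array:

integer :: status(MPI_STATUS_SIZE)   ! instead of:  integer :: status

(or MPI_STATUS_IGNORE can be passed in place of a real status variable). If that is right, it would also explain the original segmentation fault: with include "mpif.h" there is no argument checking, so each MPI_Recv wrote MPI_STATUS_SIZE integers through what is actually a scalar on the stack.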
  • 1. compile with "-g" so that you get a decent stack trace. 2. The even processors need two tests, separately for ">0" and " – Victor Eijkhout Jul 11 '21 at 00:19
  • Welcome, please take the [tour]. Do not use `include "mpif.h"` but `use mpi`, that will help you to find certain errors. Use tag [tag:fortran] for Fortran questions. – Vladimir F Героям слава Jul 11 '21 at 06:39
  • Need to search but this is a duplicate - even though grid is not being touched by procs 1-7 in MPI_Gather it needs to be allocated, you can't pass an unallocated allocatable array to a routine except in special circumstances. Also note dexp and dsin shouldn't have been used in almost 50 years. Use exp and sin. – Ian Bush Jul 11 '21 at 08:06
  • Fortran kinds would also be a good thing to learn about, as would MPI_Sendrecv – Ian Bush Jul 11 '21 at 08:09
  • Also you are not allocating u_ex before you use it. Learn about compiling with run time error checking turned on, this would have found all these for you. With gfortran I used `mpif90 -Wall -Wextra -std=f2008 -fcheck=all -O -g 1d.f90` – Ian Bush Jul 11 '21 at 08:14
  • Take a look at https://stackoverflow.com/questions/13496510/is-there-anything-wrong-with-passing-an-unallocated-array-to-a-routine-without-a/ , not an exact duplicate due to the unallocated array, but does address passing an unallocated array as an argument – Ian Bush Jul 11 '21 at 08:21
  • @VladimirF: actually `use mpi_f08`. The modern Fortran interface does a lot of type checking which is a Good Thing. It wouldn't have found this particular problem, but it's still a good idea. – Victor Eijkhout Jul 11 '21 at 15:58
  • @VictorEijkhout That is good but requires rewriting the code. And a recent compiler. – Vladimir F Героям слава Jul 11 '21 at 19:46
  • As you can see, the `use mpi` did indeed uncover the error and issued a correct error message. Those are not "but" "unexpected errors". Those are the helpful error messages that prevent you from compiling incorrect code. See the duplicate link for more. – Vladimir F Героям слава Jul 12 '21 at 13:40
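Following the comments above, here is a minimal, unverified sketch of how I understand the suggested changes fit together: status declared as an array, u_ex actually allocated, the generic exp instead of dexp, MPI_Sendrecv with MPI_PROC_NULL in place of my odd/even send/receive logic, and gather buffers that are at least validly allocated on every rank. The field values below are placeholders, not my real initial condition. (Writing this down I also noticed that my tests against n_ranks in the original code can never be true, since ranks run from 0 to n_ranks-1.)

program fixes_sketch
    use mpi                                  ! instead of include "mpif.h"
    implicit none
    integer, parameter :: dp = kind(1.0d0)   ! named kind instead of double precision
    integer :: ierr, my_rank, n_ranks, partition, up, down
    integer :: status(MPI_STATUS_SIZE)       ! an array, not the scalar I had
    real(dp), allocatable :: temp(:), u_ex(:), gathered(:)
    real(dp) :: left_value, right_value

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, my_rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, n_ranks, ierr)

    partition = 800/n_ranks
    allocate (temp(partition))
    allocate (u_ex(partition))               ! u_ex was never allocated in my code
    temp = real(my_rank, dp)                 ! placeholder field
    u_ex = exp(temp)                         ! generic exp/sin instead of dexp/dsin
    left_value  = 0.0_dp
    right_value = 0.0_dp

    ! Neighbour ranks; MPI_PROC_NULL turns the communication below into a
    ! no-op at the physical boundaries, so ranks 0 and n_ranks-1 need no
    ! special-case branches.
    down = merge(my_rank-1, MPI_PROC_NULL, my_rank > 0)
    up   = merge(my_rank+1, MPI_PROC_NULL, my_rank < n_ranks-1)

    ! One MPI_Sendrecv per direction replaces the odd/even logic and cannot
    ! deadlock: send my right edge up while receiving my left halo, then
    ! send my left edge down while receiving my right halo.
    call MPI_Sendrecv(temp(partition), 1, MPI_DOUBLE_PRECISION, up,   1, &
                      left_value,      1, MPI_DOUBLE_PRECISION, down, 1, &
                      MPI_COMM_WORLD, status, ierr)
    call MPI_Sendrecv(temp(1),         1, MPI_DOUBLE_PRECISION, down, 2, &
                      right_value,     1, MPI_DOUBLE_PRECISION, up,   2, &
                      MPI_COMM_WORLD, status, ierr)

    ! The MPI_Gather receive buffer must be a valid argument on every rank,
    ! even though its contents only matter at the root; a zero-sized
    ! allocation is enough on the other ranks (as the linked question
    ! discusses).
    if (my_rank == 0) then
        allocate (gathered(partition*n_ranks))
    else
        allocate (gathered(0))
    end if
    call MPI_Gather(temp, partition, MPI_DOUBLE_PRECISION, gathered, partition, &
                    MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

    call MPI_Finalize(ierr)
end program fixes_sketch

If this is the right reading of the comments, the MPI_PROC_NULL trick also removes the separate rank-0 and last-rank branches from the halo exchange, which is where my deadlock risk was.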
