
I have a simple MPI Fortran code, shown below. It crashes with the error `forrtl: severe (174): SIGSEGV, segmentation fault occurred`, and I am not sure where the mistake is. The weird thing I notice is that it doesn't always crash: sometimes it works for small n and sometimes not, and for some numbers of processors it works while for others it doesn't. For the example given here it does not work for any number of processors. Debugging MPI is not easy for me. Can anybody find what's wrong here?

  program crash
    use mpi
    implicit none
    integer, parameter :: dp = kind(1.d0)
    integer, parameter :: M = 1500, N = M, O = M   ! Matrix dimension
    integer myrank, numprocs, ierr, root
    integer i, j, k, l, p, local_n, sendcounts
    real(dp) R1(M), RHS(M), RHS1(M)
    real(dp), dimension(:), allocatable :: local_A
    real(dp), dimension(:,:), allocatable :: local_c, local_c1
    real(dp) summ, B(N,O), B1(N,O), C(M*O), C1(M)
    real(dp) final_product(M,O), rhs_product(O)
    integer, dimension(:), allocatable :: displs !, displs1, displs2
    integer, dimension(:), allocatable :: sendcounts_list
    real(dp), dimension(:,:), allocatable :: local_A_Matrix
    integer status(MPI_STATUS_SIZE)
    integer request

    ! Initialize MPI
    call MPI_Init(ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, numprocs, ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)

    B = 0.d0
    do i = 1, N
       do j = 1, O
          B(i,j) = (i+1)*myrank + j*myrank + j*i
       enddo
    enddo

    R1 = 0.d0
    do i = 1, N
       R1(i) = i*myrank + 1
    enddo

    if (myrank < numprocs - mod(M, numprocs)) then
       local_n = M/numprocs
    else
       local_n = M/numprocs + 1
    endif

    sendcounts = local_n * N
    allocate(sendcounts_list(numprocs))
    call MPI_Allgather(local_n, 1, MPI_INT, sendcounts_list, 1, MPI_INT, MPI_COMM_WORLD, ierr)

    if (myrank == 0) then
       allocate(displs(numprocs))
       displs = 0
       do i = 2, numprocs
          displs(i) = displs(i-1) + N*sendcounts_list(i-1)
       enddo
    endif

    allocate(local_A(sendcounts))
    local_A = 0.d0
    call MPI_Scatterv(Transpose(B), N*sendcounts_list, displs, MPI_Double, local_A, N*local_n, &
                      MPI_Double, 0, MPI_COMM_WORLD, ierr)
    deallocate(sendcounts_list)
    if (myrank == 0) then
       deallocate(displs)
    endif

    allocate(local_A_Matrix(local_n,N))
    local_A_Matrix = reshape(local_A, (/local_n,N/), order=(/2,1/))

    deallocate(local_A)

    call MPI_Finalize(ierr)

  end program crash
  • The first step is to test that every `allocate()` does indeed allocate memory. – Gilles Gouaillardet Aug 16 '23 at 00:54
  • Thanks for the suggestions, Gilles. I double checked and it seems they are allocated properly. For small sizes it works; the problem appears when the size M increases. Do you think the memory is allocated incorrectly? Can you think of any different way of allocating it? I am using an Intel compiler, and for some reason I suspected a compiler issue; I am lost in this simple problem. Thanks – researcher_sp Aug 16 '23 at 01:09
  • do you get a stack trace and a line number when the program crashes? – Gilles Gouaillardet Aug 16 '23 at 01:22
  • Do any of your counts go over 2.14 billion? – Victor Eijkhout Aug 16 '23 at 01:23
  • @Gilles No, as it is MPI and I don't have much experience with valgrind or the MPI debuggers people suggest. It compiles fine; it crashes when I run it. I also think it has something to do with memory, as you do. It says `forrtl: severe (174): SIGSEGV, segmentation fault occurred` and gives me a bunch of MPI errors. The way I compile is `crun.intel mpiifort recheck.f90 -o sp`. It compiles, and when I run `srun -n 2 crun.intel ./sp`, that's when it gives me those errors. – researcher_sp Aug 16 '23 at 01:27
  • @victor I didn't quite get how you got 2.14 billion. My matrix is 1500 by 1500. I used varying numbers of processors, like 2, 10, 15 and so on. The interesting thing is that sometimes it works but most of the time it does not. That is what's bothering me. – researcher_sp Aug 16 '23 at 01:30
  • @Gilles To add to it: it says `tasks 0-1: Exited with exit code 174`, if you get any insight from that. – researcher_sp Aug 16 '23 at 01:33
  • try `mpiifort -g -O0 -check bounds -traceback ...`, then `ulimit -c unlimited` and `mpirun ...`. you should at least get a core file, then attach it with `gdb` to get a line number. – Gilles Gouaillardet Aug 16 '23 at 01:38
  • @researcher_sp, what happens if you DON'T transpose B and you DON'T set local_A_Matrix ? – lastchance Aug 16 '23 at 09:11
  • @lastchance It gives me the same issue. I even tried reshaping the B matrix to a 1-d list, which also doesn't work. Do you think this is a compiler issue? – researcher_sp Aug 16 '23 at 11:44
  • Note `MPI_INT` and `MPI_Double` are not part of the MPI binding for Fortran and should not be used; you want `MPI_Integer` and `MPI_Double_precision` – Ian Bush Aug 16 '23 at 11:45
  • You are compiling with all the debugging options, especially array bounds checking, turned on? – Ian Bush Aug 16 '23 at 11:49
  • Displs is only allocated on rank 0; that could well be your problem. What happens if you allocate it to size zero on the other ranks? (See the sketch after these comments.) – Ian Bush Aug 16 '23 at 11:53
  • See https://stackoverflow.com/questions/13496510/is-there-anything-wrong-with-passing-an-unallocated-array-to-a-routine-without-a/13496808#13496808 for why displs should be allocated – Ian Bush Aug 16 '23 at 12:00
  • @researcher_sp, all I can say is that it ran (as is) with gfortran and Microsoft MPI, but failed at run-time with both ifort/Microsoft MPI and mpiifort. So I suspect some problem with the intel compiler. I tried changing to types MPI_Integer and MPI_Double_precision, making sure that all ranks allocated Displs, and using dynamic arrays instead of your current fixed-size ones in case there was a stack limitation. None worked. The only thing that worked was using B rather than Transpose(B) in the MPI_Scatterv call (or assigning another array to transpose(B)) and not setting local_A_Matrix. – lastchance Aug 16 '23 at 13:52
  • Passing unallocated arrays is not valid Fortran. It might just work with some compilers, or crash with others. – Gilles Gouaillardet Aug 16 '23 at 14:49
  • @researcher_sp, what happens if you turn M down to 100? – lastchance Aug 16 '23 at 15:02
  • @lastchance Thanks for your thoughts. For small M it works; it worked for M=100. For M=500, for example, it sometimes works and sometimes doesn't. This is interesting. – researcher_sp Aug 16 '23 at 16:34
  • @IanBush I tried all the possible ways you explained; it doesn't work. As lastchance said, I have not tried gfortran; I used the Intel compiler, so I will do that later and see. This is frustrating. Thanks for your suggestions. – researcher_sp Aug 16 '23 at 16:35
  • @researcher_sp, francescalus solved the related problem in my cut-down version of your code by adding the compiler option -heap-arrays to force your large temporaries onto the heap (a compile line is sketched after these comments). – lastchance Aug 16 '23 at 16:42
  • @Gilles I have allocated all the arrays. Did you see where I passed unallocated arrays? Thanks – researcher_sp Aug 16 '23 at 16:43
  • @lastchance It worked for this problem. Thanks for your help. I have no words to appreciate this community. I lost a week on this issue alone, and maybe saved a month or more by finding this community. Thank you all. – researcher_sp Aug 16 '23 at 16:52
  • 2.14 billion is two-to-the-31, the maximum number of elements. So I was asking if you were exceeding that. – Victor Eijkhout Aug 16 '23 at 17:11
  • @lastchance It still doesn't work with the heap option if I change the array size. It worked for this particular case when I wrote to you, but not anymore. Just wanted to let you know. If you find any other issues, give me some insight; I am struggling to understand what this issue actually is. – researcher_sp Aug 16 '23 at 23:25
  • `displs` is passed unallocated on non root rank. This is ignored by MPI but this is **not** valid Fortran. – Gilles Gouaillardet Aug 17 '23 at 01:43
  • @GillesGouaillardet In scatterv, displs is only meaningful at the root. But anyway, I allocated displs on all ranks and the issue is the same. Thanks for your suggestions. – researcher_sp Aug 17 '23 at 18:21
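
  Pulling together the two suggestions above (the Fortran MPI type constants and `displs` allocated on every rank), here is a minimal sketch of the relevant fragment, reusing the variable names from the question and assuming the rest of the program is unchanged:

    ! Sketch only: Fortran MPI type constants, and displs allocated on all ranks.
    allocate(sendcounts_list(numprocs))
    call MPI_Allgather(local_n, 1, MPI_INTEGER, sendcounts_list, 1, MPI_INTEGER, &
                       MPI_COMM_WORLD, ierr)

    allocate(displs(numprocs))        ! allocated on every rank, not just rank 0
    displs = 0
    if (myrank == 0) then             ! the contents still only matter on the root
       do i = 2, numprocs
          displs(i) = displs(i-1) + N*sendcounts_list(i-1)
       enddo
    endif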
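
  For reference, a possible compile line combining the debug flags suggested by Gilles Gouaillardet with lastchance's -heap-arrays option, using the wrapper and file name from the question:

    crun.intel mpiifort -g -O0 -check bounds -traceback -heap-arrays recheck.f90 -o sp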

2 Answers


The code is working properly now, which is surprising to me. The idea @lastchance gave me to use -heap-arrays helped somewhat: it was working sometimes and not other times, which was bothering me. Now the only thing I changed is that I allocated all the arrays, no matter how small they are. This solved my problem. I still don't know why, and I don't care right now, as I am a math/physics researcher, not a computer scientist. In case anyone comes across this kind of issue, please allocate all the arrays and see. Just came back to thank all of you. You guys are awesome.
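
A minimal sketch of the change this answer describes, assuming the kind and size parameters from the question: the large fixed-size arrays are declared allocatable and allocated explicitly, so they end up on the heap rather than in static or stack storage.

    ! Sketch only: the big fixed-size arrays from the question, made allocatable.
    real(dp), dimension(:,:), allocatable :: B, B1, final_product
    real(dp), dimension(:),   allocatable :: R1, RHS, RHS1, C, C1, rhs_product

    allocate(B(N,O), B1(N,O), final_product(M,O))
    allocate(R1(M), RHS(M), RHS1(M), C(M*O), C1(M), rhs_product(O))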


Your problem lies in the

call MPI_Scatterv(Transpose(B),N*sendcounts_list

line. Multiplying an array by a scalar is legal in Fortran, but the resulting expression is not the sort of thing MPI expects in that argument position. Create a temporary array with that as its contents, and then pass that.
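
A minimal sketch of that fix, reusing the question's variable names; `scounts` and `Bt` are hypothetical temporaries, and the same idea is applied to `Transpose(B)`, which the comments flagged as well:

    ! Sketch only: pass named temporary arrays instead of array expressions.
    integer,  allocatable :: scounts(:)   ! holds N*sendcounts_list
    real(dp), allocatable :: Bt(:,:)      ! holds Transpose(B)

    allocate(scounts(numprocs), Bt(O,N))
    scounts = N*sendcounts_list
    Bt      = transpose(B)

    call MPI_Scatterv(Bt, scounts, displs, MPI_DOUBLE_PRECISION, &
                      local_A, N*local_n, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)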

– Victor Eijkhout