I am working on a CFD code that involves time marching: the code has a main time loop that should ideally run for a prescribed number of time steps.

When I run the code compiled with gfortran 9.3.0 on Ubuntu 20.04.3 LTS (64-bit), it always ends in a segmentation fault. I ran the same code several times with the same input, and strangely the code stops at a different time step each time. The gdb backtrace also shows the fault at a different location in the code on each run. I checked these locations manually and did not find any illegal memory access. Here is the backtrace for one of the runs:

    Program received signal SIGSEGV, Segmentation fault.
    0x00005555555c457e in fuvw_get () at znb3.f90:2663
    2663    110   FU(I,J,K)=F(I,J,K)*XDW(I)+F(I-1,J,K)*XDE(I)
    (gdb) backtrace
    #0  0x00005555555c457e in fuvw_get () at znb3.f90:2663
    #1  0x00005555555ae579 in h_get () at znb3.f90:2439
    #2  0x000055555555c979 in solve () at znb3.f90:1749
    #3  0x00005555555d3cfe in MAIN__ () at znb3.f90:587

The following is what I am using to compile the code:

    gfortran -ffree-line-length-none -fno-align-commons -fdefault-real-8 -mcmodel=medium -g znb3.f90 -o nb3   

The original code znb3.f90 is about 8000 lines and there are two input files. For now, I am including the subroutine fuvw_get() mentioned in the backtrace above:

    SUBROUTINE FUVW_GET
    !***********************************************************************
          PARAMETER(ID=66,JD=134,KD=66)
          COMMON/IGRID/NI,NJ,NK,NIM,NJM,NKM,NIMM,NJMM,NKMM,I23,K11,KNK,NW,NWM
          COMMON/GRIDB/XP(ID),XU(ID),XD(ID),XC(ID),XDE(ID),XDW(ID),YP(JD),YV(JD),YD(JD),YC(JD),YDN(JD),YDS(JD)&
         &            ,ZP(KD),ZW(KD),ZD(KD),ZC(KD),ZDT(KD),ZDB(KD),RP(ID),RU(ID),VOL(ID,JD,KD),WP(JD),WV(JD),WD(JD),WC(JD),WDN(JD),WDS(JD)
          COMMON/LEVFN/F(ID,JD,KD),FU(ID,JD,KD),FV(ID,JD,KD),FW(ID,JD,KD),S(ID,JD,KD),SU(ID,JD,KD),SV(ID,JD,KD),SW(ID,JD,KD)&
         &            ,H(ID,JD,KD),HB(ID,JD,KD),RK(ID,JD,KD),BF(ID,JD,KD),FC(ID,JD,KD)
    !***********************************************************************
          DO 110 K=K11,KNK
          DO 110 J=1,NJ
          DO 110 I=2,NI
    110   FU(I,J,K)=F(I,J,K)*XDW(I)+F(I-1,J,K)*XDE(I)

          DO 120 K=K11,KNK
          DO 120 J=2,NJ
          DO 120 I=1,NI
    120   FV(I,J,K)=F(I,J,K)*YDS(J)+F(I,J-1,K)*YDN(J)

          IF(I23.EQ.2) RETURN
          DO 130 K=2,NK
          DO 130 J=1,NJ
          DO 130 I=1,NI
    130   FW(I,J,K)=F(I,J,K)*ZDB(K)+F(I,J,K-1)*ZDT(K)

          DO 140 K=2,NK
          DO 140 J=2,NJ
          DO 140 I=2,NI
    140   FC(I,J,K)=(FW(I,J  ,K)*XDW(I)+FW(I-1,J  ,K)*XDE(I))*YDS(J)+(FW(I,J-1,K)*XDW(I)+FW(I-1,J-1,K)*XDE(I))*YDN(J)

          RETURN
          END
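
One way to back up a manual check of this routine is to compare the loop limits with the declared array extents before the loops run. The following is only a sketch under stated assumptions, not part of `znb3.f90`: the module `limit_check` and the function `limits_ok` are hypothetical names, and the sketch assumes that `NI`, `NJ`, `NK`, `K11`, and `KNK` from `COMMON/IGRID/` are the only values that set the index range in `FUVW_GET`.

```fortran
! Hypothetical helper, not part of znb3.f90: reports whether the loop
! limits used in FUVW_GET stay inside arrays dimensioned (ID,JD,KD).
module limit_check
  implicit none
  ! Extents copied from the PARAMETER statement in FUVW_GET.
  integer, parameter :: id = 66, jd = 134, kd = 66
contains
  logical function limits_ok(ni, nj, nk, k11, knk)
    integer, intent(in) :: ni, nj, nk, k11, knk
    ! The loops run up to NI, NJ, NK and KNK, and start at 1, 2 or K11.
    ! Indices I-1, J-1 and K-1 appear only in loops that start at 2,
    ! so the only lower bound that can be violated is K11.
    ! The test is conservative: it also flags limits of loops that
    ! would execute zero times.
    limits_ok = ni <= id .and. nj <= jd .and. nk <= kd .and. &
                k11 >= 1 .and. knk <= kd
  end function limits_ok
end module limit_check

program demo
  use limit_check
  implicit none
  ! Example values only; in the real code they come from COMMON/IGRID/.
  print *, 'all limits in range: ', limits_ok(66, 134, 66, 1, 66)
  print *, 'NI one past ID:      ', limits_ok(67, 134, 66, 1, 66)
  print *, 'K11 equal to zero:   ', limits_ok(66, 134, 66, 0, 66)
end program demo
```

The demo should report `T` for the first call and `F` for the other two. In the actual code, a similar test at the top of `FUVW_GET`, followed by a `STOP` when it fails, would show whether the limits ever leave the declared range during a run. gfortran's `-fcheck=bounds` performs this kind of check automatically for every array reference.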

I thought it might be a memory issue, since several of the arrays involved are quite large (66x134x66, double precision), but I was under the impression that `-mcmodel=medium` would take care of that.
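
To put a number on the array sizes, the arithmetic for the `COMMON/LEVFN/` block can be written out. Each array has 66 x 134 x 66 = 583,704 elements, which is 4,669,632 bytes at 8 bytes per element, so the 13 arrays in the block take about 57.9 MiB. The sketch below is a standalone program, not part of `znb3.f90`; it assumes that `-fdefault-real-8` makes every array element 8 bytes, and it covers only the one block shown in the question.

```fortran
! Estimates the static storage of COMMON/LEVFN/ as declared in the question.
program levfn_size
  use iso_fortran_env, only: int64, real64
  implicit none
  integer, parameter :: id = 66, jd = 134, kd = 66
  ! F, FU, FV, FW, S, SU, SV, SW, H, HB, RK, BF, FC
  integer, parameter :: n_arrays = 13
  integer(int64) :: elems, bytes_per_array, bytes_total

  elems = int(id, int64) * jd * kd
  ! storage_size returns bits. real64 stands in for the 8-byte default
  ! REAL selected by -fdefault-real-8 (an assumption of this sketch).
  bytes_per_array = elems * (storage_size(1.0_real64) / 8)
  bytes_total = n_arrays * bytes_per_array

  print '(a,i0)',   'elements per array: ', elems
  print '(a,i0)',   'bytes per array:    ', bytes_per_array
  print '(a,i0)',   'bytes in LEVFN:     ', bytes_total
  print '(a,f0.1)', 'MiB in LEVFN:       ', real(bytes_total, real64) / 1024.0_real64**2
end program levfn_size
```

On x86-64, `-mcmodel=medium` becomes relevant when static data grows beyond roughly 2 GiB, and this block alone is far below that. The other `COMMON` blocks in the 8000-line code are not shown, so the program-wide total is unknown.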

Any help with what might be causing this issue and how to possibly resolve it would be greatly appreciated! I will be happy to provide any further information that might be needed.

AK_thermal
  • Compile your code with `-g -ffpe-trap=invalid`. When it drops core, you can use gdb to find the exact location of the problem. BTW, you should add `-O` to the options, and if you don't need it, don't use `-fdefault-real-8`. It may not do what you think. – steve Nov 11 '21 at 00:11
  • @steve, thanks for your answer! I have added the `-g -ffpe-trap=invalid -O` options when compiling. Can I run it with `./nb3`, or should I use `gdb nb3`? I am relatively new to debugging in gdb, so I apologize if my questions are very basic. Also, the `-fdefault-real-8` option was in the compile command in the shell script left by the previous user of the code, and double precision is preferred for the calculations. I will let you know what I get from the dumped core. – AK_thermal Nov 11 '21 at 00:46
  • You can do either. If you do `gdb ./nb3`, you'll get a gdb prompt. Simply type `run` and gdb will run the program. Once it dies, enter `backtrace`. This should tell you where the problem is. If you run `./nb3`, it should die and generate a file named `nb3.core`. The gdb command is then `gdb ./nb3 nb3.core`. Just enter the `backtrace` command. I forgot to mention that gfortran also has a `-fcheck=all` option. This may find the issue faster if it's an array indexing issue. – steve Nov 11 '21 at 03:38
  • Does this answer your question? [What flags do you set for your GFORTRAN debugger/compiler to catch faulty code?](https://stackoverflow.com/questions/3676322/what-flags-do-you-set-for-your-gfortran-debugger-compiler-to-catch-faulty-code) – francescalus Nov 11 '21 at 10:30
  • @steve The code compiled with `-g -ffpe-trap=invalid -O` is still running. I am waiting for it to die from a segmentation fault so that I can find the location of the error from the backtrace. Meanwhile, I am thinking of running an instance of the code compiled with `-O0 -g -fcheck=all`, as suggested in the link provided by @francescalus. What is the difference between `-O` and `-O0`? – AK_thermal Nov 11 '21 at 18:34
  • `-O0` turns off all optimizations. It essentially considers each line of code independent of all other lines. `-O` is equivalent to `-O1`, which turns on several safe optimizations. In theory, once you have the code debugged, you can/should use `-O2`. This turns on additional safe optimizations, which take the compiler a bit more effort to apply. – steve Nov 12 '21 at 19:42

0 Answers