
I am working on a very old Fortran 77 code called LOWTRAN, a simulation tool used to model atmospheric light propagation.
(If you wish to see the complete LOWTRAN code you can check it out here, though I don't think it will help in answering the question.)

Unfortunately, that code was originally written for punch cards, and adapting it to modern input/output methods introduced a few nasty glitches.
Those glitches are of the easy-to-spot, hard-to-fix kind.
To fix one of them I had no choice but to set up an IF statement containing a GOTO that jumps outside the IF statement, to somewhere else in the code.

However, the GOTO itself sometimes causes a segmentation fault. It does not happen randomly; it depends on a few variables that seem unrelated to that IF statement.

I am compiling this project on two different machines, both with gfortran, and only one of them segfaults. On the Windows machine (the one that does not segfault) I use gfortran 7.2.0, and on the Linux machine (the one that segfaults) I use gfortran 4.8.5.

(I can't update the gfortran version on the Linux machine as I don't have the required rights.)

Note that both compilers raise a warning when I compile my fix:

Warning: Legacy Extension: Label at (1) is not in the same block as the GOTO statement at (2)

Here is the fix:

100
...
...
<Lots of code>
...
...
   if(ierror.eq.-1) then
       itype = 1
       ierror = 0
       go to 100               
   end if

J.M
  • Welcome, you have to show a full [mcve]; the error is probably somewhere else. A link to a full large code is not enough; the code for a question should be small enough and self-contained. Use tag [tag:fortran] for all Fortran questions. – Vladimir F Героям слава Aug 23 '19 at 08:33
  • Well, I can't. I gave you the full code so that you could see what an inextricable mess LOWTRAN is. Almost all variables are global and the code is literally made only of GOTO statements. I can't give you minimal code, as it would still be incomplete (global variables etc.). My question is more fundamental: I am looking for what could cause a GOTO statement to segfault, so that I can fix this kind of issue in the future, and for why it only happens on the Linux computer with a specific version of gfortran. – J.M Aug 23 '19 at 08:43
  • Of course you can; the point is that it is often hard work to prepare such an example. But it is simply necessary. Very often one finds the problem during the process of preparing such an isolated test. – Vladimir F Героям слава Aug 23 '19 at 08:46
  • @VladimirF An MRE is not necessary to ask the question "What would cause a segmentation fault on a GOTO instruction?" That is a perfectly good general question. And it is a good question, as a SEGV usually occurs when dereferencing a pointer or accessing an array (out of bounds). – Raedwald Aug 23 '19 at 08:57
  • I posted the "minimal" code example ("minimal" is a joke here). As you can see it is very nasty code and, believe it or not, because of the global variables it uses about 70% of the 15 000 lines of code that LOWTRAN is made of. You will never be able to understand what happens by reading that code unless you use a debugger (which I did). – J.M Aug 23 '19 at 09:05
  • Mandatory reading for Fortran programmers experiencing segmentation faults: https://software.intel.com/en-us/articles/determining-root-cause-of-sigsegv-or-sigbus-errors. At this stage it is immaterial that the article is a few years old and that it concerns Intel Fortran most specifically. – High Performance Mark Aug 23 '19 at 09:15
  • @Raedwald The question mentions a segmentation fault. I have seen way too many questions that just ended with a disappointed OP when we state the obvious: no, it can't happen there, you have a problem elsewhere. It was NOT just a general question about segfaults and GOTOs; it was about a particular segfault in a particular code. – Vladimir F Героям слава Aug 23 '19 at 09:19
  • Oh, and I should have added to my previous comment: break out your debugger. If the code is too long and too complex to post in its entirety, and too difficult to distil into a [mcve], it's time to roll up your sleeves and dig into the guts of the code. Or maybe you already did that and forgot to mention it? – High Performance Mark Aug 23 '19 at 09:54
  • Yeah, I did; otherwise I would never have been able to find where to put that GOTO in the first place :) (the debugger was unable to help with the segfault, though) – J.M Aug 23 '19 at 11:45

3 Answers


Thanks to Raedwald, I was able to find out what actually happened.

The compiler optimizations were "hiding" the real cause of the segmentation fault.

There was a huge loop that used the label 100 as its end point. Sometimes, the GOTO to label 100 caused the loop to iterate one more time, leading to an access violation in an array.

I solved the problem by defining a new label.

I would never have thought of disabling the compiler optimizations; that really helped.
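As a rough illustration of that fix (a minimal fixed-form sketch, not the LOWTRAN source): the retry jump below targets a label of its own, placed outside the loop, instead of reusing the loop's terminal label 100. The names newlabel, buf and npass and the label 200 are hypothetical; ierror and itype are taken from the question.

```fortran
      program newlabel
      implicit none
      integer i, ierror, itype, npass
      real buf(5)
      ierror = -1
      itype = 0
      npass = 0
c     Hypothetical retry target: a label of its own, outside the loop
200   continue
      npass = npass + 1
c     Label 100 stays reserved as the terminal statement of this loop
      do 100 i = 1, 5
         buf(i) = real(i*npass)
100   continue
      if (ierror.eq.-1) then
         itype = 1
         ierror = 0
c        Jump to the new label, never into the loop body
         go to 200
      end if
      write(*,*) 'passes:', npass, ' last:', buf(5)
      end
```

Compiled as fixed-form source (for example a .f file), this should run the loop twice and then stop. My assumption, not something I verified on LOWTRAN, is that the "Legacy Extension" warning in the question appeared because label 100 sat inside the DO block; a GOTO that only leaves an IF block for a label outside any other block is ordinary Fortran.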

J.M
  • The first steps I usually take are "-O0 -CB -warn all"... once it is all hunky-dory, then -O2... – Holmz Aug 23 '19 at 22:11

The code that the computer runs is not your source code, but rather machine code. The compiler generates that machine code from your source code. The generation can be more or less direct, so one statement of your source code corresponds to a few contiguous machine code instructions. But it need not be direct. In particular, if the compiler provides optimizations, the correspondence between lines of your source code and the machine code instructions can break down. In that case, the line that a debugger reports as the location of the SEGV can be wrong.

The simple implementation of a GOTO statement is an unconditional jump machine code instruction, jumping to a valid code address. That simple implementation would never result in a SEGV. You might be tempted to blame your compiler for being buggy, but that would be a mistake. Compiler optimization has probably confused things. You probably have a fault in an array access near that GOTO statement, or in the code just after its destination (the statement labelled 100).

Try recompiling your program with optimizations turned off (typically with a command-line option like -O0) and rerunning your program. You should then see the SEGV reported at the line where there is an invalid array access.
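A small program to try this on, as a sketch only. The compile line in the comment adds -g, -fcheck=bounds and -fbacktrace, which are gfortran options I am suggesting here and are not part of the advice above; the file name demo.f and all identifiers are hypothetical. The explicit range test stands in for what a run-time bounds check would report automatically.

```fortran
c     Build without optimization and with run-time checks, e.g.
c       gfortran -O0 -g -fcheck=bounds -fbacktrace demo.f -o demo
      program demo
      implicit none
      integer n
      parameter (n = 5)
      integer i, a(n)
      do i = 1, n
         a(i) = i
      end do
c     After a completed DO loop the index is n + 1, so a(i) here
c     would be out of bounds; guard the access explicitly.
      if (i.ge.1 .and. i.le.n) then
         write(*,*) 'a(i) =', a(i)
      else
         write(*,*) 'index out of range:', i
      end if
      end
```

The guard takes the else branch here, because the loop index is left one past the upper bound. That is the same "one iteration too many" pattern that tends to show up as a SEGV far from its real cause once optimization is on.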

Raedwald

With memory-related bugs it's always a struggle; there is no shortcut. I can imagine that it is something similar to the sample below. In most cases the error comes from jumping over a part of the code that is quite important.

      program main
      implicit none
c
      call hello
      end

      subroutine hello
      implicit none
      integer a, i
      integer, dimension(:), allocatable :: x
      allocate(x(100))

      goto 101

c     Label 100: fills x, which is only valid while x is allocated
100   do i = 1, 100
        x(i) = i
      end do
      return

101   read(*,*) a
c
      write(*,*) a

c     For illustration: x is freed, then control jumps back to 100
      if (a.eq.-1) then
        deallocate(x)
        go to 100
      end if

      go to 100
c
      end

As for the debugger, I suggest going with gdb (on Linux it should be available). It will be much easier to find the issue that way.

When it comes to SIGSEGV, this kind of issue is sometimes "triggered" by a single nasty byte, which makes it hard to nail down. Also, remember that bugs of this kind are very often of the "Heisenbug" type: https://en.wikipedia.org/wiki/Heisenbug

Update

The above code is a perfect illustration of @Raedwald's suggestion, regarding optimisation.

> gfortran -O0 -o main main.f
> ./main
-1
          -1

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x2B7311C376F7
#1  0x2B7311C37D3E
#2  0x2B73126C926F
#3  0x400C1A in hello_
#4  0x400C95 in MAIN__ at main.f:?
Segmentation fault
> gfortran -O3 -o main main.f
> ./main
-1
          -1
>
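For comparison, one possible way to make the jump target safe is to test the array's allocation status before touching it. This is only a sketch under my own assumptions (the program name guard is hypothetical, and re-allocating may not be the right recovery in a real code); allocated() is the standard intrinsic.

```fortran
      program guard
      implicit none
      integer i
      integer, dimension(:), allocatable :: x
      allocate(x(100))
c     Simulate the branch that frees the array before the jump
      deallocate(x)
c     Re-allocate if an earlier branch freed the array
      if (.not. allocated(x)) allocate(x(100))
      do i = 1, 100
         x(i) = i
      end do
      write(*,*) 'x(100) =', x(100)
      deallocate(x)
      end
```

With the check in place the loop always writes into allocated storage, so the behaviour no longer depends on the optimization level.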
Oo.oO