I have a Fortran program that gives different results with -O0 and -O1 on 32-bit systems. Tracking down the difference, I came up with the following test case (test.f90):
program test
  implicit none
  character foo
  real*8 :: Fact,Final,Zeta,rKappa,Rnxyz,Zeta2
  read(5,*) rKappa
  read(5,*) Zeta
  backspace(5)
  read(5,*) Zeta2
  read(5,*) Rnxyz
  Fact=rKappa/Sqrt(Zeta**3)
  write(6,'(ES50.40)') Fact*Rnxyz
  Fact=rKappa/Sqrt(Zeta2**3)
  Final = Fact*Rnxyz
  write(6,'(ES50.40)') Final
end program test
with this data file:
4.1838698196228139E-013
20.148674000000000
-0.15444754236171612
The program should write exactly the same number twice. Note that Zeta2 is the same as Zeta, since the same number is read again (this prevents the compiler from realizing they are the same number and hiding the problem). The only difference is that the first product is computed "on the fly" inside the write statement, while the second is saved in a variable and then the variable is printed.
Now I compile with gfortran 4.8.4 (Ubuntu 14.04 version) and run it:
$ gfortran -O0 -m32 test.f90 && ./a.out < data
-7.1447898573566615177997578153994664188136E-16
-7.1447898573566615177997578153994664188136E-16
$ gfortran -O1 -m32 test.f90 && ./a.out < data
-7.1447898573566615177997578153994664188136E-16
-7.1447898573566605317236262891347096541529E-16
So, with -O0 the numbers are identical, with -O1 they are not.
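To put a size on the discrepancy: by my arithmetic the two -O1 outputs differ by about 9.86E-32, which is 2**(-103), i.e. one unit in the last place for a real*8 of this magnitude. A minimal sketch to check that, assuming the printed 41-digit values convert back to the same doubles when typed in as literals (the program name ulp_check and the variables a and b are only for illustration):

```fortran
program ulp_check
  implicit none
  real(kind=8) :: a, b
  ! The two values printed by the -O1 -m32 build, copied from the output above.
  a = -7.1447898573566615177997578153994664188136d-16
  b = -7.1447898573566605317236262891347096541529d-16
  ! spacing(a) is the distance between adjacent doubles near a (one ULP).
  write(6,'(A,ES12.4)') 'one ULP near a:      ', spacing(a)
  ! Difference between the two results, measured in ULPs.
  write(6,'(A,F6.2)')   'difference in ULPs:  ', (b - a)/spacing(a)
  ! nearest(a, 1.0d0) is the next representable double above a;
  ! this prints T if b is exactly that neighbour.
  write(6,'(A,L2)')     'b is next double up: ', nearest(a, 1.0d0) == b
end program ulp_check
```

If the assumption holds, this should report a difference of 1.00 ULP, so the two results are adjacent doubles rather than wildly different values.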
I tried checking the optimized code with -fdump-tree-optimized:
final.10_53 = fact_44 * rnxyz.9_52;
D.1835 = final.10_53;
_gfortran_transfer_real_write (&dt_parm.5, &D.1835, 8);
[...]
final.10_63 = rnxyz.9_52 * fact_62;
final = final.10_63;
[...]
_gfortran_transfer_real_write (&dt_parm.6, &final, 8);
The only difference I see is that in one case the number printed is fact*rnxyz, and in the other it is rnxyz*fact. Can this change the result? From High Performance Mark's answer, I guess it might have to do with which variable goes to which register, and when. I also tried looking at the assembly output generated with -S, but I can't say I understand it.
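One way to probe the register hypothesis without changing compiler flags is to force the intermediates through memory. A minimal sketch, assuming the discrepancy comes from Fact being kept in an x87 register with extended precision in one path and rounded to 64 bits in the other (that is a hypothesis here, not something the dump confirms). The volatile attribute is standard Fortran 2003; the variable Prod and the program name are additions for illustration, and real(kind=8) is what gfortran uses for real*8:

```fortran
program test_volatile
  implicit none
  ! volatile asks the compiler to store every assignment to memory and
  ! reload it on use, so these values are rounded to 64 bits each time.
  real(kind=8), volatile :: Fact, Prod
  real(kind=8) :: Final, Zeta, rKappa, Rnxyz, Zeta2
  read(5,*) rKappa
  read(5,*) Zeta
  backspace(5)            ! re-read the same line so Zeta2 equals Zeta
  read(5,*) Zeta2
  read(5,*) Rnxyz
  ! First path: the product now goes through a stored variable
  ! instead of being evaluated inside the write statement.
  Fact = rKappa/Sqrt(Zeta**3)
  Prod = Fact*Rnxyz
  write(6,'(ES50.40)') Prod
  ! Second path: same as the original test case.
  Fact = rKappa/Sqrt(Zeta2**3)
  Final = Fact*Rnxyz
  write(6,'(ES50.40)') Final
end program test_volatile
```

It is compiled and run like the original (gfortran -O1 -m32, reading the same data file). If both lines agree with this variant, that would point to where the rounding happens rather than to the order of the operands.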
And then, without the -m32 flag (on a 64-bit machine), the numbers are also identical...
Edit: The numbers are identical if I add -ffloat-store or -mfpmath=sse -msse2 (see here, at the end). This makes sense, I guess, when I compile on an i686 machine, as the compiler would by default use 387 math. But when I compile on an x86-64 machine with -m32, it shouldn't be needed according to the documentation:
-mfpmath=sse [...]
For the i386 compiler, you must use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default. [...]
This is the default choice for the x86-64 compiler.
Maybe -m32 makes these "defaults" ineffective? However, running gfortran -Q --help=target says mfpmath is 387 and msse2 is disabled...