
I'm debugging an ANSI C program running on 64-bit CentOS 5.7 Linux, compiled with gcc44 and debugged with gdb. I have the following loop in the program:

for (ii = 1; ii < 10001; ii++) {
    time_sec[ii] = ( 10326 ) * dt - UI0_offset;  /* in seconds */ 
    printf("\ntime_sec[%d] = %16.15e, dt = %16.15e, UI0_offset = %26.25e\n", 
           ii, time_sec[ii], dt, UI0_offset);
}

where time_sec, dt, and UI0_offset are doubles. The relevant gdb session is:

(gdb) p time_sec[1]
$2 = 2.9874137906250006e-15
(gdb) p ( 10326 ) * dt - UI0_offset
$3 = 2.9874137906120759e-15

Why are $2 and $3 different numbers? $2 = time_sec[1] was computed by the C program, whereas $3 is the same expression evaluated inside gdb.

I'm porting a Matlab algorithm to C, and Matlab (run on a different machine) matches the gdb number $3 exactly; I need that precision. Does anyone know what could be going on here, and how to resolve it?
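
For completeness, here is a minimal self-contained version of the loop (the values of dt and UI0_offset are assumed here, taken from the printf output in the update below):

#include <stdio.h>

int main(void)
{
    double time_sec[10001];
    double dt = 1e-14;                           /* assumed, from the printf output below */
    double UI0_offset = 1.0325701258620938e-10;  /* assumed, from the printf output below */
    int ii;

    for (ii = 1; ii < 10001; ii++) {
        time_sec[ii] = ( 10326 ) * dt - UI0_offset;  /* in seconds */
        printf("time_sec[%d] = %16.15e\n", ii, time_sec[ii]);
    }
    return 0;
}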

UPDATE: After some debugging, it seems the difference is in the value of UI0_offset. I probed gdb to reveal a few extra digits for this variable (note: does anyone know a better way to see more digits in gdb? I tried an sprintf statement but couldn't get it to work):

(gdb) p UI0_offset -1e-10
$5 = 3.2570125862093849e-12
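
(As an aside on the parenthetical question above: gdb has a built-in printf command that takes a C-style format string, so something like the following should show the extra digits directly, assuming a gdb recent enough to support it:)

(gdb) printf "%.25e\n", UI0_offset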

I then inserted the printf() code in the loop shown in the original posting above, and when it runs in gdb it shows:

time_sec[1] = 2.987413790625001e-15, dt = 1.000000000000000e-14, 
UI0_offset = 1.0325701258620937565691357e-10

Thus, to summarize:

1.032570125862093849e-10 (from gdb command line, the correct value)
1.0325701258620937565691357e-10 (from program's printf statement, NOT correct value)

Any theories on why the value of UI0_offset differs between the gdb command line and the program running under gdb (and how to make the program agree with the gdb command line)?

  • What optimization level are you using when you compile the app with gcc? – Steve Dec 09 '11 at 00:55
  • I'm not specifying one, so whatever the default is. Compile-time duration is not an issue for me. Should I use -O2? – ggkmath Dec 09 '11 at 00:56
  • Can you show us a small complete program that exhibits the issue? For example, the results you're showing us refer only to `time_sec[1]`; you don't need to compute the other 10000 values to demonstrate the problem. Try to narrow your code down to something that declares and initializes the relevant variables, and show us something we can try ourselves. (As it is, I have no idea what the values of `time_sec`, `dt`, and `UI0_offset` are.) – Keith Thompson Dec 09 '11 at 01:00
  • I tried experimenting with -O2 optimization but it didn't improve anything. – ggkmath Dec 09 '11 at 02:22

1 Answer


I'm not sure whether the x64 architecture has the same 80-bit (long double) x87 FP registers that 32-bit x86 does, but results like these often arise in the x86 world when an intermediate result (here, the multiplication) stays in an 80-bit register rather than being flushed back to cache/RAM. Part of your calculation is effectively carried out at higher precision, hence the differing results.
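
One quick way to test this theory, independent of compiler flags, is to force the intermediate product through a 64-bit memory slot with a volatile temporary. A sketch of the loop body (not tested on your setup):

volatile double prod = ( 10326 ) * dt;  /* the store forces rounding to 64-bit double */
time_sec[ii] = prod - UI0_offset;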

GCC has an option (-ffloat-store, if my memory serves) that forces intermediate results to be flushed back to memory and rounded to 64-bit precision. Try enabling it and see if you then match the gdb/Matlab result.
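
Adapting the compile line from the comments below, that would look something like:

gcc44 -g -ansi -ffloat-store infile.c -o out -lm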

Drew Hall
  • Hmm, I get compile error: gcc44: unrecognized option '-store', and four of these: cc1:error: unrecognized command line option "-ffloat". Is the order of placement important? I'm using, # gcc44 -g -ansi -ffloat -store infile.c -o out -lm – ggkmath Dec 09 '11 at 01:23
  • @ggkmath: It's one option, -ffloat-store (no space between 't' and '-'). – Drew Hall Dec 09 '11 at 01:24
  • Oh, my mistake. OK, it compiles now, but I see the exact same results (nothing changed). – ggkmath Dec 09 '11 at 01:28
  • @ggkmath: Too bad. You might try the following instead, then: `-msse2 -mfpmath=sse`. That will force all FP math to use the 64-bit SSE registers (that is, no 80-bit path). If that doesn't do it, I think my theory is busted & you need to look at other FP optimization options. – Drew Hall Dec 09 '11 at 01:49
  • Thanks Drew, I tried with those switches and still get exactly the same result (no change). I was under the impression that double-precision has 31 digits of accuracy after the decimal point, well, just because I can print them and see them. But, now I'm thinking maybe those digits after the 16th digit shouldn't be trusted as accurate. – ggkmath Dec 09 '11 at 02:47
  • @ggkmath: Sorry to send you on a wild goose chase. BTW, double precision is 15-16 decimal digits, total; decimal digits beyond that are effectively random numbers (see the float.h sketch after these comments). – Drew Hall Dec 09 '11 at 02:47
  • I really appreciate your input Drew. Yah, I'm coming around to believing I shouldn't pay any attention to any decimal digits after the 16th. – ggkmath Dec 09 '11 at 03:46
  • Interesting discussion here (specifically ThomasMcLeod's answer). http://stackoverflow.com/questions/4738768/printing-double-without-losing-precision – ggkmath Dec 09 '11 at 03:47
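
For reference on the 15-16 digit point above, a minimal standard-C sketch that prints the actual double-precision limits from float.h (nothing machine-specific assumed):

#include <stdio.h>
#include <float.h>

int main(void)
{
    printf("DBL_DIG     = %d\n", DBL_DIG);      /* decimal digits a double can round-trip: 15 */
    printf("DBL_EPSILON = %e\n", DBL_EPSILON);  /* gap between 1.0 and the next double: ~2.2e-16 */
    return 0;
}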