Portable Mathematical C Code Behaving Differently on Linux64 and Windows32

Question

I've written code that performs mathematical calculations using IIR filters and histograms. The code is supposed to be 100% portable, is reentrant, makes no system call (but memset), uses double-precision constants and initialized variables, and only includes math.h, string.h, stdint.h and stddef.h. It was written to run on an embedded processor.

When compiled and run using Windows MinGW GCC or Borland C++, the code passes all the unit tests. That doesn't happen on the Linux64 GCC platform. Further investigation has shown that the algorithm becomes slightly unstable: the output value doesn't converge to a stable result but instead drifts slowly toward infinity, so one of the tests fails.

Most of the code uses double-precision floating point, but some variables are single precision.

I need help on how to approach this problem. I must guarantee full portability of the code, and I don't know where to look. Also, the code is too big to be posted here, so if you can point me in a direction, I'll follow it.

This is the compile line of the module on Linux:

gcc -O3 -g3 -Wall -c -fmessage-length=0 -MMD -MP -MF"src/flickerModule.d" -MT"src/flickerModule.d" -o "src/flickerModule.o" "../src/flickerModule.c"

This is the compile line of the module on Windows:

gcc -O0 -g3 -Wall -c -fmessage-length=0 -o src\flickerModule.o ..\src\flickerModule.c

Here I show all the dependencies of the code:

fanl@fanl-STI:~/WorkFelipe/Codigos/re7k_eclipsewkspace/flicker_unittest/src$ grep -H -n -E 'mem|sqrt|floor' flickerModule.c
flickerModule.c:76: memset(filt, 0x00, sizeof(struct s_filter));
flickerModule.c:81: memset(filt, 0x00, sizeof(struct s_filter));
flickerModule.c:89: memset(&histo->buf, 0x00, sizeof(uint32_t) * NUM_CLASSES);
flickerModule.c:162:    return (floorf(x + 0.5));
flickerModule.c:189:            memset(&histn, 0x00, sizeof(float) * NUM_CLASSES);
flickerModule.c:262:        hp->Urms_meio_ciclo = sqrt(hp->Acc_Urms_meio_ciclo / NUM_AMOSTRAS_MEIO_CICLO_60Hz);
flickerModule.c:370:        return (sqrtf(saida));
flickerModule.c:393:        p->GanhoNormalizaEntradaFlickerMeter = 1.0 / (p->halfPeriod.Urms_meio_ciclo * sqrt(2));
flickerModule.c:419:            dbg->Prms = sqrt(dbg->Acc_Prms / (NUM_AMOSTRAS_1MIN / FATOR_DOWN1));

Can you post the specific portions of the code that cause the problems? — Kninnug, Jul 04 '13 at 19:24
Do you, by any chance, link DirectX to your code under Windows? — liori, Jul 04 '13 at 19:27
One suggestion, try it with all optimizations disabled (`-O0`) and see if you still see the same behavior. — cobbal, Jul 04 '13 at 19:30
0) do you compile with a C++ compiler ? 1) are all automatic variables initialised correctly ? 2) does gcc compile cleanly with maximal warnings enabled (-Wall -pedantic, IIRC) ? — wildplasser, Jul 04 '13 at 19:31
@Kninnug I haven't been able to isolate the fault properly yet. The code runs in a loop over a great number of points, which makes it tough to step through. Soon I'll go deeper into debugging. — Felipe Lavratti, Jul 04 '13 at 19:31
@cobbal Same behavior for Windows and Linux on all optimization levels, Windows behaves right, Linux not. — Felipe Lavratti, Jul 04 '13 at 19:33
@wildplasser 0) I use GCC compiler, linked against G++ compiled code (The unit tester) 1) Yes, but I'll triple check on this, again. 2) No warnings at all! — Felipe Lavratti, Jul 04 '13 at 19:36
`makes no system call (but memset)` Huh? Doesn't the program do any I/O at all? How would you know it even works? (BTW: memset is not a system call.) Do check the sizeofs, though. (You won't get warnings for wrong 3rd arguments for memset() / memcpy(), and this _could_ be caused by `sizeof (void*) != sizeof(int)`.) — wildplasser, Jul 04 '13 at 19:36
Your edit shows you're using `-O0` on Windows but `-O3` on Linux. Try it with optimizations off on both platforms. — Kninnug, Jul 04 '13 at 19:42
@Kninnug True, but I've tried all optimization levels and the behavior persists. — Felipe Lavratti, Jul 04 '13 at 19:45
Given the symptoms, this is most certainly a (lack of) initialisation bug. And looking at the case, it is related to the 32/64-bit port. Without source code or a crystal ball, my best shot is an error in line #42. Please grep for one line with both memset and sizeof on it, and add that code fragment to your question. — wildplasser, Jul 04 '13 at 19:53
@wildplasser The module entrance function receives structs where the state, output data and debug data is kept, this memory is managed by the caller. — Felipe Lavratti, Jul 04 '13 at 20:23
Well, maybe the caller is wrong then. (The unit tests failed as well, didn't they?) BTW: I am feeling a strong urge to close this question; without code this is getting nowhere (with code it would probably be "too localised" ...) ... And the good news is: "too localised" does not exist anymore! — wildplasser, Jul 04 '13 at 20:31
@wildplasser Off-topic, questions about code you have written need an SSCCE — David Heffernan, Jul 04 '13 at 20:35
@wildplasser Lol. The caller is not wrong; it is just the simulator, and the misbehavior has been verified to originate inside the calculation module. I agree we are getting nowhere. Also, I could post the code, but it was badly written by third parties and is not legible. I doubt anyone would really read or understand its mathematical bla bla bla. — Felipe Lavratti, Jul 04 '13 at 20:37
@fanl In that case, the root cause is probably unmaintainable code. However, if you can't fix that, I suggest parallel debug - run the working and non-working version under debuggers, or with output statements added. Begin by looking about half way through the computation - are they in the same state? If yes, look three quarters of the way through. If no, look one quarter of the way through. Continue binary search until you see where they go different. You will then either know what went wrong, or be able to write an answerable question. — Patricia Shanahan, Jul 04 '13 at 20:42
@PatriciaShanaham I must agree with you. I started with parallel debugging but gave up, since the module performs calculations over a curve of 92000 samples. I'll work on the parallel debug. Topic is closing. Thank you all for your time! — Felipe Lavratti, Jul 04 '13 at 20:45
possible duplicate of [Is there any way to make sure the floating point arithmetic result the same in both linux and windows](http://stackoverflow.com/questions/16395615/is-there-any-way-to-make-sure-the-floating-point-arithmetic-result-the-same-in-b) — Pascal Cuoq, Jul 04 '13 at 22:10
@PascalCuoq: You are right, the solution provided on your link solved the problem! Thank you very much. — Felipe Lavratti, Jul 05 '13 at 16:22
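The parallel-debugging idea suggested in the comments can be sketched as follows. This is only an illustration, not code from the module in question: it assumes each build has recorded one double of filter state per sample into an array, and that a double is a 64-bit IEEE 754 value. The names `double_bits` and `first_divergence` are hypothetical. Instead of bisecting by hand, it scans two recorded traces linearly and reports the first sample where they stop being bit-identical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the raw bit pattern of a double
   (assumption: double is a 64-bit IEEE 754 value). */
uint64_t double_bits(double v)
{
    uint64_t bits;
    memcpy(&bits, &v, sizeof bits);
    return bits;
}

/* Returns the index of the first sample whose bit patterns differ,
   or n when the two traces are bit-identical. */
size_t first_divergence(const double *trace_a, const double *trace_b, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        if (double_bits(trace_a[i]) != double_bits(trace_b[i]))
            return i;
    }
    return n;
}
```

Comparing bit patterns rather than printed decimals avoids hiding a difference in the last bit, which is where a slow drift of this kind would typically start.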

score 3 · Accepted Answer · answered Jul 04 '13 at 20:51

Your Linux build is using -O3 (lots of optimisations) and your Windows build is using -O0 (no optimisation). Optimising floating-point code is really hard and there is often a trade-off between accuracy and speed. Try using -O0 on the Linux build. Here's an article about some of the issues regarding optimising floating point code in the VS compiler, and the gcc compiler will face similar issues.

On the IA32, the FPU works at 80bits precision internally, regardless of float or double, and precision is lost when data is written to RAM as the size is reduced from 80bits to 32/64 bits. You can also modify the precision of the transcendental functions, but this is a programmer option and is usually set to highest precision.

Some optimisers will use the SSE registers for manipulating floats and doubles, and these work at 32 and 64 bit precision internally. The advantage is that the optimiser can vectorise the code and perform operations in parallel. There are also several functions available in the SSE that aren't in the FPU, and the programming model of the SSE (CPU style registers) is easier to work with than that of the FPU (stack based registers). The downside is that the results can be slightly different from those you get from using the FPU, due to the reduced precision.
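To make the difference concrete, here is a minimal sketch with two hypothetical functions (`accumulate_double` and `accumulate_extended` are illustrative names, not part of the asker's module). Both sum the same samples. The second keeps the running sum in long double and rounds once at the end, which mimics intermediates held in extended-precision x87 registers. On a build that evaluates double expressions in extended precision, the first function may end up behaving like the second.

```c
#include <stddef.h>

/* Accumulates in double: every partial sum is rounded to 64 bits
   when the compiler evaluates in double (e.g. SSE2 math). */
double accumulate_double(const double *x, size_t n)
{
    double acc = 0.0;
    size_t i;
    for (i = 0; i < n; ++i)
        acc += x[i];
    return acc;
}

/* Accumulates in long double and rounds once at the end, mimicking
   intermediates kept in extended-precision registers. */
double accumulate_extended(const double *x, size_t n)
{
    long double acc = 0.0L;
    size_t i;
    for (i = 0; i < n; ++i)
        acc += x[i];
    return (double)acc;
}
```

For inputs whose partial sums are exactly representable the two functions agree. For long runs of general values they can differ in the last bits, and a recursive filter feeds that difference back into itself on every sample.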

So try compiling both versions using the debug build options and see if there's any difference there.

If you want them to work exactly the same, you may need to consider hand crafting the floating point code to eliminate any peculiarities of the compiler.
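One possible form of such hand-crafting, shown only as a sketch, is to force every intermediate result through a 64-bit variable so that any excess register precision is discarded at each step. The function `filter_step_rounded` is hypothetical and stands in for a single first-order recursive update. The volatile stores cost performance, and they reduce rather than provably eliminate platform differences.

```c
/* One first-order recursive step: y = coeff * state + input.
   Each intermediate is stored to a volatile double, which forces
   it to be rounded to 64 bits before it is used again. */
double filter_step_rounded(double state, double coeff, double input)
{
    volatile double product = coeff * state;
    volatile double sum = product + input;
    return sum;
}
```

GCC's -ffloat-store option has a similar aim at the compiler level, but it applies to the whole translation unit rather than to selected expressions.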

“On the IA32, the FPU works at 80bits precision internally” Only if you do not tell the compiler not to use SSE2 instructions, for the case it is executed on a 10-year-old processor. — Pascal Cuoq, Jul 04 '13 at 22:11
@PascalCuoq: I did mention that SSE registers are used by some compilers. Those 10 year old processors are still used a lot. As you say though, you can stop the compiler from using the SSE registers. — Skizz, Jul 05 '13 at 07:54
Adding -msse2 -mfpmath=sse flags to both compilers made Windows and Linux behave identically. Now I just have to check how the target platform handles double precision. — Felipe Lavratti, Jul 05 '13 at 16:34
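Whether a given build actually evaluates double expressions in double precision can be checked from inside the code. The sketch below uses the standard C99 macro FLT_EVAL_METHOD from float.h and the macro __SSE2_MATH__, which GCC predefines when SSE2 is used for floating-point math. The two function names are hypothetical helpers, for example for a unit test that runs on every platform.

```c
#include <float.h>

/* Returns 1 when the compiler reports that double expressions are
   evaluated in double precision (FLT_EVAL_METHOD == 0), else 0. */
int evaluates_in_double(void)
{
#if defined(FLT_EVAL_METHOD) && FLT_EVAL_METHOD == 0
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 when the compiler predefines __SSE2_MATH__, i.e. SSE2 is
   used for floating-point math, else 0. */
int uses_sse2_math(void)
{
#if defined(__SSE2_MATH__)
    return 1;
#else
    return 0;
#endif
}
```

A test that asserts `evaluates_in_double()` on each platform would flag a build that silently falls back to extended-precision evaluation. What the embedded target reports is not known here and has to be checked on that toolchain.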

score 1 · Answer 2 · answered Jul 04 '13 at 19:31

Have you tried running static analysis, using tools such as LDRA/Lint/etc.? Tools like these can often pinpoint areas of concern. Also, does your code compile with maximum warnings turned on without giving any warnings? The compiler will often highlight potential problems as well. Personally I find it good practice in gcc to use -Wall -Werror.

Steve Barnes

No errors or warnings using cppcheck; I'll try Lint. – Felipe Lavratti Jul 04 '13 at 19:50
Lint is clean; it only reports some exposed globals. – Felipe Lavratti Jul 04 '13 at 20:37

wildplasser · Answer 3 · 2013-07-04T21:18:09.243

(Note: this is not an answer, but I need the formatting.) You could try to replace all the sizeof (type) lines by the corresponding sizeof expr versions. This is not a panacea, but more a stylistic hint, leading to more robust code. For a start, you could change:

memset(filt, 0x00, sizeof(struct s_filter));

memset(&histo->buf, 0x00, sizeof(uint32_t) * NUM_CLASSES);

memset(&histn, 0x00, sizeof(float) * NUM_CLASSES);

Into:

memset(filt,0, sizeof *filt);

memset(&histo->buf, 0, sizeof histo->buf * NUM_CLASSES);

memset(&histn, 0, sizeof histn * NUM_CLASSES);

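This could be wrong, just assuming, since I don't know the actual sizes, since you didn't show the struct definitions. In particular, if buf and histn are declared as arrays, sizeof histo->buf and sizeof histn already give the size of the whole array, and the multiplication by NUM_CLASSES must be dropped. BTW: you do not need the hex constant 0x00; it is just zero anyway.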
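As a hedged illustration of the sizeof expr style, the sketch below uses a made-up struct and a made-up NUM_CLASSES value, since the real definitions were not shown. It covers the two cases that matter: an array member, where sizeof already covers the whole array, and a pointer, where the element size has to be multiplied by the count.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_CLASSES 8               /* hypothetical value */

struct s_histogram_example {        /* hypothetical layout */
    uint32_t buf[NUM_CLASSES];
    uint32_t total;
};

void histogram_clear(struct s_histogram_example *histo)
{
    /* buf is an array member, so sizeof histo->buf is the whole array. */
    memset(histo->buf, 0, sizeof histo->buf);
    histo->total = 0;
}

void counters_clear(uint32_t *counters, size_t count)
{
    /* counters is a pointer, so sizeof *counters is one element. */
    memset(counters, 0, sizeof *counters * count);
}
```

In both forms the size follows the object's own declaration, so changing an element type later (for example from uint32_t to float) cannot leave a stale type name in the memset call.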

UPDATE: Just to rule things out (arrest the usual suspects :: differential diagnosis): try a test run of the code in a single thread. Threading mechanisms may differ between platforms.
