
I have a numerical C++ model of several thousand lines of code that produces different results on different machines, and I want to fully understand why. The differences start at around the 14th significant digit but then propagate and produce quite different results.

I've spent many days on this and finally got to this point: when I build it on my machine and copy it to machine B, I get exactly the same results on both machines if I compile it with

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXE_LINKER_FLAGS="-static" -DCMAKE_FIND_LIBRARY_SUFFIXES=".a" ..

So that is already great. But I want to know more specifically what causes the differences. So I checked which libraries are possible culprits:

$ ldd my_software
linux-vdso.so.1 (0x00007ffdbd99c000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f08720b4000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0871d16000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f0871afe000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f087170d000)
/lib64/ld-linux-x86-64.so.2 (0x00007f0872724000)

My idea was to find the one library that changes the results, then work out which part of that library is responsible and whether I can do something about it.
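One way to test the libm hypothesis directly is sketched below. It is only an illustration under assumptions: the functions `to_hex` and `probe_libm`, the chosen inputs, and the choice of `exp`, `log`, and `sin` are hypothetical and not taken from my model. The idea is to print results as hexadecimal floats so that a difference in the last bit is visible.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>
#include <vector>

// Format a double as a hexadecimal float ("%a"). Unlike default decimal
// output, this shows every mantissa bit, so a one-bit difference is visible.
std::string to_hex(double x) {
    char buf[64];
    int n = std::snprintf(buf, sizeof buf, "%a", x);
    assert(n > 0 && n < static_cast<int>(sizeof buf));
    return std::string(buf);
}

// Hypothetical probe (not from the original model): evaluate a few libm
// functions on fixed inputs and return one text line per call.
std::vector<std::string> probe_libm() {
    std::vector<std::string> lines;
    const double inputs[] = {0.1, 1.0, 2.5, 10.0};
    for (double x : inputs) {
        // volatile keeps the compiler from evaluating the call at compile
        // time, so the libm loaded at run time is what gets exercised.
        volatile double v = x;
        lines.push_back("exp(" + to_hex(x) + ") = " + to_hex(std::exp(v)));
        lines.push_back("log(" + to_hex(x) + ") = " + to_hex(std::log(v)));
        lines.push_back("sin(" + to_hex(x) + ") = " + to_hex(std::sin(v)));
    }
    return lines;
}
```

Printing each returned line from a small `main`, running the dynamically linked binary on both machines, and diffing the two outputs would show whether any of these calls differ. Identical output would not clear libm, since only a handful of inputs are covered.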

Linking only a single library statically worked for, e.g., libgcc:

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXE_LINKER_FLAGS="-static-libgcc" ..

But the results were still different, so that was not it. The problem now is that I cannot seem to link libc or libm statically:

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXE_LINKER_FLAGS="-static-libm" ..  # works
cd ..
cmake --build statically_linked_m/ --target my_software
...
[100%] Linking CXX executable my_software
g++: error: unrecognized command line option ‘-static-libm’; did you mean ‘-static-libgo’?

So why can't I do that? libm apparently got linked statically when I compiled with the -DCMAKE_EXE_LINKER_FLAGS="-static" -DCMAKE_FIND_LIBRARY_SUFFIXES=".a" flags, so why does this not work?

I hope I made my issue clear. Any pointers are highly appreciated!

konse
  • The library `libm` is not part of the compiler, so there is no compiler option like `-static-libm`. Linking with `libm` is specified **explicitly** on the linker command line, e.g. with the `-lm` parameter. So, to link `libm` statically you need to replace that parameter with `-l:libm.a`. Among other things, that implies that you have a static variant of the `libm` library. See also [that question](https://stackoverflow.com/questions/56415996/linking-error-selective-static-linking-of-libm-a-in-gcc). – Tsyvarev Feb 11 '21 at 13:00
  • There are compiler flags you can use to improve floating-point conformance in GCC; see the [docs](https://gcc.gnu.org/wiki/FloatingPointMath) and [this SO post](https://stackoverflow.com/questions/7295861/enabling-strict-floating-point-mode-in-gcc). Also, do you get consistent results when running on the same machine? – Mansoor Feb 11 '21 at 13:02
  • If the optimizer for different versions of a library decided to perform two floating point operations in a different order then that might be enough to cause what you are seeing. After all, there will always be errors, you just want to always have the _same_ errors. So, if you dynamically link a library because it doesn't cause you problems when you test, are you 100% sure it won't cause you problems next week on a customer machine? – Andy Newman Feb 11 '21 at 13:07
  • If your model has floating point errors that _grow_ then you might want to reconsider your model a bit ... how do you know which version of the library that is changing the result is giving the _right_ result? Or is accuracy unimportant to you here so long as you have consistency? – Andy Newman Feb 11 '21 at 13:08
  • Hi @M.A, I tried `-ffloat-store`, but the problem remains. I will try the other things. – konse Feb 11 '21 at 13:10
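
The point raised in the comments about operation order can be illustrated with a minimal sketch. It assumes IEEE 754 doubles evaluated in double precision (as on x86-64 with SSE2) and no `-ffast-math`; the function names are hypothetical and chosen only for this illustration.

```cpp
#include <cstdio>

// Same three operands, two groupings. Floating-point addition is not
// associative, so the rounded results can differ.
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }

// Returns true when the two groupings produce different doubles.
bool grouping_matters(double a, double b, double c) {
    return sum_left(a, b, c) != sum_right(a, b, c);
}

// Print both results with 17 significant digits, which is enough to
// distinguish any two different doubles.
void print_both(double a, double b, double c) {
    std::printf("(a + b) + c = %.17g\n", sum_left(a, b, c));
    std::printf("a + (b + c) = %.17g\n", sum_right(a, b, c));
}
```

Under those assumptions, `grouping_matters(0.1, 0.2, 0.3)` is true while `grouping_matters(1.0, 2.0, 3.0)` is false. A library build that evaluates the same expression in a different order can therefore change the last digits without either result being a bug.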
