I have a simple C++ program (foo.cxx):
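#include <stdio.h>
#include <math.h>
#include <string.h>   // memcpy
#include <stdint.h>   // uint64_t
#include <inttypes.h> // PRIx64

// Return the raw bit pattern of a double; memcpy avoids the
// strict-aliasing violation of casting double* to long int*.
static uint64_t bits_of(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

int main()
{
    double ang2 = -0.23202523431296057;
    printf("The bits of ang2 are %" PRIx64 "\n", bits_of(ang2));
    double sin_ang2 = sin(ang2);
    printf("sin_ang2 is %0.17f\n", sin_ang2);
    printf("The bits of sin_ang2 are %" PRIx64 "\n", bits_of(sin_ang2));
}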
I have two machines with different hardware, both running Ubuntu 20.04 with gcc 9.3.0. On both machines, I compile the above code with this command:
g++ -ffloat-store foo.cxx
On machine 1, the result of running the above program is:
The bits of ang2 are bfcdb300bc9c468a
sin_ang2 is -0.22994895724656178
The bits of sin_ang2 are bfcd6ef7a98fc7ce
On machine 2, the result of running the above program is:
The bits of ang2 are bfcdb300bc9c468a
sin_ang2 is -0.22994895724656181
The bits of sin_ang2 are bfcd6ef7a98fc7cf
Notice that the results of calling sin() on these two machines differ in the last bit (1 ULP). My question is whether this should be expected. I realize that floating-point arithmetic has many nuances that can lead to imprecise results, but is this an example of one? My understanding was that gcc's -ffloat-store option could help deliver consistent results across machines, but it did not seem to help here. The gcc documentation describes the option as follows:
-ffloat-store
Do not store floating-point variables in registers, and inhibit other options that might change whether a floating-point value is taken from a register or memory. This option prevents undesirable excess precision on machines such as the 68000 where the floating registers (of the 68881) keep more precision than a double is supposed to have. Similarly for the x86 architecture. For most programs, the excess precision does only good, but a few programs rely on the precise definition of IEEE floating point. Use -ffloat-store for such programs, after modifying them to store all pertinent intermediate computations into variables.
The hardware for machine 1 (from lscpu) is:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 36 bits physical, 48 bits virtual
CPU(s): 4
...
Vendor ID: GenuineIntel
CPU family: 6
Model: 58
Model name: Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz
...
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear flush_l1d
And the hardware for machine 2 is:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 16
...
Vendor ID: GenuineIntel
CPU family: 6
Model: 158
Model name: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
...
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
Any suggestions on ways to get consistent results across these two machines?