
I am aware that floating-point numbers are tricky, but today I encountered a case that I cannot explain (and cannot reproduce in a standalone C++ program).

The code within a large project looks like this:

int i = 12;

// several function calls follow here, passing the value of i around
// under different types (due to unfortunate legacy code)
... 

float f = *((float*)(&i)); // f=1.681558e-44

if (f == 0) {
    // do something
} else {
    // do something else
}

This piece of code behaves randomly. Using gdb, I traced the random behavior to the comparison f == 0, which gives inconsistent results: sometimes true, sometimes false. The bug in the code was that, before using f, it should have checked (using other auxiliary information) whether the 4 bytes are meant to be interpreted as an integer. The fix is to first convert the value back to an integer and then compare that integer with 0. That solved the problem.
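A minimal sketch of that fix, assuming the bytes are copied with std::memcpy instead of being read through a cast pointer. The helper names are hypothetical and not taken from the project; the sketch also assumes a 32-bit float.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::int32_t),
              "this sketch assumes a 32-bit float");

// Hypothetical helper: store the 4 bytes of an integer in a float "carrier".
float int_bits_to_float(std::int32_t i) {
    float f;
    std::memcpy(&f, &i, sizeof f);  // byte copy, no float* pointing at an int
    return f;
}

// Hypothetical helper: recover the integer from the float carrier.
std::int32_t float_bits_to_int(float f) {
    std::int32_t i;
    std::memcpy(&i, &f, sizeof i);
    return i;
}

// The zero test is done on the integer, so no floating-point
// comparison is involved at all.
bool payload_is_zero(float carrier) {
    return float_bits_to_int(carrier) == 0;
}
```

With these helpers, payload_is_zero(int_bits_to_float(12)) is false, because the comparison happens on the recovered integer 12 rather than on the float.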

Also, where a genuine floating-point comparison is needed (that is, where the float was not produced from an integer as shown above), I changed the comparison to abs(f) < std::numeric_limits<float>::epsilon(), to be on the safer side.
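For illustration, that tolerance check could be wrapped as follows. This is only a sketch: the name nearly_zero is hypothetical, and it uses std::fabs instead of an unqualified abs, since depending on the headers included, abs may resolve to the integer overload.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: treats any value whose magnitude is below
// machine epsilon (about 1.19e-7 for float) as zero.
bool nearly_zero(float f) {
    return std::fabs(f) < std::numeric_limits<float>::epsilon();
}
```

Note that epsilon is a fixed absolute threshold here, so every value smaller than it in magnitude, including every denormal, is reported as zero.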

After that, I wanted to reproduce the problem with a simple test program, but I cannot. (The compiler used for the project is different from the one I use for the test program, though.) The following is the test program:

#include <stdio.h>

int main(void){
    int i = 12;
    float f = *(float*)(&i);

    for (int i = 0; i < 5; ++i) {
        printf("f=%e %s\n", f, (f == 0)? "=0": "!=0");
    }
    return 0;
}

I am wondering, what could be the reason for the random behavior of the comparison with zero?

phuclv
bruin
  • it is C++ not C. `abs(f) < std::numeric_limits<float>::epsilon()` is for sure not C – 0___________ Jun 15 '21 at 08:36
  • Does this answer your question? [Comparing floating point number to zero](https://stackoverflow.com/questions/19837576/comparing-floating-point-number-to-zero) – Louis Go Jun 15 '21 at 08:38
  • [Related topic](https://isocpp.org/wiki/faq/newbie#floating-point-arith). Don't compare a float number to zero. – Louis Go Jun 15 '21 at 08:40
  • @Louis My question is more about the reasons for the random behavior, which I did not find from those posts yet. – bruin Jun 15 '21 at 08:42
  • This hides the compiler warning about the type of pointer with a cast, but it still violates the strict aliasing rule: dereferencing a pointer that aliases an object that is not of a compatible type is undefined behavior. – Weather Vane Jun 15 '21 at 08:42
  • @Weather Thanks for the info. I will do a search on that topic. – bruin Jun 15 '21 at 08:44
  • Integer and float are saved very differently internally, meaning that performing that cast is undefined behaviour from the get go. Furthermore, when dealing with floating points, '==' is just about never a 'safe' operation, due to the way float is stored. It's about never exactly '0.1', but '0.10000001'. – Refugnic Eternium Jun 15 '21 at 08:45
  • Inverting the task (and using a union which is still UB) I find that the `float` value `FLT_MIN` (the smallest positive value that can be represented) converts to `0x00800000`. So what would happen the other way round with `0x0000000C` is unclear to me, but clearly it isn't zero, and compilers may optimize this UB in different ways. – Weather Vane Jun 15 '21 at 08:50
  • @RefugnicEternium I omitted some details here. To me, the cast is safe as long as the 4 bytes of the integer are used as an integer. It's converted to float type just for passing around, and it will be used as an integer in the end. The bug in the project was that it did not check whether the 4 bytes should be interpreted as an integer, but just used them as a float (thus the behavior is "undefined", since the content of the 4 bytes could be NaN, infinity, etc). – bruin Jun 15 '21 at 08:53
  • @bruin no, it is worse than that. The strict aliasing rule allows a function like `void foo(int * a, float * b)` to assume that changes to `*a` don't change `*b`. It allows the compiler to assume that `*(float*)(&i);` only appears in unreachable code, and eliminate whole swathes of your program. – Caleth Jun 15 '21 at 09:01
  • You are asking us to speculate about Undefined Behaviour. This is a pointless task. By definition Undefined Behaviour is not subject to logical analysis. Please post a [mcve] that reproduces the problem and does not contain UB. – Richard Critten Jun 15 '21 at 09:02
  • Why do you want to `reinterpret_cast` an integer to a float number? Or is that part fixed and you can't modify it? – Louis Go Jun 15 '21 at 09:03
  • @Louis Yes, it's a rather large project, and the interfaces involved are beyond my scope :( – bruin Jun 15 '21 at 09:08
  • @RichardCritten I understand your point. Strictly following what you said is sometimes a deadlock: if one is aware that some behavior is UB, then he/she will not have to ask the question in the first place. For such questions, if someone points out that certain behavior is UB, then to me, the question and answer both make sense. No? – bruin Jun 15 '21 at 09:15
  • UB of type punning aside, you have a de-normal `float`. Behaviour of such numbers may depend on compiler flags, rounding mode and who knows what else. – n. m. could be an AI Jun 15 '21 at 09:28
  • @Caleth Let me explain the current code: firstly `int i = 12; float f = *((float*)&i);` then `f` is passed by *value* to some other functions, and finally reaches the place where its content is accessed. Do you see UB in accessing it as `int i2=*((int*)(&f))`? Thanks! – bruin Jun 15 '21 at 09:36
  • @bruin the expression `*((float*)&i)` all on its own is undefined behaviour, you've dereferenced a `float *` that doesn't point to a float object. You are only allowed to reinterpret cast a pointer-to-A to a pointer-to-B if you never dereference the pointer-to-B, or `B` is one of `char`, `unsigned char`, `signed char` or `std::byte` – Caleth Jun 15 '21 at 09:42
  • @Caleth I see. Thanks. Given that the interface cannot be changed, is it ok to change the cast to `memcpy()`? e.g., in the first place, do `memcpy(&f, &i, 4)`, and in the last place, do `memcpy(&i2, &f, 4)`. Thanks again. – bruin Jun 15 '21 at 09:52
  • @bruin `std::memcpy` in this case will give well defined behaviour. _"...When it is needed to interpret the bytes of an object as a value of a different type, std::memcpy or std::bit_cast (since C++20) can be used:..."_ https://en.cppreference.com/w/cpp/language/reinterpret_cast Note that the C-style casts in this case are equivalent to `reinterpret_cast`. – Richard Critten Jun 15 '21 at 10:01
  • @bruin Do a search for "subnormal float" as the value you have in the float is one of these - there are all sorts of edge cases concerning these values if the computer thinks it is loading a float value. – Richard Critten Jun 15 '21 at 10:10

1 Answer


Barring the undefined behavior, which can easily be fixed, you're seeing the effect of denormal numbers. They're extremely slow (see Why does changing 0.1f to 0 slow down performance by 10x?), so modern FPUs usually have denormals-are-zero (DAZ) and flush-to-zero (FTZ) flags to control how denormals are handled. When DAZ is set, denormals compare equal to zero, which is what you observed.

Currently you'll need platform-specific code to turn these flags on or off. Here's how it's done on x86:

#include <string.h>
#include <stdio.h>
#include <pmmintrin.h>

int main(void){
    int i = 12;
    float f;
    memcpy(&f, &i, sizeof i);

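    // DAZ/FTZ on: denormal inputs are treated as zero, so f compares equal to 0
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    printf("%e %s 0\n", f, (f == 0) ? "=": "!=");

    // DAZ/FTZ off: the denormal keeps its value and compares unequal to 0
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_OFF);
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_OFF);
    printf("%e %s 0\n", f, (f == 0) ? "=": "!=");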

    return 0;
}

Output:

0.000000e+00 = 0
1.681558e-44 != 0
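To check whether a particular value falls into this category at all, std::fpclassify from <cmath> can be used. The following is a rough sketch with hypothetical helper names, assuming int and float have the same size. It is meant for the default floating-point mode; whether the classification itself is affected once DAZ is enabled may depend on the implementation.

```cpp
#include <cmath>
#include <cstring>

static_assert(sizeof(float) == sizeof(int),
              "this sketch assumes int and float have the same size");

// Hypothetical helper: build a float from the bit pattern of an integer,
// using memcpy to avoid the aliasing violation.
float float_from_int_bits(int i) {
    float f;
    std::memcpy(&f, &i, sizeof f);
    return f;
}

// Hypothetical helper: true if f is a denormal (subnormal) value.
bool is_subnormal(float f) {
    return std::fpclassify(f) == FP_SUBNORMAL;
}
```

In the default mode, is_subnormal(float_from_int_bits(12)) is true, while 0.0f and 1.0f are not classified as subnormal.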


phuclv
  • Thank you very much. Now I can pin down the exact line of UB code and reproduce the bug. That line of code is `*dst = *(float*)(&tmp)`, i.e., dereferencing a float pointer which is actually pointing to an integer (as @Caleth pointed out). If I change this UB code to memcpy(), the problem disappears. Thanks again! – bruin Jun 15 '21 at 10:31
  • Re “in x86”: x86 is a nickname for an architecture, not a platform. Code may differ between macOS, Linux, etc., and may differ between versions of them and between the developer tools used. – Eric Postpischil Jun 15 '21 at 16:36