3

In finding a bug which made everything turn into NaNs when running the optimized version of my code (compiling in g++ 4.8.2 and 4.9.3), I identified that the problem was the -Ofast option, specifically, the -ffinite-math-only flag it includes.

One part of the code involves reading floats from a FILE* using fscanf, and then replacing all NaNs with a numeric value. As could be expected, however, -ffinite-math-only kicks in, and removes these checks, thus leaving the NaNs.

In trying to solve this problem, I stumbled uppon this, which suggested adding -fno-finite-math-only as a method attribute to disable the optimization on the specific method. The following illustrates the problem and the attempted fix (which doesn't actually fix it):

#include <cstdio>
#include <cmath>

__attribute__((optimize("-fno-finite-math-only"))) 
void replaceNaN(float * arr, int size, float newValue){
    for(int i = 0; i < size; i++) if (std::isnan(arr[i])) arr[i] = newValue;
}

int main(void){
    const size_t cnt = 10;
    float val[cnt];
    for(int i = 0; i < cnt; i++) scanf("%f", val + i);
    replaceNaN(val, cnt, -1.0f);
    for(int i = 0; i < cnt; i++) printf("%f ", val[i]);
    return 0;
}

The code does not act as desired if compiled/run using echo 1 2 3 4 5 6 7 8 nan 10 | (g++ -ffinite-math-only test.cpp -o test && ./test), specifically, it outputs a nan (which should have been replaced by a -1.0f) -- it behaves fine if the -ffinite-math-only flag is ommited. Shouldn't this work? Am I missing something with the syntax for attributes in gcc, or is this one of the afforementioned "there being some trouble with some version of GCC related to this" (from the linked SO question)

A few solutions I'm aware of, but would rather something a bit cleaner/more portable:

  • Compile the code with -fno-finite-math-only (my interrim solution): I suspect that this optimization may be rather useful in my context in the remainder of the program;
  • Manually look for the string "nan" in the input stream, and then replace the value there (the input reader is in an unrelated part of the library, yielding poor design to include this test there).
  • Assume a specific floating point architecture and make my own isNaN: I may do this, but it's a bit hackish and non-portable.
  • Prefilter the data using a separately compiled program without the -ffinite-math-only flag, and then feed that into the main program: The added complexity of maintaining two binaries and getting them to talk to each other just isn't worth it.

Edit: As put in the accepted answer, It would seem this is a compiler "bug" in older versions of g++, such as 4.82 and 4.9.3, that is fixed in newer versions, such as 5.1 and 6.1.1.

If for some reason updating the compiler is not a reasonably easy option (e.g.: no root access), or adding this attribute to a single function still doesn't completely solve the NaN check problem, an alternate solution, if you can be certain that the code will always run in an IEEE754 floating point environment, is to manually check the bits of the float for a NaN signature.

The accepted answer suggests doing this using a bit field, however, the order in which the compiler places the elements in a bit field is non-standard, and in fact, changes between the older and newer versions of g++, even refusing to adhere to the desired positioning in older versions (4.8.2 and 4.9.3, always placing the mantissa first), regardless of the order in which they appear in the code.

A solution using bit manipulation, however, is guaranteed to work on all IEEE754 compliant compilers. Below is my such implementation, which I ultimately used to solve my problem. It checks for IEEE754 compliance, and I've extended it to allow for doubles, as well as other more routine floating point bit manipulations.

#include <limits> // IEEE754 compliance test
#include <type_traits> // enable_if

template<
    typename T, 
    typename = typename std::enable_if<std::is_floating_point<T>::value>::type,
    typename = typename std::enable_if<std::numeric_limits<T>::is_iec559>::type,
    typename u_t = typename std::conditional<std::is_same<T, float>::value, uint32_t, uint64_t>::type
>
struct IEEE754 {

    enum class WIDTH : size_t {
        SIGN = 1, 
        EXPONENT = std::is_same<T, float>::value ? 8 : 11,
        MANTISSA = std::is_same<T, float>::value ? 23 : 52
    };
    enum class MASK : u_t {
        SIGN = (u_t)1 << (sizeof(u_t) * 8 - 1),
        EXPONENT = ((~(u_t)0) << (size_t)WIDTH::MANTISSA) ^ (u_t)MASK::SIGN,
        MANTISSA = (~(u_t)0) >> ((size_t)WIDTH::SIGN + (size_t)WIDTH::EXPONENT)
    };
    union {
        T f;
        u_t u;
    };

    IEEE754(T f) : f(f) {}

    inline u_t sign() const { return u & (u_t)MASK::SIGN >> ((size_t)WIDTH::EXPONENT + (size_t)WIDTH::MANTISSA); }
    inline u_t exponent() const { return u & (u_t)MASK::EXPONENT >> (size_t)WIDTH::MANTISSA; }
    inline u_t mantissa() const { return u & (u_t)MASK::MANTISSA; }

    inline bool isNan() const {
        return (mantissa() != 0) && ((u & ((u_t)MASK::EXPONENT)) == (u_t)MASK::EXPONENT);
    }
};
template<typename T>
inline IEEE754<T> toIEEE754(T val) { return IEEE754<T>(val); }

And the replaceNaN function now becomes:

void replaceNaN(float * arr, int size, float newValue){
    for(int i = 0; i < size; i++) 
        if (toIEEE754(arr[i]).isNan()) arr[i] = newValue;
}

An inspection of the assembly of these functions reveals that, as expected, all masks become compile-time constants, leading to the following (seemingly) efficient code:

# In loop of replaceNaN
movl    (%rcx), %eax       # eax = arr[i] 
testl   $8388607, %eax     # Check if mantissa is empty
je  .L3                    # If it is, it's not a nan (it's inf), continue loop
andl    $2139095040, %eax  # Mask leaves only exponent
cmpl    $2139095040, %eax  # Test if exponent is all 1s
jne .L3                    # If it isn't, it's not a nan, so continue loop

This is one instruction less than with a working bit field solution (no shift), and the same number of registers are used (although it's tempting to say this alone makes it more efficient, there are other concerns such as pipelining which may make one solution more or less efficient than the other one).

Community
  • 1
  • 1
André Harder
  • 354
  • 2
  • 11
  • 2
    https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-g_t_0040code_007boptimize_007d-function-attribute-3278 "This attribute should be used for debugging purposes only. It is not suitable in production code." – Marc Glisse Jul 11 '16 at 04:22

1 Answers1

5

Looks like a compiler bug to me. Up through GCC 4.9.2, the attribute is completely ignored. GCC 5.1 and later pay attention to it. Perhaps it's time to upgrade your compiler?

__attribute__((optimize("-fno-finite-math-only"))) 
void replaceNaN(float * arr, int size, float newValue){
    for(int i = 0; i < size; i++) if (std::isnan(arr[i])) arr[i] = newValue;
}

Compiled with -ffinite-math-only on GCC 4.9.2:

replaceNaN(float*, int, float):
        rep ret

But with the exact same settings on GCC 5.1:

replaceNaN(float*, int, float):
        test    esi, esi
        jle     .L26
        sub     rsp, 8
        call    std::isnan(float) [clone .isra.0]
        test    al, al
        je      .L2
        mov     rax, rdi
        and     eax, 15
        shr     rax, 2
        neg     rax
        and     eax, 3
        cmp     eax, esi
        cmova   eax, esi
        cmp     esi, 6
        jg      .L28
        mov     eax, esi
.L5:
        cmp     eax, 1
        movss   DWORD PTR [rdi], xmm0
        je      .L16
        cmp     eax, 2
        movss   DWORD PTR [rdi+4], xmm0
        je      .L17
        cmp     eax, 3
        movss   DWORD PTR [rdi+8], xmm0
        je      .L18
        cmp     eax, 4
        movss   DWORD PTR [rdi+12], xmm0
        je      .L19
        cmp     eax, 5
        movss   DWORD PTR [rdi+16], xmm0
        je      .L20
        movss   DWORD PTR [rdi+20], xmm0
        mov     edx, 6
.L7:
        cmp     esi, eax
        je      .L2
.L6:
        mov     r9d, esi
        lea     r8d, [rsi-1]
        mov     r11d, eax
        sub     r9d, eax
        lea     ecx, [r9-4]
        sub     r8d, eax
        shr     ecx, 2
        add     ecx, 1
        cmp     r8d, 2
        lea     r10d, [0+rcx*4]
        jbe     .L9
        movaps  xmm1, xmm0
        lea     r8, [rdi+r11*4]
        xor     eax, eax
        shufps  xmm1, xmm1, 0
.L11:
        add     eax, 1
        add     r8, 16
        movaps  XMMWORD PTR [r8-16], xmm1
        cmp     ecx, eax
        ja      .L11
        add     edx, r10d
        cmp     r9d, r10d
        je      .L2
.L9:
        movsx   rax, edx
        movss   DWORD PTR [rdi+rax*4], xmm0
        lea     eax, [rdx+1]
        cmp     eax, esi
        jge     .L2
        add     edx, 2
        cdqe
        cmp     esi, edx
        movss   DWORD PTR [rdi+rax*4], xmm0
        jle     .L2
        movsx   rdx, edx
        movss   DWORD PTR [rdi+rdx*4], xmm0
.L2:
        add     rsp, 8
.L26:
        rep ret
.L28:
        test    eax, eax
        jne     .L5
        xor     edx, edx
        jmp     .L6
.L20:
        mov     edx, 5
        jmp     .L7
.L19:
        mov     edx, 4
        jmp     .L7
.L18:
        mov     edx, 3
        jmp     .L7
.L17:
        mov     edx, 2
        jmp     .L7
.L16:
        mov     edx, 1
        jmp     .L7

The output is similar, although not quite identical, on GCC 6.1.

Replacing the attribute with

#pragma GCC push_options
#pragma GCC optimize ("-fno-finite-math-only")
void replaceNaN(float * arr, int size, float newValue){
    for(int i = 0; i < size; i++) if (std::isnan(arr[i])) arr[i] = newValue;
}
#pragma GCC pop_options

makes absolutely no difference, so it is not simply a matter of the attribute being ignored. These older versions of the compiler clearly do not support controlling the floating-point optimization behavior at function-level granularity.

Note, however, that the generated code on GCC 5.1 and later is still significantly worse than compiling the function without without the -ffinite-math-only switch:

replaceNaN(float*, int, float):
        test    esi, esi
        jle     .L1
        lea     eax, [rsi-1]
        lea     rax, [rdi+4+rax*4]
.L5:
        movss   xmm1, DWORD PTR [rdi]
        ucomiss xmm1, xmm1
        jnp     .L6
        movss   DWORD PTR [rdi], xmm0
.L6:
        add     rdi, 4
        cmp     rdi, rax
        jne     .L5
        rep ret
.L1:
        rep ret

I have no idea why there is such a discrepancy. Something is badly throwing the compiler off its game; this is even worse code than you get with optimizations completely disabled. If I had to guess, I'd speculate it was the implementation of std::isnan. If this replaceNaN method is not speed-critical, then it probably doesn't matter. If you need to repeatedly parse values from a file, you might prefer to have a reasonably efficient implementation.

Personally, I would write my own non-portable implementation of std::isnan. The IEEE 754 formats are all quite well-documented, and assuming you thoroughly test and comment the code, I can't see the harm in this, unless you absolutely need the code to be portable to all different architectures. It will drive purists up the wall, but so should using non-standard options like -ffinite-math-only. For a single-precision float, something like:

bool my_isnan(float value)
{
  union IEEE754_Single
  {
    float f;
    struct
    {
    #if BIG_ENDIAN
        uint32_t sign     : 1;
        uint32_t exponent : 8;
        uint32_t mantissa : 23;
    #else
        uint32_t mantissa : 23;
        uint32_t exponent : 8;
        uint32_t sign     : 1;
    #endif
    } bits;
  } u = { value };

  // In the IEEE 754 representation, a float is NaN when
  // the mantissa is non-zero, and the exponent is all ones
  // (2^8 - 1 == 255).
  return (u.bits.mantissa != 0) && (u.bits.exponent == 255);
}

Now, no need for annotations, just use my_isnan instead of std::isnan. The produces the following object code when compiled with -ffinite-math-only enabled:

replaceNaN(float*, int, float):
        test    esi, esi
        jle     .L6
        lea     eax, [rsi-1]
        lea     rdx, [rdi+4+rax*4]
.L13:
        mov     eax, DWORD PTR [rdi]     ; get original floating-point value
        test    eax, 8388607             ; test if mantissa != 0
        je      .L9
        shr     eax, 16                  ; test if exponent has all bits set
        and     ax, 32640
        cmp     ax, 32640
        jne     .L9
        movss   DWORD PTR [rdi], xmm0    ; set newValue if original was NaN
.L9:
        add     rdi, 4
        cmp     rdx, rdi
        jne     .L13
        rep ret
.L6:
        rep ret

The NaN check is slightly more complicated than a simple ucomiss followed by a test of the parity flag, but is guaranteed to be correct as long as your compiler adheres to the IEEE 754 standard. This works on all versions of GCC, and any other compiler.

Cody Gray - on strike
  • 239,200
  • 50
  • 490
  • 574
  • Very thorough! I should note, however, that the bit field approach is not guaranteed to work on all IEEE754 compliant compilers, as the standard defines no particular order for the members of a union, and indeed it doesn't, such as on both my `g++ 4.8.2` and `4.9.3` installs, where it decides to put the `bits.mantissa` at the start of the field (regardless of their order in the code). The bit field approach does however work fine in MSVC and `g++ 6.1.1`. Doing the same with bit masks, however, should work fine in all IEEE754 compliant compilers. Fixing this, this would be the correct answer! – André Harder Jul 09 '16 at 19:50