While tracking down a bug that turned everything into NaNs when running the optimized build of my code (compiled with g++ 4.8.2 and 4.9.3), I found that the culprit was the -Ofast option, specifically the -ffinite-math-only flag it includes.
One part of the code reads floats from a FILE* using fscanf and then replaces all NaNs with a numeric value. As could be expected, however, -ffinite-math-only lets the compiler assume that NaNs never occur, so it removes these checks and leaves the NaNs in place.
While trying to solve this problem, I stumbled upon this, which suggested adding -fno-finite-math-only as a function attribute to disable the optimization for that specific function. The following illustrates the problem and the attempted fix (which doesn't actually fix it):
#include <cstdio>
#include <cmath>

// Attempted fix: ask GCC to compile only this function without -ffinite-math-only.
__attribute__((optimize("-fno-finite-math-only")))
void replaceNaN(float * arr, int size, float newValue){
    for(int i = 0; i < size; i++) if (std::isnan(arr[i])) arr[i] = newValue;
}

int main(void){
    const size_t cnt = 10;
    float val[cnt];
    for(size_t i = 0; i < cnt; i++) scanf("%f", val + i);   // read cnt floats from stdin
    replaceNaN(val, cnt, -1.0f);                            // every NaN should become -1.0f
    for(size_t i = 0; i < cnt; i++) printf("%f ", val[i]);
    return 0;
}
The code does not behave as desired when compiled and run with echo 1 2 3 4 5 6 7 8 nan 10 | (g++ -ffinite-math-only test.cpp -o test && ./test). Specifically, it outputs a nan (which should have been replaced by -1.0f). It behaves fine if the -ffinite-math-only flag is omitted. Shouldn't this work? Am I missing something about the syntax for attributes in gcc, or is this one of the aforementioned cases of "there being some trouble with some version of GCC related to this" (from the linked SO question)?
A few solutions I'm aware of, though I would rather have something a bit cleaner and more portable:
- Compile the code with -fno-finite-math-only (my interim solution): I suspect that this optimization may be rather useful in the remainder of the program.
- Manually look for the string "nan" in the input stream and replace the value there: the input reader is in an unrelated part of the library, so including this test there would be poor design.
- Assume a specific floating point architecture and write my own isNaN: I may do this, but it's a bit hackish and non-portable.
- Prefilter the data using a separately compiled program built without the -ffinite-math-only flag, and then feed its output into the main program: the added complexity of maintaining two binaries and getting them to talk to each other just isn't worth it.
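For illustration, the second option above (spotting the string "nan" before it ever becomes a float) could look roughly like the sketch below. The names tokenIsNaN and parseOrReplace are hypothetical, and the sketch assumes the input has already been split into whitespace-delimited tokens; it is not how my actual reader works.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>

// Hypothetical helper: true if the token spells a NaN the way strtof/scanf
// would accept it (optional sign, then "nan" in any letter case).
inline bool tokenIsNaN(const char* tok) {
    while (std::isspace(static_cast<unsigned char>(*tok))) ++tok;
    if (*tok == '+' || *tok == '-') ++tok;
    // Short-circuit evaluation stops at the terminating '\0', so no read past the end.
    return std::tolower(static_cast<unsigned char>(tok[0])) == 'n'
        && std::tolower(static_cast<unsigned char>(tok[1])) == 'a'
        && std::tolower(static_cast<unsigned char>(tok[2])) == 'n';
}

// Hypothetical helper: convert a token to float, substituting `replacement`
// for NaN tokens. The decision is made on the text, so no floating point
// NaN test is involved for the optimizer to remove.
inline float parseOrReplace(const char* tok, float replacement) {
    return tokenIsNaN(tok) ? replacement : std::strtof(tok, nullptr);
}

// Usage check: "nan" is replaced, ordinary numbers pass through.
inline void parseOrReplaceDemo() {
    assert(parseOrReplace("nan", -1.0f) == -1.0f);
    assert(parseOrReplace("2.5", -1.0f) == 2.5f);
}
```

Because the test happens on characters rather than on a float, it does not depend on how -ffinite-math-only treats std::isnan, but it does tie the NaN handling to the input reader.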
Edit: As stated in the accepted answer, it would seem this is a compiler "bug" in older versions of g++, such as 4.8.2 and 4.9.3, that is fixed in newer versions, such as 5.1 and 6.1.1.
If for some reason updating the compiler is not a reasonably easy option (e.g. no root access), or adding this attribute to a single function still doesn't completely solve the NaN check problem, an alternative, if you can be certain that the code will always run in an IEEE754 floating point environment, is to manually check the bits of the float for a NaN signature.
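As a minimal sketch of what "checking the bits for a NaN signature" means for a 32-bit float: a NaN has all eight exponent bits set and a non-zero mantissa. The helper name isNaNBits is hypothetical, and the sketch assumes float is a 32-bit IEEE754 type (checked at compile time). It copies the bits out with std::memcpy instead of a union or bit field.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: float is a 32-bit IEEE754 (IEC 559) type on this platform.
static_assert(std::numeric_limits<float>::is_iec559 &&
              sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit IEEE754 float");

// Hypothetical helper: NaN test using only integer operations on the bits.
inline bool isNaNBits(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);            // copy the raw representation
    const std::uint32_t exponentMask = 0x7F800000u; // 8 exponent bits
    const std::uint32_t mantissaMask = 0x007FFFFFu; // 23 mantissa bits
    return (bits & exponentMask) == exponentMask    // exponent all ones
        && (bits & mantissaMask) != 0;              // mantissa non-zero (zero would be inf)
}

// Usage check: NaN is detected, infinity and ordinary values are not.
inline void isNaNBitsDemo() {
    assert(isNaNBits(std::numeric_limits<float>::quiet_NaN()));
    assert(!isNaNBits(std::numeric_limits<float>::infinity()));
    assert(!isNaNBits(1.0f));
}
```

Since the comparison is done on integers, there is no floating point NaN test for the finite-math assumption to remove, although I have only reasoned about this for the compilers mentioned here.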
The accepted answer suggests doing this with a bit field. However, the order in which the compiler places the members of a bit field is implementation-defined, and in fact it changes between the older and newer versions of g++. The older versions (4.8.2 and 4.9.3) even refuse to adhere to the desired positioning, always placing the mantissa first regardless of the order in which the members appear in the code.
A solution using bit manipulation, however, does not depend on bit field layout and should work wherever the floating point types are IEEE754 compliant. Below is my implementation, which I ultimately used to solve my problem. It checks for IEEE754 compliance, and I've extended it to allow for doubles, as well as other more routine floating point bit manipulations.
#include <cstddef>     // size_t
#include <cstdint>     // uint32_t, uint64_t
#include <limits>      // IEEE754 compliance test
#include <type_traits> // enable_if, conditional

template<
    typename T,
    typename = typename std::enable_if<std::is_floating_point<T>::value>::type,
    typename = typename std::enable_if<std::numeric_limits<T>::is_iec559>::type,
    // Unsigned integer with the same width as T (float or double)
    typename u_t = typename std::conditional<std::is_same<T, float>::value, uint32_t, uint64_t>::type
>
struct IEEE754 {
    static_assert(sizeof(T) == sizeof(u_t), "only float and double are supported");

    // Field widths in bits
    enum class WIDTH : size_t {
        SIGN     = 1,
        EXPONENT = std::is_same<T, float>::value ? 8 : 11,
        MANTISSA = std::is_same<T, float>::value ? 23 : 52
    };
    // Masks selecting each field in place
    enum class MASK : u_t {
        SIGN     = (u_t)1 << (sizeof(u_t) * 8 - 1),
        EXPONENT = ((~(u_t)0) << (size_t)WIDTH::MANTISSA) ^ (u_t)MASK::SIGN,
        MANTISSA = (~(u_t)0) >> ((size_t)WIDTH::SIGN + (size_t)WIDTH::EXPONENT)
    };
    // Reading u after writing f relies on GCC's documented support for type punning through unions
    union {
        T   f;
        u_t u;
    };

    IEEE754(T f) : f(f) {}

    // Mask first, then shift: >> binds tighter than &, so the parentheses are required
    inline u_t sign() const { return (u & (u_t)MASK::SIGN) >> ((size_t)WIDTH::EXPONENT + (size_t)WIDTH::MANTISSA); }
    inline u_t exponent() const { return (u & (u_t)MASK::EXPONENT) >> (size_t)WIDTH::MANTISSA; }
    inline u_t mantissa() const { return u & (u_t)MASK::MANTISSA; }
    // NaN: exponent bits all ones and mantissa non-zero
    inline bool isNan() const {
        return (mantissa() != 0) && ((u & ((u_t)MASK::EXPONENT)) == (u_t)MASK::EXPONENT);
    }
};

template<typename T>
inline IEEE754<T> toIEEE754(T val) { return IEEE754<T>(val); }
And the replaceNaN function now becomes:
void replaceNaN(float * arr, int size, float newValue){
    for(int i = 0; i < size; i++)
        if (toIEEE754(arr[i]).isNan()) arr[i] = newValue;
}
An inspection of the assembly of these functions reveals that, as expected, all masks become compile-time constants, leading to the following (seemingly) efficient code:
# In loop of replaceNaN
movl  (%rcx), %eax        # eax = arr[i]
testl $8388607, %eax      # Check if mantissa is empty
je    .L3                 # If it is, it's not a nan (it may be inf), continue loop
andl  $2139095040, %eax   # Mask leaves only exponent
cmpl  $2139095040, %eax   # Test if exponent is all 1s
jne   .L3                 # If it isn't, it's not a nan, so continue loop
This is one instruction fewer than a working bit field solution (no shift), and it uses the same number of registers. It is tempting to say this alone makes it more efficient, but other concerns, such as pipelining, may make one solution more or less efficient than the other.
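As a quick sanity check on the listing above, the two decimal immediates are just the float mantissa and exponent masks written in base 10. The constant names below are hypothetical; the sketch only restates the 23-bit mantissa and 8-bit exponent widths used in the struct.

```cpp
#include <cstdint>

// Hypothetical constants mirroring the float masks from the IEEE754 struct.
constexpr std::uint32_t kMantissaMask = (std::uint32_t(1) << 23) - 1; // low 23 bits
constexpr std::uint32_t kExponentMask = std::uint32_t(0xFF) << 23;    // next 8 bits

// The immediates that appear in the assembly listing.
static_assert(kMantissaMask == 8388607u,    "testl immediate is the mantissa mask");
static_assert(kExponentMask == 2139095040u, "andl/cmpl immediate is the exponent mask");
```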