Behaviour of custom NaN floats in Python and Numpy

Question

I need to pack some extra information into floating point NaN values. I am using single-precision IEEE 754 floats (32-bit floats) in Python. How do Python and NumPy treat these values?

Theory

The IEEE 754-2008 standard seems to think a number is really not a number, if the exponent bits (23..30) are set, and at least one of the significand bits is set. Thus if we transform the float into a 32-bit integer representation, anything satisfying the following conditions goes:

i & 0x7f800000 == 0x7f800000
i & 0x007fffff != 0

This would leave me with plenty of choice. However, the standard seems to say that the highest bit of the significand is is_quiet and should be set to avoid exceptions in calculations.

Practical tests

Python 2.7

In order to be sure, I ran some tests with interesting results:

import math
import struct

std_nan = struct.unpack("f4", struct.pack("I", 0x7fc00000))[0]
spec_nan = struct.unpack("f4", struct.pack("I", 0x7f800001))[0]
spec2_nan = struct.unpack("f4", struct.pack("I", 0x7fc00001))[0]

print "{:08x}".format(struct.unpack("I", struct.pack("f4", std_nan))[0])
print "{:08x}".format(struct.unpack("I", struct.pack("f4", spec_nan))[0])
print "{:08x}".format(struct.unpack("I", struct.pack("f4", spec2_nan))[0])

This gives:

7fc00000
7fc00001 <<< should be 7f800001
7fc00001

This and some further testing seems to suggst that something (struct.unpack?) always sets the is_quiet bit.

NumPy

I tried the same with NumPy, because there I can always rely on conversions not changing a single bit:

import numpy as np

intarr = np.array([0x7f800001], dtype='uint32')
f = np.fromstring(intarr.tostring(), dtype='f4')
print np.isnan(f)

This gives:

RuntimeWarning: invalid value encountered in isnan
[True]

but if the value is replaced by 0x7fc00001, there is no error.

Hypothesis

Both Python and NumPy will be happy, if I set the is_quiet and use the rest of the bits for my own purposes. Python handles the bit by itself, NumPy relies on lower-level language implementations and/or the hardware FP implementation.

Question

Is my hypothesis correct, and can it be proved or disproved by some official documentation? Or is it one of those platform-dependent things?

I found something quite related here: How to distinguish different types of NaN float in Python, but I could not find any official word on how extra-information-carrying NaNs should be handled in Python or NumPy.

My first reaction - If your design requires you to hack an existing value/convention with your own implementation specific information while retain backwards compatibility, I would call "bad design" and implore you to go back to the drawing board and re-design it. There must be a better way of doing this. If there was a standard way of doing this it would be documented. — Tony Suffolk 66, Jul 07 '14 at 14:21
This might be helpful to see the behaviour of `struct.unpack`: [`_PyFloat_Unpack4` source](https://github.com/python-git/python/blob/715a6e5035bb21ac49382772076ec4c630d6e960/Objects/floatobject.c#L2351). — Pietro Saccardi, Jul 07 '14 at 15:47
@TonySuffolk66: I do agree with you for the most part. However, IEEE 754-2008 clause 6.2.1: "When encoded, all NaNs have a sign bit and a pattern of bits necessary to identify the encoding as a NaN and which determines its kind (sNaN vs. qNaN). The remaining bits, which are in the trailing significand field, encode the payload, which might be diagnostic information (see above)." So, I cannot find anything non-standard in what I am doing. (Plus archive file formats may be rather odd in cases. This is not going to be a part of a public API.) — DrV, Jul 07 '14 at 19:01
@PietroSaccardi: Thank you for reminding me of having a look at the source code! What I found there - and especially what I did not find there - solved this riddle! I'll post an answer to show what I think is really going on. — DrV, Jul 07 '14 at 19:03
I would still say - just because something is possible, doesn't mean it should be done. — Tony Suffolk 66, Jul 08 '14 at 16:56
The `view` method of numpy numbers and arrays can be useful. For example: `hex(np.float32(1.23456789).view(np.int32))` returns `0x3f9e0652`. You can modify the view (`np.int32` here) and this will be reflected in the `np.float32` value. (I know this is an old post, but this information may still be useful to others landing here still.) — equaeghe, Sep 07 '17 at 07:51

score 8 · Accepted Answer · answered Jul 07 '14 at 19:34

After thinking of this for some time and having a look at the source code ad then rethinking a bit, I think I can answer my own question. My hypotheses are almost correct but not the whole story.

As NumPy and Python handle numbers quite differently, this answer has two parts.

What really happens in Python and NumPy with NaNs

NumPy

This may be slightly platform-specific, but on most platforms NumPy uses the gcc builtin isnan, which in turn does something fast. The runtime warnings come from the deeper levels, from the hardware in most cases. (NumPy may use on of several methods of determining the NaN status, such as x != x, which works on at least AMD 64 platforms, but with gcc it is down to gcc, which probably uses some pretty short code for the purpose.)

So, in theory there is no way to guarantee how NumPy handles NaNs, but in practice on the more common platforms it will do as the standard says because that's what the hardware does. NumPy itself does not care about the NaN types at all. (Except for some NumPy-specific non-hw-supported data types and platforms.)

Python

Here the story becomes interesting. If the platform supports IEEE floats (most do), Python uses the C library for floating point arithmetics, and thus almost directly hardware instructions in most cases. So there should not be any difference to NumPy.

Except for... There is usually no such thing as a 32-bit float in Python. Python float objects use C double, which is a 64-bit format. How does one transform special NaNs between these formats? In order to see what happens in practice, the following little C code helps:

/* nantest.c - Test floating point nan behaviour with type casts */

#include <stdio.h>
#include <stdint.h>

static uint32_t u1 = 0x7fc00000;
static uint32_t u2 = 0x7f800001;
static uint32_t u3 = 0x7fc00001;

int main(void)
    {
    float f1, f2, f3;
    float f1p, f2p, f3p;
    double d1, d2, d3;
    uint32_t u1p, u2p, u3p;
    uint64_t l1, l2, l3;

    // Convert uint32 -> float
    f1 = *(float *)&u1; f2 = *(float *)&u2; f3 = *(float *)&u3;

    // Convert float -> double (type cast, real conversion)
    d1 = (double)f1; d2 = (double)f2; d3 = (double)f3;

    // Convert the doubles into long ints
    l1 = *(uint64_t *)&d1; l2 = *(uint64_t *)&d2; l3 = *(uint64_t *)&d3;

    // Convert the doubles back to floats
    f1p = (float)d1; f2p = (float)d2; f3p = (float)d3;

    // Convert the floats back to uints
    u1p = *(uint32_t *)&f1p; u2p = *(uint32_t *)&f2p; u3p = *(uint32_t *)&f3p;

    printf("%f (%08x) -> %lf (%016llx) -> %f (%08x)\n", f1, u1, d1, l1, f1p, u1p);
    printf("%f (%08x) -> %lf (%016llx) -> %f (%08x)\n", f2, u2, d2, l2, f2p, u2p);
    printf("%f (%08x) -> %lf (%016llx) -> %f (%08x)\n", f3, u3, d3, l3, f3p, u3p);

    return 0;
    }

This prints:

nan (7fc00000) -> nan (7ff8000000000000) -> nan (7fc00000)
nan (7f800001) -> nan (7ff8000020000000) -> nan (7fc00001)
nan (7fc00001) -> nan (7ff8000020000000) -> nan (7fc00001)

By looking at row 2 it is obvious that we have the same phenomenon as we had with Python. So, it is the conversion to double that introduces the extra is_quiet bit immediately after the exponent in the 64-bit version.

This sounds a bit strange, but actually the standard says (IEEE 754-2008, section 6.2.3):

Conversion of a quiet NaN from a narrower format to a wider format in the same radix, and then back to the same narrower format, should not change the quiet NaN payload in any way except to make it canonical.

This does not say anything about the propagation of signaled NaN's. However, that is explained by section 6.2.1.:

For binary formats, the payload is encoded in the p − 2 least significant bits of the trailing significand field.

The p above is precision, 24 bits for a 32-bit float. So, my mistake was to use signaled NaNs for payload.

Summary

I got the following take home points:

the use of qNaNs (quiet NaNs) is supported and encouraged by the IEEE 754-2008
odd results were because I tried to use sNaNs and type conversions resulted in the is_quiet bit being set
both NumPy and Python act according to IEEE 754 on the most common platforms
the implementation leans heavily on the underlying C implementation and thus guarantees very little (there is even some code in Python which acknowledges that NaNs are not handled as they should be on some platforms)
the only safe way to handle this is to do a bit of DIY with the payload

There is however, one thing that is implemented in neither Python nor NumPy (nor any other language I have come across with). Section 5.12.1:

Language standards should provide an optional conversion of NaNs in a supported format to external character sequences which appends to the basic NaN character sequences a suffix that can represent the NaN payload (see 6.2). The form and interpretation of the payload suffix is language-defined. The language standard shall require that any such optional output sequences be accepted as input in conversion of external character sequences to supported formats.

Behaviour of custom NaN floats in Python and Numpy

1 Answers1