3

I am looking to check if a double value can be represented as an int (or the same for any pair of floating-point and integer types). This is a simple way to do it:

double x = ...;
int i = x; // potentially undefined behaviour

if ((double) i != x) {
    // not representable
}

However, it invokes undefined behaviour on the marked line, and triggers UBSan (which some will complain about).

Questions:

  • Is this method considered acceptable in general?
  • Is there a reasonably simple way to do it without invoking undefined behaviour?

Clarifications, as requested:

The situation I am facing right now involves conversion from double to various integer types (int, long, long long) in C. However, I have encountered similar situations before, thus I am interested in answers both for float -> integer and integer -> float conversions.

Examples of how the conversion may fail:

  • Float -> integer conversion may fail if the value is not a whole number, e.g. 3.5.
  • The source value may be out of the range of the target type (larger or smaller than the max and min representable values). For example 1.23e100.
  • The source values may be +-Inf or NaN, NaN being tricky as any comparison with it returns false.
  • Integer -> float conversion may fail when the float type does not have enough precision. For example, a typical double has 52 explicitly stored binary digits (53 bits of precision), compared to 63 digits in a 64-bit integer type. On a typical 64-bit system, (long) (double) ((1L << 53) + 1L) does not give back the original value.
  • I do understand that 1L << 53 (as opposed to (1L << 53) + 1) is technically exactly representable as a double, and that the code I proposed would accept this conversion, even though it probably shouldn't be allowed.
  • Anything I didn't think of?
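
The integer -> float case from the list above can be checked without any floating-point conversion at all. The following is only a sketch: the helper name `int64_exact_as_double` is made up for illustration, and it assumes a binary floating-point format (`FLT_RADIX == 2`) whose significand is narrower than 64 bits.

```c
#include <float.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if v converts to double without rounding.
   Assumes FLT_RADIX == 2 and DBL_MANT_DIG (typically 53) < 64. */
bool int64_exact_as_double(int64_t v)
{
    /* Take the magnitude in unsigned arithmetic so INT64_MIN does not overflow. */
    uint64_t m = v < 0 ? 0u - (uint64_t)v : (uint64_t)v;

    if (m == 0)
        return true;

    /* Trailing zero bits only change the exponent, so strip them. */
    while ((m & 1u) == 0)
        m >>= 1;

    /* The remaining odd number must fit in the significand. */
    return m < ((uint64_t)1 << DBL_MANT_DIG);
}
```

With a 53-bit significand this accepts `1L << 53` and rejects `(1L << 53) + 1`, matching the examples in the list.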
Szabolcs
  • 1
    Could you [edit] and add some example of values and expected outcome with a short explanation, it seems it isn't too clear what your actual question is. – Jabberwocky Apr 01 '22 at 11:16
  • 1
    Also, do you want a C or a C++ solution? They could be *very* different (e.g. using `typeid`). – Adrian Mole Apr 01 '22 at 11:24
  • 2
    Do not tag C and C++ except when asking about differences or interactions between the languages. Pick one language and delete the other tag. – Eric Postpischil Apr 01 '22 at 11:44
  • Doing this test completely (avoiding implementation dependencies and other issues) is tricky. A C++ solution is covered [here](https://stackoverflow.com/a/51323959/298225). That question is nominally for detecting overflow in the conversion, but, once that is dealt with, completing the test for whether the conversion is exact is easier. – Eric Postpischil Apr 01 '22 at 11:46
  • @Jabberwocky I added some clarifying examples. Let me know if this is not sufficient. – Szabolcs Apr 01 '22 at 12:21
  • @EricPostpischil Deleted `c++`. – Szabolcs Apr 01 '22 at 12:22
  • @EricPostpischil However, I could re-post the exact same question for C++ and I am interested in _both_ C and C++ solutions. – Szabolcs Apr 01 '22 at 12:22
  • 1
    @Szabolcs: Yes, you could post the same question for C++, and that is the preferred method on Stack Overflow. That way, people seeking C answers can search the C questions and will not be distracted by inapplicable C++ answers, people seeking C++ answers can search the C++ questions and will not be distracted by the inapplicable C answers, and people searching for either can search both. Further, any answers with information common to both languages can cross-reference other answers. Do not think about how Stack Overflow can serve you; think about how it serves people generally. – Eric Postpischil Apr 01 '22 at 13:24

3 Answers

1

Create range limits exactly as FP types

The "trick" is to form the limits without losing precision.

Let us consider float to int.

Conversion of float to int is valid (for example with 32-bit 2's complement int) for -2,147,483,648.9999... to 2,147,483,647.9999... or nearly INT_MIN -1 to INT_MAX + 1.

We can take advantage that integer_MAX is always a power-of-2 - 1 and integer_MIN is -(power-of-2) (for common 2's complement).

Avoid the limit of FP_INT_MIN_minus_1 as it may/may not be exactly encodable as a FP.

// Form FP limits of "INT_MAX plus 1" and "INT_MIN"
#define FLOAT_INT_MAX_P1 ((INT_MAX/2 + 1)*2.0f)
#define FLOAT_INT_MIN ((float) INT_MIN)

if (f < FLOAT_INT_MAX_P1 && f - FLOAT_INT_MIN > -1.0f) {
  // Within range.
  // Use modff() to detect a fraction if desired.
}

More pedantic code would use !isnan(f) and consider non-2's complement encoding.
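
Putting these pieces together for double to int, a complete check might look like the sketch below. The names `DBL_INT_MAX_P1` and `double_to_int_exact` are made up for illustration, and the code assumes a two's complement `int`.

```c
#include <limits.h>
#include <math.h>
#include <stdbool.h>

/* "INT_MAX + 1" formed exactly: INT_MAX/2 + 1 is a power of two, and
   doubling a power of two is exact in binary floating point. */
#define DBL_INT_MAX_P1 ((INT_MAX / 2 + 1) * 2.0)

/* Hypothetical helper: stores x in *out and returns true only if x is a
   whole number within the range of int. Assumes two's complement int. */
bool double_to_int_exact(double x, int *out)
{
    double ipart;

    if (isnan(x))
        return false;               /* NaN is never representable */

    if (!(x < DBL_INT_MAX_P1 && x - (double)INT_MIN > -1.0))
        return false;               /* out of range, including +-Inf */

    if (modf(x, &ipart) != 0.0)
        return false;               /* has a fractional part */

    *out = (int)x;                  /* defined: x is whole and in range */
    return true;
}
```

The conversion on the last line is only reached once the value is known to be in range, so the cast itself never triggers undefined behaviour.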

chux - Reinstate Monica
0

Use the known limits and check the validity of the floating-point number. See what the limits.h header provides.

You can write something like this:

#include <limits.h>
#include <math.h> 

// The constants used here are specific to "int"; other types have their own.
// isfinite() rather than isnormal(): isnormal() is false for 0.0, which is a valid int.
if ((isfinite(x)) && (x>=INT_MIN) && (x<=INT_MAX) && (round(x)==x))
  // Safe assignment from double to int.
  i = (int)x ;
else
  // Handle error/overflow here.
  ERROR(.....) ;

Code relies on lazy boolean evaluation, obviously.

Wisblade
  • 2
    That's not at all what the OP is asking. It's not about limits but about the fact that a fp number can be represented _exactly_ with an `int` or not. – Jabberwocky Apr 01 '22 at 11:09
  • I didn't understand it this way.... I may update my answer according to his clarifications; the value still needs to be within `int` limits to avoid strange things with overflows. – Wisblade Apr 01 '22 at 11:12
  • Not sure what OP *is* asking about, though. 1.000001 will fail their test but 1.0000000000000000000000001 probably won't. – Adrian Mole Apr 01 '22 at 11:12
  • @AdrianMole Probably, as soon as you start to get this kind of floating-point numbers, you're lucky to keep these least-significant digits... – Wisblade Apr 01 '22 at 11:18
  • 1
    Since UB is related to the integer part out of range, this is a valuable answer. In case of c++ check also `std::numeric_limits` – MatG Apr 01 '22 at 11:36
  • I meant the question exactly as written, "I am looking to check if a double value can be represented as an int". I am not looking to handle _only_ special cases, such as _only_ overflow or only rounding. I am looking to detect and handle all possible conversion failures. – Szabolcs Apr 01 '22 at 12:24
  • @AdrianMole `1.0000000000000000000000001` is not different from `1.0` on a typical machine with 52 binary digits of precision. I'm not sure why people are reading unsaid things into the question. It is meant exactly as written. – Szabolcs Apr 01 '22 at 12:25
  • @Szabolcs It wasn't so clear, if so many people ask for precision... BTW, I've updated my answer according to these precisions. – Wisblade Apr 01 '22 at 12:28
  • @Jabberwocky: In cases where a cast to `int` will yield defined behavior, a cast followed by a comparison, as suggested in the original question, will reveal whether the value was precisely representable. The only difficult part is determining when that test may be safely employed, which this answer does cover. – supercat Apr 01 '22 at 18:05
-2

Please refer to IEEE 754 representation of floating point numbers in Memory https://en.wikipedia.org/wiki/IEEE_754

Take double as an example:

  • Sign bit: 1 bit
  • Exponent: 11 bits
  • Fraction: 52 bits

There are three special values to point out here:

  1. If the exponent is 0 and the fractional part of the mantissa is 0, the number is ±0
  2. If the exponent is 2047 and the fractional part of the mantissa is 0, the number is ±∞
  3. If the exponent is 2047 and the fractional part of the mantissa is non-zero, the number is NaN.

This is an example of converting from double to int on a 64-bit system, just for reference:

#include <stdint.h>

#define EXPBITS      11
#define FRACTIONBITS 52
#define GENMASK(n)   (((uint64_t)1 << (n)) - 1)
#define EXPBIAS      GENMASK(EXPBITS - 1) 
#define SIGNMASK     (~GENMASK(FRACTIONBITS + EXPBITS)) 
#define EXPMASK      (GENMASK(EXPBITS) << FRACTIONBITS) 
#define FRACTIONMASK GENMASK(FRACTIONBITS) 

int double_to_int(double src, int *dst)
{
    union {
        double d;
        uint64_t i;
    } y;

    int exp;
    int sign;
    int maxbits;
    uint64_t fraction;

    y.d = src;
    sign = (y.i & SIGNMASK) ? 1 : 0;
    exp = (y.i & EXPMASK) >> FRACTIONBITS;
    fraction = (y.i & FRACTIONMASK);

    // 0
    if (fraction == 0 && exp == 0) {
        *dst = 0;
        return 0;
    }

    exp -= EXPBIAS;
    // not a whole number
    if (exp < 0)
        return -1;

    // out of the range of int
    maxbits = sizeof(*dst) * 8 - 1;
    if (exp >= maxbits && !(exp == maxbits && sign && fraction == 0))
        return -2;

    // not a whole number
    if (fraction & GENMASK(FRACTIONBITS - exp))
        return -3;

    // convert to int
    *dst = src;

    return 0;
}
dulngix
  • 3
    (a) A C implementation does not necessarily use an IEEE-754 format. (b) The C standard does not guarantee bit-fields are contiguous nor in any particular order. (c) Accessing the `double` parameter `x` as a `struct ieee754_double` violates the aliasing rules in C 2018 6.5 7. (d) The C standard does not guarantee `int` is 32 bits. (e) Even if it is, and is two’s complement, this code incorrectly reports −2,147,483,648 is not representable in `int`. (f) The C standard does not guarantee `long` has enough bits that `1L << (52-exp)` will work. – Eric Postpischil Apr 02 '22 at 00:14
  • 2
    If we did want to hard-code the test based on expectations of `int` and `double` properties, then the task is simple. Test for NaN, value greater than 2^31−1, and value less than −2^31. To test for a non-integer, use the standard library `remainder`. If the number passes the tests, the conversion is easily performed via `*result = x;`. There is no need for any code to manipulate the floating-point encoding. – Eric Postpischil Apr 02 '22 at 00:18