How to interpret paragraph 1 of section 6.3.1.4 of the C11 standard (about converting float to unsigned int)

Question

My C11 standard is from here. This paragraph says:

When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.[61]

and footnote 61 says:

The remaindering operation performed when a value of integer type is converted to unsigned type need not be performed when a value of real floating type is converted to unsigned type. Thus, the range of portable real floating values is (−1, Utype_MAX+1).

My confusion is mainly about unsigned int. My current understanding is the following:

float    a = 3.14;
uint32_t b = (uint32_t)a; // defined, b == 3

float    a = -1.23;
uint32_t b = (uint32_t)a; // UB!

float a = 2147483646.0;   // defined, but stored as 2147483648.0f (the nearest float)
uint32_t b = (uint32_t)a; // defined, b == 2147483648
uint8_t  c = (uint8_t )a;  // UB!

Is this correct?

OK, I'm still not sure what your point of confusion is. Why do you think that trying to put `2147483646` into a `uint8_t` is anything other than UB? — Adrian Mole, Feb 27 '23 at 10:10
`uint8_t a = (uint8_t)2147483646;` is well defined, I think. Unsigned integers do not overflow. — D.J. Elkind, Feb 27 '23 at 10:11
But you aren't casting one unsigned type to another; you are casting a floating-point type to an unsigned integer type. The rules are different. — Adrian Mole, Feb 27 '23 at 10:26
Re: "need not be performed": I had exactly [the same question](https://stackoverflow.com/a/62682671/1778275) (see comment from Jul 1, 2020 at 19:13). — pmor, Feb 27 '23 at 13:34

nielsen · Accepted Answer · 2023-02-27T10:15:54.360


Footnote 61 clarifies the range of floating-point values that can be converted to an unsigned integer type without undefined behavior.

An unsigned integer type can represent values in the range [0; Utype_MAX]. Hence any floating-point value whose integer part lies in this interval can be converted to that unsigned integer type, i.e. any value x with x > -1 and x < Utype_MAX+1. This is what the last part of footnote 61 states.
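As a rough illustration, that interval can be checked directly before converting. This is a minimal sketch assuming a uint32_t target; the function names `fits_uint32` and `to_uint32_checked` are hypothetical, not from the standard or any library.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when x lies in the open interval (-1, UINT32_MAX + 1), i.e. when its
   integral part is representable in uint32_t. UINT32_MAX + 1.0 is exactly
   2^32, which a double represents exactly. NaN fails both comparisons, so
   it is rejected as well. */
bool fits_uint32(double x) {
    return x > -1.0 && x < (double)UINT32_MAX + 1.0;
}

/* Performs the conversion only when it is defined; returns false otherwise. */
bool to_uint32_checked(double x, uint32_t *out) {
    if (!fits_uint32(x))
        return false;
    *out = (uint32_t)x;  /* fractional part discarded (truncation toward zero) */
    return true;
}
```

With this, `fits_uint32(-0.5)` is true (its integer part is 0), while `fits_uint32(-1.0)` and `fits_uint32(4294967296.0)` are false.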

The general rule is that when the result of an operation on unsigned integers (or of converting an integer value to an unsigned type) falls outside the range [0; Utype_MAX], it is reduced modulo Utype_MAX+1 (also referred to as "wrap-around"). E.g. when adding two 16-bit unsigned integers, 40000+40000=80000, which is not representable in 16 bits, so the result is reduced modulo 65536 to 14464.
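The 16-bit arithmetic can be checked with a short sketch (the helper name `add_u16` is made up for illustration). On common platforms where int is wider than 16 bits, both uint16_t operands are promoted to int first, so 80000 is computed in int and the wrap-around happens when that result is converted back to uint16_t.

```c
#include <stdint.h>

/* x + y is evaluated after integer promotion, giving 80000 for
   40000 + 40000; converting to uint16_t then reduces it modulo 65536. */
uint16_t add_u16(uint16_t x, uint16_t y) {
    return (uint16_t)(x + y);
}
```

Here `add_u16(40000, 40000)` yields 14464, which is 80000 - 65536.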

However, this wrap-around does not need to be done when casting a floating-point number to an unsigned integer. This is the first statement in footnote 61.

edited Feb 27 '23 at 10:15

answered Feb 27 '23 at 10:05


oh this explains it. But how about the first part? That is, is my understanding correct? `float a = -1.23; uint32_t b = (uint32_t)a; // UB!` `float a = 2147483646.0; uint8_t c = (uint8_t )a; // UB!` – D.J. Elkind Feb 27 '23 at 10:07
Reading the statement one more time, on `the range of portable real floating values is (−1, Utype_MAX+1)`, does it mean that `-0.5` is a portable real floating value for `float`? – D.J. Elkind Feb 27 '23 at 10:12
@D.J.Elkind (1) I have tried to update with an explanation of the first part. (2) Yes, because the integer part of -0.5 is 0 which is representable by any unsigned integer type. The same goes for `-0.99`, but not `-1.00`. – nielsen Feb 27 '23 at 10:17

sorry @nielsen, perhaps I did not make myself clear. My question is exactly this. Say `uint8_t a = (uint8_t)123456;` is defined given the wrapping around, `uint8_t a = (uint8_t)123456.7` is UB as C standard does not require the wrapping. Is this what the standard says? – D.J. Elkind Feb 27 '23 at 10:18
Thanks, I think you fully understood my confusion and answered both parts. – D.J. Elkind Feb 27 '23 at 10:19
@D.J.Elkind: yes, the language of the Standard seems unambiguous about that and the footnote confirms that the modulo operation that is defined for integer conversions does not necessarily occur for floating point conversions. – chqrlie Feb 27 '23 at 10:28
@chqrlie: If one had an implementation that trapped on out-of-bounds float-to-unsigned conversions, having it trap likewise when converting a float value to any unsigned type which could represent its integer portion could be useful, for the same purpose of preventing code from misinterpreting the results of erroneous computations as valid data. – supercat Feb 27 '23 at 20:35

chqrlie · Answer 2 · 2023-02-27T11:11:30.463

Your question is exactly this:

Say uint8_t a = (uint8_t)123456; is defined given the wrapping around, uint8_t a = (uint8_t)123456.7 is UB as C standard does not require the wrapping. Is this what the standard says?

The language of the Standard seems unambiguous about that and the footnote confirms that the modulo operation that is defined for integer conversions does not necessarily occur for floating point conversions.
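To make the contrast concrete, here is a sketch of both paths (the helper names are hypothetical). The integer conversion always wraps. The direct floating conversion `(uint8_t)123456.7` is deliberately absent because it is undefined; going through a type that can hold the integral part, here uint32_t, is defined, because the second step is an ordinary integer conversion.

```c
#include <stdint.h>

/* Integer -> unsigned: always defined, reduced modulo 256.
   123456 % 256 == 64, and -1 becomes 255. */
uint8_t u8_from_long(long v) {
    return (uint8_t)v;
}

/* Floating -> uint32_t -> uint8_t. The first conversion truncates toward
   zero and is defined only if v is in (-1, UINT32_MAX + 1); the second
   conversion is an integer conversion and wraps modulo 256. */
uint8_t u8_from_double_via_u32(double v) {
    return (uint8_t)(uint32_t)v;
}
```

For example, `u8_from_double_via_u32(123456.7)` first yields 123456 as a uint32_t and then 64 as a uint8_t, the same value `(uint8_t)123456` gives.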

This text was already present in the C99 version of the C Standard (with a different footnote number), and also in the C90 version (aka ANSI C) without a reference to the _Bool type.

The reason for this apparent semantic inconsistency in the C Standard is probably the concern to keep existing implementations and hardware behavior compatible with the Standard. It may be linked to the binary representation of negative floating point numbers: while all but some ancient architectures have used two's complement representation for signed integers for a long time (this is actually mandated by the latest C23 Standard), floating point numbers generally use sign + magnitude representations. The modulo semantics of signed integer to unsigned integer conversions cost nothing on two's complement representations, but would require extra silicon for floating point values, which was not present on all hardware implementations at the time. The Standard Committee decided to keep these cases undefined for uint32_t b = (uint32_t)-1.23; and also for the less problematic uint8_t a = (uint8_t)123456.7; to avoid requiring compiler writers to produce extra, costly code to fix the behavior on hardware that does not already implement the modulo semantics.

Note that C23 has a slightly different specification for the conversion from floating point to integer types:

6.3.1.4 Real floating and integer

1 When a finite value of standard floating type is converted to an integer type other than bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.⁶⁶⁾

2 When a finite value of decimal floating type is converted to an integer type other than bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the "invalid" floating-point exception shall be raised and the result of the conversion is unspecified.

Footnote: ⁶⁶⁾ The remaindering operation performed when a value of integer type is converted to unsigned type need not be performed when a value of real floating type is converted to unsigned type. Thus, the range of portable real floating values is (−1, Utype_MAX + 1).

The behavior is more explicit for conversions from decimal floating point representations to integer: a floating point exception must be raised if the value is not representable in the target type, which seems a very strong constraint as there are at least 8 and possibly more integral types to handle specifically, not counting the bit-precise integer types...

The insight is very helpful! One thing I am not very clear about, though: regarding the "exception" in "the "invalid" floating-point exception shall be raised and the result of the conversion is unspecified.", what exactly does it mean? I checked C23 briefly, and it doesn't seem to add exception support (as we know it in C++/C#/etc.) to C. — D.J. Elkind, Feb 28 '23 at 02:28
@D.J.Elkind: indeed the C23 draft is adamant about raising floating point exceptions in various places, but I could not find a formal definition of what that means and how to catch them. — chqrlie, Feb 28 '23 at 10:11
@chqrlie: The Standard's abstraction model is incapable of accommodating optimizations that might cause a program to behave in a manner inconsistent with sequential program execution, except by categorizing as UB all situations where that might occur. Since achieving good floating-point performance often requires the ability to perform actions out of order, I don't see any good way for the Standard to say much useful about floating-point traps. — supercat, Feb 28 '23 at 20:28

Adrian Mole · Answer 3 · 2023-02-27T10:29:58.513

The range specifier, (−1, U_type_MAX+1), is exclusive (further reading). That is to say, the specified endpoints are not part of the range itself. So the inclusive range of floating-point values that can be converted to the given unsigned type has, as its lower bound, the floating-point number that is the next after -1 towards zero (which will be something like -0.999999940395 for an IEEE-754 float). Similarly, the upper bound will be the largest representable value below U_type_MAX+1. That value truncates to U_type_MAX when the floating type has enough precision (e.g. the float just below 256.0 truncates to 255 for uint8_t), but not always: for an IEEE-754 float and uint32_t it is 4294967040.0, because floats near 2^32 are 256 apart.
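The endpoints can be computed with the standard `nextafterf` function from <math.h> (which may require -lm). This sketch assumes float is IEEE-754 binary32, which is common but not guaranteed by C, and the function names are hypothetical.

```c
#include <math.h>
#include <stdint.h>

/* Smallest float greater than -1: the lower end of (-1, Utype_MAX + 1).
   For binary32 this is -(1 - 2^-24), about -0.999999940395. */
float lowest_portable_float(void) {
    return nextafterf(-1.0f, 0.0f);
}

/* Largest float below 256.0f (UINT8_MAX + 1); it truncates to 255. */
float highest_portable_float_u8(void) {
    return nextafterf(256.0f, 0.0f);
}

/* Largest float below 2^32 (UINT32_MAX + 1). binary32 has a 24-bit
   significand, so floats near 2^32 are 256 apart and this is
   4294967040.0f, which truncates to 4294967040 rather than UINT32_MAX. */
float highest_portable_float_u32(void) {
    return nextafterf(4294967296.0f, 0.0f);
}
```

Converting `lowest_portable_float()` to any unsigned type is defined and gives 0, matching the last example in this answer.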

Looking at your examples:

3.14 will be truncated to 3 – which is clearly representable as a uint32_t.
-1.23 will be truncated to -1 – which is not representable by any unsigned type, so that conversion is undefined behaviour.
The maximum representable value of a uint32_t is 4294967295, so your trial value of 2147483646 is perfectly well defined for conversion to that type; however, the maximum value for a uint8_t is 255, so conversion to that type is undefined behaviour.

To add another example, conversion from -0.999999940395 to uint32_t will be well-defined because that value will first be truncated, yielding zero, which is representable by any unsigned type.

score 0 · Answer 4 · answered Feb 27 '23 at 19:43

The Standard imposes no requirements on what implementations do when converting an out-of-range floating-point value to unsigned int. For some purposes, it may be most useful for implementations to "peg" to UINT_MAX, for some it may be most useful for implementations to use wraparound semantics, and for some it may be most useful to trigger a trap that raises a signal, terminates the program, or otherwise acts to prevent the results from invalid computations from being mistaken for valid data.
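A program that wants the "peg" behavior regardless of what the implementation does can range-check before converting, so the conversion itself is always within (-1, UINT_MAX + 1). A minimal sketch; the helper `to_uint_saturating` is hypothetical, and mapping NaN to 0 is an arbitrary choice.

```c
#include <limits.h>
#include <math.h>

/* Saturating double -> unsigned int conversion. (double)UINT_MAX + 1.0
   equals UINT_MAX + 1 exactly, because that value is a power of two. */
unsigned int to_uint_saturating(double x) {
    if (isnan(x) || x <= -1.0)          /* integral part would be negative */
        return 0;
    if (x >= (double)UINT_MAX + 1.0)    /* integral part would exceed UINT_MAX */
        return UINT_MAX;
    return (unsigned int)x;             /* defined: x is in (-1, UINT_MAX + 1) */
}
```

Infinities fall into the two range checks as well, so -INFINITY maps to 0 and INFINITY to UINT_MAX.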

If an implementation processes conversions to unsigned int with wraparound semantics, it would probably be most useful for it to process conversions to smaller unsigned types likewise. If it traps such conversions to unsigned int, however, having it also trap out-of-range conversions to smaller types would likely be more useful than using wraparound semantics for values within the range of unsigned int but trapping semantics outside that range. The Standard gives implementations the freedom to behave in whichever way is more useful, on the presumption that implementations wouldn't use such freedom to process out-of-range conversions to smaller types in a way that's gratuitously weirder than conversions to larger types.

