8

Is it possible for an explicit cast of, say, int32_t to uint32_t, to alter the bit representation of the value?

For example, given that I have the following union:

typedef union {
    int32_t signed_val;
    uint32_t unsigned_val;
} signed_unsigned_t;

Are these code segments guaranteed by the spec to have the same behaviour?

uint32_t reinterpret_signed_as_unsigned(int32_t input) {
    return (uint32_t) input;
}

and

uint32_t reinterpret_signed_as_unsigned(int32_t input) {
    signed_unsigned_t converter;
    converter.signed_val = input;
    return converter.unsigned_val;
}

I'm considering C99 here. I've seen a few similar questions, but they all seemed to be discussing C++, not C.

Denilson Sá Maia
  • I suspect there are some hypothetical cases where there might be a difference, such as operating on a machine that does 1's complement arithmetic. – Hot Licks Sep 21 '13 at 02:16

3 Answers

8

Casting a signed integer type to an unsigned integer type of the same width can change the representation, if you can find a machine with sign-magnitude or ones-complement signed representations. But the types int32_t and uint32_t are guaranteed to be two's-complement representations, so in that particular case the representation cannot change.

Conversion of signed integers to unsigned integers is well-defined by the standard, in section 6.3.1.3. The relevant rule is the second paragraph:

  1. When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
  2. Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
  3. ...

So the result has to be, in effect, what a bit-for-bit copy would have produced had the negative number been stored in 2's-complement. A conforming implementation is allowed to use sign-magnitude or ones-complement; in both cases, the representation of a negative integer must change when it is cast to unsigned.


Summarizing a lengthy and interesting discussion in the comments:

  • In the precise example in the OP, which uses int32_t and uint32_t, the representations must be equal if the program compiles, because C99 requires int32_t and uint32_t to be exactly 32 bits long with no padding, and requires int32_t to use 2's-complement representation. It does not, however, require those types to exist; a ones-complement implementation could simply not define int32_t, and still conform.

  • My interpretation of type-punning is below the horizontal rule. @R.. pointed us to a Defect Report from 2004 which seems to say that type-punning is either OK or fires a trap, which is closer to implementation-defined behaviour than undefined behaviour. On the other hand, the suggested resolution of that DR doesn't seem to be in the C11 document, which says (6.2.6.1(5)):

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined.

That seems to me to be saying that type-punning is undefined behaviour if one of the participating types has a trap representation (and consequently is not undefined behaviour if the reading type does not have a trap representation). On the other hand, no type is required to have a trap representation, and only a few types are prohibited from having one: char and union types -- but not members of union types --, as well as whichever of the [u]intN_t types are implemented.

My previous statement on type-punning follows:


The storage-punning union has undefined behaviour. But without invoking flying lizards, it is somewhat expected that sign-magnitude or ones-complement machines may throw a hardware exception if a certain value is stored as unsigned and then accessed as signed.

Both ones-complement and sign-magnitude have two possible representations of 0, one with each possible sign bit. The one with a negative sign bit, "negative zero", is allowed to be a "trap value"; consequently, accessing the value (even just to copy it) as a signed integer could trigger the trap.

Although the C compiler would be within its rights to suppress the trap, say by copying the value with memcpy or an unsigned opcode, it is unlikely to do so because that would be surprising to a programmer who knew that her program was running on a machine with trapping negative zeros, and was expecting the trap to trigger in the case of an illegal value.

rici
  • Does this mean that on a machine where signed integer representation is 2's-complement these are guaranteed to behave the same? Or is this bordering undefined behavior and eventually giving me a bad case of nasal demons? – Alexandre Araujo Moreira Sep 21 '13 at 02:29
  • 1
    @AlexandreAraujoMoreira: There is one exception: it is *not* guaranteed that unsigned int has the same number of bits as signed int. It might have more bits. (I don't know of any such architecture either, but afaik it is theoretically possible.) In that case, the representations are sort of not the same. Anyway, the union is definitely UB. But lots of people do it. – rici Sep 21 '13 at 02:36
  • Is it possible to have different number of bits even with the new `intN_t` and `uintN_t` types? I thought they were guaranteed to have N bits, but the more I see the more I believe I'm being very naive. – Alexandre Araujo Moreira Sep 21 '13 at 02:48
  • @rici: The union is not supposed to be UB anymore (there's a DR that was resolved by defining the behavior), but the changes don't seem to have made it into the published standard... – R.. GitHub STOP HELPING ICE Sep 21 '13 at 02:50
  • @AlexandreAraujoMoreira: They're guaranteed to have *at least* N bits. – rici Sep 21 '13 at 02:50
  • 1
    @rici: No, the `intN_t` and `uintN_t` are guaranteed to have *exactly* N bits with no padding, and the signed ones are guaranteed to be twos complement and have the full range (i.e. `-INTN_MAX-1` is a representable value). – R.. GitHub STOP HELPING ICE Sep 21 '13 at 02:51
  • R..: yeah, I mis-stated that. The union is fine, but you can only access a union member if the last value stored in the union was stored in that member. – rici Sep 21 '13 at 02:51
  • The DR resolution was that you can access non-last-stored types under certain conditions. – R.. GitHub STOP HELPING ICE Sep 21 '13 at 02:52
  • R..: do you have a reference? (And you're right about int32_t, but then that's posix, not standard c, no?) – rici Sep 21 '13 at 02:54
  • POSIX requires `intN_t` to exist for N=8,16,32,64. Plain C just requires that, *if they exist*, they have the properties I described. – R.. GitHub STOP HELPING ICE Sep 21 '13 at 02:55
  • @R..: acrobat search for int32 in n1570.pdf comes up empty. Do you have a reference for that, also? ... never mind, found it. 7.20 – rici Sep 21 '13 at 02:57
  • References: DR283 (type punning) at http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm and `intN_t` at N1570 (C11) 7.20.1.1 "Exact-width integer types" and 7.20.2.1 "Limits of exact-width integer types". – R.. GitHub STOP HELPING ICE Sep 21 '13 at 02:59
  • @R..: So, you can type pun but if a trap fires, you get to pick up the pieces. At least, that's my interpretation of that DR. Less nasally catastrophic than UB, to be sure. But I don't see it anywhere in n1570.pdf; 6.2.6.1(5) says "Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined." That's a less precise UB warning than usual, since it's contingent on the existence of trap representations. – rici Sep 21 '13 at 03:09
  • Like I said, despite being approved, the DR somehow missed getting included into the text of the standard, it seems. Note that `uintN_t` and `intN_t` types **cannot** have trap representations since they do not have padding bits and are required to represent the full range (no exclusion of most-negative value). – R.. GitHub STOP HELPING ICE Sep 21 '13 at 03:27
  • @R..: yeah, I got that. So if you access 32bits of a union with int32_t, there cannot be a trap and the access is going to give you 32 bits. If you use an int, the behaviour might or might not be undefined, which seems weird to me. But it's not the same as the DR, was my point: the DR says either you get the bits or a trap fires; the standard says that if a trap can fire, it's UB. – rici Sep 21 '13 at 03:29
  • No, the DR says that the bits are reinterpreted as the other type, which *might* be a trap representation. However, if the type in question does not have trap representations (which it cannot have in this case, since it has exactly 2^N values and N bits) then there is no possibility of trap. – R.. GitHub STOP HELPING ICE Sep 21 '13 at 04:24
4

In the particular case you mention, a conversion from int32_t to uint32_t, the bit representation will be the same.

The standard specifically requires intN_t to be "a signed integer type with width N, no padding bits, and a two's complement representation". Furthermore, corresponding signed and unsigned types must have the same representation for values within their shared range:

A valid (non-trap) object representation of a signed integer type where the sign bit is zero is a valid object representation of the corresponding unsigned type, and shall represent the same value.

There is one very small possible loophole: in principle, an implementation could, for example, make int32_t a typedef for int, and uint32_t a typedef for unsigned long, where int and long are both 32 bits but have different byte orders. But that would only happen in a deliberately perverse implementation. Correction: This is not possible for a conforming implementation. int32_t and uint32_t must denote corresponding signed and unsigned types.

The above applies only because you happened to choose int32_t and uint32_t for your example, and the standard places very specific restrictions on their representation. (And if an implementation can't meet those restrictions, then it simply won't define int32_t or uint32_t.)

More generally, though, signed types are permitted to have one of three representations:

  • sign and magnitude, where setting the sign bit to 1 negates a number;
  • two's complement, where negation is equivalent to a bitwise complement followed by adding 1; and
  • one's complement, where negation is equivalent to a bitwise complement.

The vast majority of modern systems use two's complement (and have no padding bits). On such systems, signed-to-unsigned conversion with types of the same size generally does not change the bit representation. (The semantics of type conversions are defined in terms of values, but are designed to be convenient for two's complement systems.)

But for a system that uses either sign and magnitude or one's complement, signed-to-unsigned conversion must preserve the value, which means that conversion of a negative value must change the representation.

Keith Thompson
  • For an implementation to make uint32_t and int32_t to be "unsigned" and "long", respectively, could cause problems even if the bit ordering were the same, since "unsigned" and "int" are alias-compatible, as are "long" and "unsigned long", but some compilers treat "int" and "long" as incompatible even when they are the same size and have the same representation. – supercat Jun 27 '16 at 20:11
  • @supercat: No need for that analysis; it would simply be non-conforming. N1570 7.20.1p1: "When typedef names differing only in the absence or presence of the initial **`u`** are defined, they shall denote corresponding signed and unsigned types as described in 6.2.5; an implementation providing one of these corresponding types shall also provide the other." I'll correct my answer. BTW, rather than "some compilers treat `int` and `long` as incompatible", it's more accurate to say that `int` and `long` *are* incompatible. See the definition of "compatible type" in the standard. – Keith Thompson Jun 27 '16 at 20:31
  • Any idea what motivated a requirement that an implementation that defines uint32_t must also define int32_t? I would think it would be helpful for a 32-bit ones'-complement machine to define uint32_t whether or not it can define int32_t, and I can't see any advantage to prohibiting such an implementation from doing so. Do you see any useful purpose served by that rule? – supercat Jun 27 '16 at 20:38
  • No, I don't. On the vast majority of systems, it doesn't matter, but I agree that on (rare) non-2's-complement systems it would make more sense to define `uint32_t` and not `int32_t`. The C99 Rationale doesn't address the issue. – Keith Thompson Jun 27 '16 at 20:52
2

If the value is in the range of both the signed and the unsigned types, then both the value and representation are unchanged by conversions.

Otherwise, the signed-to-unsigned conversion is only allowed to preserve the bit representation when the implementation's representation of negative values for the type is two's complement. For ones' complement or sign-magnitude, the conversion must change the representation. The conversion in the other direction is implementation-defined, so it may or may not change the representation.

R.. GitHub STOP HELPING ICE