In general, it's not that hard to accommodate unusual platforms in most cases (if you don't want to simply assume 8-bit char, 2's complement, no padding, no trap representations, and truncating unsigned-to-signed conversion). The standard mostly gives enough guarantees, though a few macros to inspect certain implementation details would be helpful.
As far as a strictly conforming program can observe (outside bit-fields), 5 is always encoded as 00...0101. This is not necessarily the physical representation (whatever that would mean), but it is what portable code can observe. A machine using Gray code internally, for example, would have to emulate a "pure binary notation" for bitwise operators and shifts.
For negative values of signed types, different encodings are allowed, which leads to different (but well-defined in every case) results when re-interpreting as the corresponding unsigned type. For example, strictly conforming code must distinguish between (unsigned)n and *(unsigned *)&n for a signed integer n: they are equal for two's complement without padding bits, but different for the other encodings if n is negative.
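To make the distinction concrete, here is a minimal sketch of the two operations side by side. The function names by_conversion and by_reinterpretation are made up for illustration and do not come from the standard.

```c
#include <limits.h>

/* Value conversion: the result is n reduced modulo UINT_MAX + 1,
 * no matter how negative numbers are encoded. */
unsigned by_conversion(int n) {
    return (unsigned)n;
}

/* Reinterpretation: reads the object representation of n as an unsigned int.
 * Accessing an int through the corresponding unsigned type is permitted. */
unsigned by_reinterpretation(int n) {
    return *(unsigned *)&n;
}
```

by_conversion(-1) is UINT_MAX everywhere. by_reinterpretation(-1) agrees with it on a two's complement implementation without padding bits, but would yield the raw ones' complement or sign-and-magnitude pattern elsewhere.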
Further, padding bits may exist, and signed integer types may have more padding bits than their corresponding unsigned counterparts (but not the other way round: type-punning from signed to unsigned is always valid). sizeof cannot be used to get the number of non-padding bits, so to get, for example, an unsigned value where only the sign bit (of the corresponding signed type) is set, something like this must be used:
#define TYPE_PUN(to, from, x) ( *(to *)&(from){(x)} )
unsigned sign_bit = TYPE_PUN(unsigned, int, INT_MIN) &
TYPE_PUN(unsigned, int, -1) & ~1u;
(there are probably nicer ways) instead of
unsigned sign_bit = 1u << sizeof sign_bit * CHAR_BIT - 1;
as this may shift by the width or more, which is undefined. (I don't know of a constant expression giving the width, but sign_bit from above can be right-shifted until it's 0 to determine it; GCC can constant-fold that.) Padding bits can be inspected by memcpying into an unsigned char array, though they may appear to "wobble": reading the same padding bit twice may give different results.
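The shift-until-zero idea can be sketched as follows. This is a variant that starts from UINT_MAX and INT_MAX rather than from sign_bit, and the names unsigned_width and int_width are hypothetical helpers, not standard functions.

```c
#include <limits.h>

/* Counts the value bits of unsigned int by shifting UINT_MAX down to zero.
 * Every shift is by 1, so the shift count never reaches the width. */
int unsigned_width(void) {
    int width = 0;
    for (unsigned v = UINT_MAX; v != 0; v >>= 1) {
        width++;
    }
    return width;
}

/* Same idea for int: INT_MAX has all value bits set; start at 1 to count
 * the sign bit as well. Right-shifting a non-negative int is well-defined. */
int int_width(void) {
    int width = 1;
    for (int v = INT_MAX; v != 0; v >>= 1) {
        width++;
    }
    return width;
}
```

With these, an expression such as 1u << (unsigned_width() - 1) stays below the width even when unsigned int has padding bits. Note that it yields the highest value bit of unsigned int, which need not be the position of int's sign bit.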
If you want the bit pattern (without padding bits) of a signed integer, printed least significant bit first:
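#include <limits.h>
#include <stdio.h>

/* Prints the bits of n, least significant bit first, without leading zeros
 * (so nothing is printed for 0). */
int print_bits_u(unsigned n) {
    for (; n; n >>= 1) {
        putchar(n & 1 ? '1' : '0'); // n&1 never traps
    }
    return 0;
}

int print_bits(int n) {
    return print_bits_u(*(unsigned *)&n & INT_MAX);
    /* This masks padding bits if int has more of them than unsigned int.
     * The mask covers only the value bits of int, so the sign bit is cleared too.
     * Note that INT_MAX is converted to unsigned int here. */
}

int print_bits_2scomp(int n) {
    return print_bits_u(n);
}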
print_bits gives different results for negative numbers depending on the representation used (it prints the value bits as they are stored), while print_bits_2scomp gives the two's complement representation (possibly with a greater width than a signed int has, if unsigned int has fewer padding bits).
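If the bits are needed as a string rather than on stdout, the same loop can fill a buffer. The following is only a sketch; bits_to_string is a hypothetical helper that mirrors print_bits_u (least significant bit first, no leading zeros).

```c
#include <assert.h>
#include <stddef.h>

/* Writes the bits of n into buf as '0'/'1' characters, least significant
 * bit first, without leading zeros (an empty string for n == 0).
 * Returns the number of characters written, or 0 if buf is too small. */
size_t bits_to_string(unsigned n, char *buf, size_t size) {
    size_t len = 0;
    assert(buf != NULL);
    if (size == 0) {
        return 0;
    }
    for (; n; n >>= 1) {
        if (len + 1 >= size) {   /* no room left for the terminator */
            buf[0] = '\0';
            return 0;
        }
        buf[len++] = (n & 1) ? '1' : '0';
    }
    buf[len] = '\0';
    return len;
}
```

For example, 6 comes out as "011", since the lowest bit is written first.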
Care must be taken not to generate trap representations when using bitwise operators and when type-punning from unsigned to signed. The quote from the standard below shows how they can arise (for example, *(int *)&sign_bit can trap with two's complement, and -1 | 1 can trap with ones' complement).
Unsigned-to-signed integer conversion is always implementation-defined if the converted value isn't representable in the target type. I would expect non-2's-complement machines to be more likely to differ from the common (truncating) definition, though technically this could also become an issue on 2's complement implementations.
From C11 (n1570) 6.2.6.2:
(1) For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N-1), so that objects of that type shall be capable of representing values from 0 to 2^N - 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.
(2) For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; signed char shall not have any padding bits. There shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M ≤ N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:
- the corresponding value with sign bit 0 is negated (sign and magnitude);
- the sign bit has the value -(2^M) (two's complement);
- the sign bit has the value -(2^M - 1) (ones' complement).
Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones' complement), is a trap representation or a normal value. In the case of sign and magnitude and ones' complement, if this representation is a normal value it is called a negative zero.
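Which of the three encodings is in use can be observed portably from the low bits of -1: they are 01 for sign and magnitude, 10 for ones' complement, and 11 for two's complement. A small sketch of that check follows; the enum and function names are hypothetical.

```c
#include <limits.h>

/* The enumerator values are chosen to equal -1 & 3 under each encoding. */
enum sign_repr {
    SIGN_MAGNITUDE  = 1,  /* -1 is 10...01 */
    ONES_COMPLEMENT = 2,  /* -1 is 11...10 */
    TWOS_COMPLEMENT = 3   /* -1 is 11...11 */
};

/* -1 & 3 has sign bit 0, so the result is an ordinary non-negative value
 * and cannot be a trap representation. */
enum sign_repr detect_sign_repr(void) {
    return (enum sign_repr)(-1 & 3);
}
```

Since -1 & 3 is an integer constant expression, the same test also works in #if directives, assuming the preprocessor evaluates it the way the target does.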