What do the C and C++ standards say about bit-level integer representation and manipulation?

Question

I know the C and C++ standards don't dictate a particular representation for numbers (could be two's complement, sign-and-magnitude, etc.). But I don't know the standards well enough (and couldn't find if it's stated) to know if there are any particular restrictions/guarantees/reserved representations made when working with bits. Particularly:

If all the bits in an integer type are zero, does the integer as whole represent zero?
If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)
Is there a guaranteed way to check if any bit is not set?
Is there a guaranteed way to check if any bit is set? (#3 and #4 kind of depend on #1 and #2, because I know how to set, for example the 5th bit (see #5) in some variable x, and I'd like to check a variable y to see if it's 5th bit is 1, I would like to know if if (x & y) will work (because as I understand, this relies on the value of the representation and not whether nor not that bit is actually 1 or 0))
Is there a guaranteed way to set the left-most and/or right-most bits? (At least a simpler way than taking a char c with all bits true (set by c = c | ~c) and doing c = c << (CHAR_BIT - 1) for setting the high-bit and c = c ^ (c << 1) for the low-bit, assuming I'm not making any assumptions I should't be, given these questions)
If the answer to #1 is "no" how could one iterate over the bits in an integer type and check if each one was a 1 or a 0?

I guess my overall question is: are there any restrictions/guarantees/reserved representations made by the C and C++ standards regarding bits and integers, despite the fact that an integer's representation is not mandated (and if the C and C++ standards differ in this regard, what's their difference)?

I came up with these questions while doing my homework which required me to do some bit manipulating (note these aren't questions from my homework, these are much more "abstract").

Edit: As to what I refer to as "bits," I mean "value forming" bits and am not including "padding" bits.

This question covers #1, apparently the answer is yes: http://stackoverflow.com/questions/11138188/set-all-bytes-of-int-to-unsigned-char0-guaranteed-to-represent-zero — John Carter, Aug 25 '12 at 21:11
@Cornstalks - two other good links: [What do the C and C++ standards say about bit-level integers](stackoverflow.com/questions/12125650/what-do-the-c-and-c-standards-say-about-bit-level-integer-representation-and-m/), http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf — paulsm4, Aug 26 '12 at 01:33
@paulsm4: did you just link to my own question? And I've looked at the C and C++ standards. If you have a particular section you want to point to, great, but if I knew where the answer was in the standards (if it's even there) I wouldn't have had to ask. — Cornstalks, Aug 26 '12 at 16:06
"As to what I refer to as "bits," I mean "value forming" bits and am not including "padding" bits." - what about the _sign_ bit? — davmac, Feb 04 '15 at 13:12

score 16 · Accepted Answer · answered Aug 25 '12 at 21:35

(1) If all the bits in an integer type are zero, does the integer as whole represent zero?

Yes, the bit pattern consisting of all zeroes always represents 0:

The representations of integral types shall define values by use of a pure binary numeration system.⁴⁹ [§3.9.1/7]

⁴⁹ A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral power of 2, except perhaps for the bit with the highest position.

(2) If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)

No. In fact, signed magnitude is specifically allowed:

[ Example: this International Standard permits 2’s complement, 1’s complement and signed magnitude representations for integral types. —end example ] [§3.9.1/7]

(3) Is there a guaranteed way to check if any bit is not set?

I believe the answer to this is "no," if you consider signed types. It is equivalent to equality testing with a bit pattern of all ones, which is only possible if you have a way to produce a signed number with bit pattern of all ones. For an unsigned number this representation is guaranteed, but casting from unsigned to signed is undefined if the number is unrepresentable:

If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined. [§4.7/3]

(4) Is there a guaranteed way to check if any bit is set?

I don't think so, because signed magnitude is allowed—0 would compare equal to −0. But it should be possible with unsigned numbers.

(5) Is there a guaranteed way to set the left-most and/or right-most bits?

Again, I believe the answer is "yes" for unsigned numbers, but "no" for signed numbers. Shifts are undefined for negative signed numbers:

Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined. [§5.8/2]

Regarding #3, can't you make a bit pattern of all 1s using (even for signed types) `c = c | ~c`? And for #5, are you talking about right shifts for negative signed numbers? Or is it also undefined for left shifts of negative signed numbers? (if that's the case then the code sample I wrote for #5 isn't guaranteed to work, I guess) — Cornstalks, Aug 28 '12 at 01:20
For #3: I believe `~c` is the one's complement, not "flip all bits." So it's an interesting semantic question. What's the one's complement of 0 when represented in signed-magnitude? I would assume it's still -1, so only two bits set. For #5: undefined both ways. — John Calsbeek, Aug 28 '12 at 03:28
6.5.3.3.4 describes `~c` as "flip all bits" ("each bit in the result is set if and only if the corresponding bit in the converted operand is not set"), so I think `c = c | ~c` should work. Good call on shifting: turns out right shift is implementation defined for negative signed integers, and left shift has undefined behavior if it's a negative signed integer. — Cornstalks, Aug 28 '12 at 04:20
I'm reading C++ N3337 §5.3.1/10, which uses the "one's complement" wording. I suspect reading it as being bitwise is the only valid interpretation. That seems like it might need clarification—I suspect there's a lot of code which relies on `-n = ~n + 1`. — John Calsbeek, Aug 28 '12 at 04:39
Ah, I was quoting the C standard. I do believe, though, that it properly inverts all the bits, and that yes, it's the bitwise operation "one's complement" that was intended here (and that the result of `~0` is not necessarily -1). I agree there's probably a lot of code that relies on `-n = ~n + 1` since that's a common identity used in operating on two's complement numbers (at least from the code I've seen). — Cornstalks, Aug 28 '12 at 04:49
I'm going to go ahead and accept this as answer, but I also want to note that AndreyT and Jens Gustedt played notable roles. — Cornstalks, Aug 30 '12 at 22:08
"Yes, the bit pattern consisting of all zeroes always represents 0" -- That doesn't follow from the cited text. As of C99, if an integer type has padding bits, it's possible that all-bits-zero is a trap representation. C11 added an explicit requirement: "For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type." — Keith Thompson, Feb 18 '15 at 19:53

AnT stands with Russia · Answer 2 · 2012-08-25T21:48:08.120

You use the term "all bits" repeatedly, but you do not clarify what "all bits" you are referring to. Object representation of integer types in C/C++ might include value-forming bits and padding bits. The only integer type that is guaranteed not to have padding bits is [signed/unsigned] char.

The language always guaranteed that if all value-forming bits are zero, then the represented integer value is also zero.

As for padding bits, things are/were a bit more complicated. The original specification of C language (C89/90 as well as the original C99) did not guarantee that setting all object bits to zero produced a valid integer representation. It could've produced an invalid trap representation. I.e. in the original C (and even in C99 at first) using memset(..., 0, ...) on integer types did not guarantee that the objects will receive valid zero values (with the exception of [signed/unsigned] char). This was changed in later specifications, namely in one of the technical corrigendums for C99. Now it is required that all-zero bit pattern in an integer object (that involves all bits, including padding ones) represents a valid zero value.

I.e. in modern C it is legal to use memset(..., 0, ...) to set any integer objects to zero, but it became legal only after C99.

Thanks for clarifying that; I meant "value-forming" bits. I'll edit to explicitly state this. — Cornstalks, Aug 25 '12 at 22:04
Are there any pre-C1 implementations where `memset(..., 0, ...)` _does_ result in a trap value? — JAB, Jan 18 '16 at 19:36

score 2 · Answer 3 · answered Aug 25 '12 at 22:10

You already got some answers about the representation of integer values. There is exactly one way that is guaranteed to give you all the individual bits of any object that is represented in memory: view it as array of unsigned char. This is the only integral type that has no padding bits and is guaranteed to have no trap representation. So casting a pointer of type T* to your object to unsigned char* will always work, as long as you only access the first sizeof(T) bytes. By that you could inspect and set all bytes (and thus bits) to your liking.

If you are interested in more details, here I have written something up about the anatomy of integer types in C. C++ might differ a bit from that, in particular type puning through union as described there doesn't seem to be well defined in C++.

score 1 · Answer 4 · answered Aug 25 '12 at 21:36

Q: If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)

No. The standards for C and C++ don't rule out signed magnitude or one's complement, both of which have +0 and -0. While +0 and -0 do have to compare equal, but they do not have to have the same representation.

Good luck finding a machine nowadays that uses signed magnitude or one's complement.

davmac · Answer 5 · 2015-07-16T16:15:55.490

If all the bits in an integer type are zero, does the integer as whole represent zero?

Edit: since you have now clarified that you are not concerned with the padding bits, the answer to this is actually "yes". But I leave the original:

Not necessarily, it could be a trap representation. See C99 6.2.6.1:

For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter)

The presence of padding bits allows for the possibility that all 0 is a trap representation. (As noted by Keith Thompson in the comment below, the more recent C11 makes explicit that such a representation is not a trap representation).

and

The values of any padding bits are unspecified

and

44) Some combinations of padding bits might generate trap representations

If you restrict the question to value and sign bits, the answer is yes, due to 6.2.6.2:

If there are N value bits, each bit shall represent a different power of 2 between 1 and 2 N −1 , so that objects of that type shall be capable of representing values from 0 to 2 N − 1 using a pure binary representation; this shall be known as the value representation.

and

If the sign bit is zero, it shall not affect the resulting value.

If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)

Not necessarily, and in fact sign-and-magnitude is explicitly supported in 6.2.6.2.

Is there a guaranteed way to check if any bit is not set?

If you do not care about padding and sign bits, you could just compare to 0, but this would not work with a 1's complement representation (which is allowed) seeing as all bits 0 and all bits 1 both represent the value 0.

Otherwise: you can read the value of each byte via an unsigned char *, and compare the result to 0:

Values stored in unsigned bit-fields and objects of type unsigned char shall be represented using a pure binary notation

If you want to check a specific value bit, you could construct a suitable bitmask using (1u << n), but this will not necessarily let you inspect the sign bit.

Is there a guaranteed way to check if any bit is set?

The answer is essentially the same as to the previous question.

Is there a guaranteed way to set the left-most and/or right-most bits?

Do you mean left-most value bit? You could count the bits in INT_MAX or UINT_MAX or equivalent depending on the type, and use that to construct a value (via 1 << n) with which to OR the original value.

If the answer to #1 is "no" how could one iterate over the bits in an integer type and check if each one was a 1 or a 0?

You can do so using a bitmask which you left shift repeatedly, but you can check only the value bits this way and not the sign bit.

C11 adds an explicit requirement in 6.2.6.2p5: "For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type." — Keith Thompson, Feb 18 '15 at 19:54
@KeithThompson: Indeed. It's interesting to note that the Standard doesn't say *the* representation of zero, but the requirement does mean that using `memset` to clear an integer type, or `calloc` to get an initially-zero array, is guaranteed to work. I wish the Standard would define a macro that would indicate whether all-bits-zero was a valid representation of a pointer, so that portable code could avoid manually clearing pointers in `calloc`'ed memory on machines where that wasn't necessary. — supercat, Jul 16 '15 at 15:29

score 0 · Answer 6 · answered Feb 19 '15 at 14:44

0

If you want your brain to explode, consider this: If you interpret an int or long or long long as an array of unsigned char (which is the most reasonable thing to do if you want to see all the bits), you know that the order of bytes is not defined, for example "bigendian" vs. "littleendian". We all (hopefully) know that.

But it is worse: Each bit of an int could be stored in any of the bits of the array of char. So there are 32! ways how the bits of a 32 bit integer could be mapped to an array of four 8-bit unsigned chars by a truly bizarre implementation. Fortunately, I haven't encountered more than two ways myself (and I know of one more ordering in a real computer).

answered Feb 19 '15 at 14:44

gnasher729

51,477
5
75
98

1

In C99 6.2.6.2, for signed types, "Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type"; for unsigned types, "objects of that type shall be capable of representing values from 0 to 2^(N−1) using a _pure binary representation_". Footnote (40) describes pure binary representation. For the scenario you describe, 'successive bits' (as per footnote 40) would have to be different when the object is decomposed into an array of `unsigned char`, which seems a stretch, even just theoretically. – davmac Feb 19 '15 at 15:41
@davmac: Nothing in the Standard requires that padding bits be at one "end" of a number; it is entirely plausible that a 36-bit machine which had no non-trapping unsigned integer instructions might use a 36-bit unsigned `int`, a 35-bit `unsigned int` (with one bit of padding), and a 71-bit `long long` (with the padding bit in the middle). Given that, and given that the Standard doesn't mandate byte order, it's not a stretch to say that it would allow arbitrary permutations. – supercat Jul 14 '15 at 23:54
@supercat That would depend on having "successive bits" mean "successive bits in an arbitrarily chosen ordering of bits" (rather than "successive bits in the natural ordering of bits"). Perhaps that is indeed what is intended, but if so, the wording is not perfect. – davmac Jul 15 '15 at 09:47
@davmac: There are a lot of places where the standard is specific enough to preclude what might otherwise be the optimal implementations on some hardware, but not specific enough to let portable code do anything it couldn't do without such specifications. Some involve commonplace situations. A platform with 32-bit `int`, given `int16_t i=32767; i+=2; int j=i;`, would be allowed to compute the resulting value for `i` and `j` in any fashion it saw fit, provided it was documented and was in the range INT16_MIN..INT16_MAX but would not be allowed to have `j` receive the value 32769 even on... – supercat Jul 15 '15 at 21:13
...platforms where it would be fastest to compute the new value of `i` in a 32-bit register. On many such platforms, the requirement to coerce `j` into range would compel the compiler to output an extra instruction or two, and the cost of those instructions might be worthwhile if code could rely upon the resulting value, but there is no reasonable means by which portable code can determine with certainty how the conversion will be performed. – supercat Jul 15 '15 at 21:25
@supercat I'm not sure how what you're saying relates to the discussion up to this point. In any case the example you give doesn't match up to the criteria you specify - it is "not specific enough to let portable code do anything" it couldn't otherwise, for sure, but it certainly does allow for optimal implementation, because there is no requirement for the value of j to be coerced into the range of int16_t (because undefined behavior has already been invoked). In other words it allows the compiler to perform signed addition without checking for overflow. – davmac Jul 16 '15 at 09:57
@davmac: On a system where `int` is 32 bits, coercing a value outside the range INT16_MIN to INT16_MAX to int16_t yields Implementation Defined Behavior, not Undefined. Only values that overflow the range of `int` are allowed to yield Undefined Behavior. An implementation could specify that coercion of any out-of-range value to `int16_t` would yield the bit pattern 0x8000, that such a bit pattern was a trap representation, and that accessing any `int16` rvalue with that value would yield Undefined Behavior, but only if it in fact *did* yield that value. Otherwise, code could... – supercat Jul 16 '15 at 13:47
...coerce an out-of-range value to `int16_t`, type pun it to an `unsigned char[]`, and observe that its bitwise representation was not as documented. Interestingly, on a system with 32-bit `int`, if `si` is `int16_t` and `ui` is `uint16_t`, it's possible for `ui*=ui;` to yield Undefined Behavior, but it's not possible for `si*=si;` to do so. – supercat Jul 16 '15 at 13:48
@davmac: With regard to the earlier discussion, my point is that the Standard often defines actions sufficiently weakly that the only way they can be used safely in portable code is to have a human read the documentation for every compiler that will be used. I don't interpret the Standard as saying that implementations must store consecutive bits of a `long` within consecutive bits of a `char`, and I'm not sure how programmers could benefit from such a guarantee without any additional guarantee (that the Standard doesn't offer) of how the bit ranges from all the `char` are assembled. – supercat Jul 16 '15 at 15:14
@davmac: I can, however, imagine a plausible implementation where such a requirement could add cost: a 16-bit system without byte-addressable storage, which lacks fast instructions for shift-by-eight (or shift-by-N) but includes an instruction to bit-reverse a word. On such a system, it would be most efficient to have `char` be 16 bits, but if it were necessary (for storage-capacity reasons) to pack two `char` values per word, the most efficient way to do that would be to have the `char` value for the upper half of each word be stored in reverse-bit order. – supercat Jul 16 '15 at 15:17
@supercat, ok I understand what you're getting at (I think). I think there probably are a few cases where the restrictions imposed by the language standard (on either the implementation or the programmer) are not beneficial, yes. The whole issue of representation is a case in point; it's not clear to me why there are _any_ restrictions imposed on representation, if the intent is to allow some arbitrary mapping of value bits to representational bits. I mean why require a "pure binary representation" if there is no way to make use of that? (which I suppose is your point). – davmac Jul 16 '15 at 15:38
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/83450/discussion-between-davmac-and-supercat). – davmac Jul 16 '15 at 15:43

score -1 · Answer 7 · answered Aug 25 '12 at 21:14

-1

For the bitmanipulations you could make a struct with 8 one unsigned bit fields and let the pointer of that struct point to your char. In that way you can easily access each bit. But the compiler will probably do masking under the hood, so it is only a cleaner way for the programmer I think. You must check that your compiler doesn't change the order of the fields when doing this.

yourstruct* pChar=(yourstruct*)(&c)
pChar.Bit7=1;

answered Aug 25 '12 at 21:14

hamon

1,006
6
7

1

The fact that the compiler can reorder the bitfield kinda ruins the point of the question... – Cornstalks Aug 25 '12 at 21:27
This would contravene the strict aliasing rules (C99 6.5 paragraph 7). – davmac Feb 04 '15 at 13:15

score -1 · Answer 8 · answered Aug 25 '12 at 21:15

-1

Let me caveat this by saying I'm addressing C and C++ in general (e.g. C90 and lower, MS Visual C++, etc): the "greatest common denominator" (vs. the latest/greatest cx11 "standard").

Q: If all the bits in an integer type are zero, does the integer as whole represent zero?

A: Yes

Q: If any bit in an integer type is one, does the integer as a whole represent non-zero? (if this is a "yes" then some representations like sign-and-magnitude would be additionally restricted)

A: Yes. This includes the sign bit, for a signed int. I'm frankly not familiar with "magnitude"

Q: Is there a guaranteed way to check if any bit is not set?

A: "And'ing" a bitmask is always guaranteed.

Q: Is there a guaranteed way to check if any bit is set?

A: Again, "and'ing" a bitmask is always guaranteed.

Q: Is there a guaranteed way to set the left-most and/or right-most bits?

A: I believe you should always have a "MAX_INT" available for all implementations/all architectures to determine the leftmost bit.

I'm prepared to be flamed ... but I believe the above is accurate. And I hope it helps.

IMHO...

answered Aug 25 '12 at 21:15

paulsm4

114,292
17
138
190

3

The question is about what the standards say, not what compilers do. So to be clear, are you saying that the standard *requires* all of these things? – Nicol Bolas Aug 25 '12 at 21:23
1

I think padding bits would make 2 a 'no'. – Michał Górny Aug 25 '12 at 21:24
1

One's complement, signed magnitude, offset 2^N throw a wrench into your answers for signed integers. – David Hammen Aug 25 '12 at 21:45
@Michał Górny: From the C99 spec: "Padding bits are user-accessible in an unsigned integer type. For example, suppose a machine uses a pair of 16-bit shorts (each with its own sign bit) to make up a 32-bit int and the sign bit of the lower short is ignored when used in this 32-bit int. Then, as a 32-bit signed int, there is a padding bit (in the middle of the 32 bits) that is ignored in determining the value of the 32-bit signed int..." In other words, exactly as I said ;) – paulsm4 Aug 26 '12 at 01:23
@David Hammen - I believe even for 1's complement, "all zeros" == "false", and "everything else" (including "-0") is "true". So my statement remains :) And I don't think there's any disagreement about the rest, is there? – paulsm4 Aug 26 '12 at 01:27
@paulsm4: By "represent non-zero" I meant literally non-zero (> or < 0). In other words, if I did `(a == 0)` could it ever evaluate to true if `a` had some bit set? (if #2 is really "yes" then the answer to this is "no"). Are you saying if `a = -0` (has one bit set in this example) and `b = +0` (has no bit set in this example), that `a != b`? – Cornstalks Aug 26 '12 at 16:17

What do the C and C++ standards say about bit-level integer representation and manipulation?

8 Answers8

Linked

Related