10

C++ standard

If a C++14 implementation includes padding bits in the underlying bytes of an unsigned int, does the standard specify whether bitwise operations must not be performed on padding bits?

Additionally, does the C++14 standard specify whether equality and relational operators must ignore the padding bits?

Guidelines

If there is a lack of specification on that matter, is there some kind of consensus on the expected behavior of those operators on padding bits?

I found conflicting answers on Stack Overflow. Lightness Races in Orbit and ecatmur say that bitwise operators are unsuitable for arithmetic because they are applied on all bits (including padding bits), while Christoph and Bartek Banachewicz say that the bitwise operators work on the logical value of integers and ignore padding.

References

Related answers: on the existence of padding bits (1, 2, 3), on the absence of clear C++ specification (4).

Definition of padding bits in C++14 - § 3.9.1 - Fundamental types:

For narrow character types, all bits of the object representation participate in the value representation. For unsigned narrow character types, all possible bit patterns of the value representation represent numbers. These requirements do not hold for other types.

Definition of object representation and value representation in C++14 - § 3.9 - Types:

The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object is the set of bits that hold the value of type T. For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values. 44)

Footnote 44) The intent is that the memory model of C++ is compatible with that of ISO/IEC 9899 Programming Language C.
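To make the distinction between object representation and value representation concrete, here is a minimal sketch that counts the padding bits of an unsigned integer type. The helper names `object_bits` and `padding_bits` are hypothetical, not from the standard. It relies on `std::numeric_limits<T>::digits`, which for unsigned integer types gives the number of value bits.

```cpp
#include <climits>
#include <cstddef>
#include <limits>

// Hypothetical helper: number of bits in the object representation of T.
template <typename T>
constexpr std::size_t object_bits() {
    return sizeof(T) * CHAR_BIT;
}

// Hypothetical helper: bits of the object representation that do not
// participate in the value representation of an unsigned integer type T.
template <typename T>
constexpr std::size_t padding_bits() {
    static_assert(std::numeric_limits<T>::is_integer &&
                      !std::numeric_limits<T>::is_signed,
                  "padding_bits expects an unsigned integer type");
    return object_bits<T>() - std::numeric_limits<T>::digits;
}

// unsigned char may not have padding bits (see 3.9.1 above).
static_assert(padding_bits<unsigned char>() == 0,
              "unsigned char has no padding bits");
```

On mainstream implementations `padding_bits<unsigned int>()` is most likely 0 as well, but the standard does not require that.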

Definition of bitwise AND in C++14 - § 5.11 - Bitwise AND operator:

The usual arithmetic conversions are performed; the result is the bitwise AND function of the operands. The operator applies only to integral or unscoped enumeration operands.

Definition of addition in C++14 - § 5.7 - Additive operators:

The usual arithmetic conversions are performed for operands of arithmetic or enumeration type. For addition, [...] both operands shall have arithmetic or unscoped enumeration type [...]. The result of the binary + operator is the sum of the operands.

sfjac
RalphS
  • How could you ever tell? The only thing you can get out is the value, which excludes the padding bits. – user207421 Jan 18 '18 at 23:04
  • @EJP By inspecting the object representation? – T.C. Jan 18 '18 at 23:07
  • Bitwise operations are defined in terms of value bits. They have no access to padding bits at all (unless you reinterpret the memory as unsigned char array). – AnT stands with Russia Jan 18 '18 at 23:10
  • Related [What's the result of a & b?](https://stackoverflow.com/q/29394518/1708801) talks about some of the under-specification in bitwise operations. – Shafik Yaghmour Jan 18 '18 at 23:12
  • Are you aware of any actual architectures that use a padded int? I'm not. The concept is kind of foreign. – Mark Ransom Jan 18 '18 at 23:13
  • @EJP If the one's complement operator `~` and the equality operator `==` work on all bits (including padding bits), and if the unary minus operator `-` works only on value bits, then `~(-1) == 0` will be false. – RalphS Jan 18 '18 at 23:14
  • @MarkRansom: I would expect that a compiler for a fixed-point DSP with a 16-bit or 32-bit data bus and a 40-bit accumulator would likely include a data type which held 40 data bits and either 8 or 24 padding bits. I'm not sure how common 40-bit accumulators are today, but they make it possible to compute a precise integer sum of up to 256 32-bit values without any possibility of overflow. If one needs a 64-bit sum of more values, one can add groups of 256 at a time. A 64-bit accumulator would work just as well, but would cost more while offering little additional benefit. – supercat Jan 18 '18 at 23:18
  • @RalphS But they don't work on all bits. – user207421 Jan 18 '18 at 23:28
  • @AnT @EJP I hope so, and I am trying to find evidence of this in the C++14 standard. – RalphS Jan 18 '18 at 23:37
  • @MarkRansom: The primary architecture of which I'm aware is the Burroughs B6500, which attached a 3-bit tag to each 48-bit word of data. The tag signified the type of the data, and attempting to access it as a different type of data would result in a hardware exception. – Jerry Coffin Jan 18 '18 at 23:52
  • @JerryCoffin the related links on the right side of the page found [an earlier question](https://stackoverflow.com/questions/4475540/c-question-padding-bits-in-unsigned-integers-and-bitwise-operations-c89?rq=1) which mentions Cray as an example too. – Mark Ransom Jan 18 '18 at 23:59
  • @MarkRansom: The Cray did have padding bits, but is different in having nothing like a trap representation--the padding bits were completely ignored. If we include that, then most current implementations include padding bits as well. A `bool` really only has two values, but always occupies at least one complete byte, and sometimes more (a whole 32-bit word in some cases). – Jerry Coffin Jan 19 '18 at 00:19
  • @JerryCoffin it's not clear from the quoted parts of the standard if the padding was meant to apply to trapping representations or others. I'm guessing both. – Mark Ransom Jan 19 '18 at 00:52
  • @MarkRansom: I believe both, but unless you can get a trap representation (or similar) they become *almost* completely uninteresting. Almost everything in the standard that talks about them is because of the possibility of something like writing to one member of a union, reading another member, and your program going "boom" instead of just giving you some bits with unknown values in them. – Jerry Coffin Jan 19 '18 at 00:57

2 Answers

4

First of all, the C++ standard itself says very little about padding bits. Essentially all discussion of padding bits comes from the base document (i.e., the C standard).

So the real question is what the C standard says about things. Its footnote 54 gives a fairly concise summary of padding bits in general:

Some combinations of padding bits might generate trap representations, for example, if one padding bit is a parity bit. Regardless, no arithmetic operation on valid values can generate a trap representation other than as part of an exceptional condition such as an overflow. All other combinations of padding bits are alternative object representations of the value specified by the value bits.

Operators might change a padding bit. The obvious case would be a padding bit that represents parity. If you change a value's parity, the parity bit would change to match.

The "alternative object representations of the value" part basically means that as long as you stay "in bounds", the padding bits don't affect your results. For example, if you compare two values, only the value bits are used to determine the result (6.2.6.1/4):

Two values (other than NaNs) with the same object representation compare equal, but values that compare equal may have different object representations.

The times and places you have to be careful mostly involve undefined or implementation-defined behavior. For example, if you store a value into one member of a union, then read a different member of the union, it's possible the second could have its padding bits set to a trap representation, so even looking at the value that way could crash your program (or whatever).

Likewise, if you were to take two values and memcpy each to a buffer of unsigned char, some bits of those bytes might compare as not-equal, even if the values they represent compare equal.

One place this can bite you even if you never use memcpy directly is with the compare-and-exchange operations. These use memcpy and memcmp semantics for the underlying operations, so they're also subject to comparing not equal, even though the values being represented are equal:
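As a rough sketch of that difference, the two hypothetical helpers below compare two `unsigned` objects first by value and then byte by byte. On an implementation without padding bits both comparisons always agree; on one with padding bits, `same_representation` could return false for objects that `same_value` reports as equal.

```cpp
#include <cstring>

// Hypothetical helper: compares the values; padding bits (if any) play no part.
inline bool same_value(unsigned a, unsigned b) {
    return a == b;
}

// Hypothetical helper: compares the object representations byte by byte,
// so any padding bits take part in the comparison.
inline bool same_representation(const unsigned& a, const unsigned& b) {
    unsigned char bytes_a[sizeof(unsigned)];
    unsigned char bytes_b[sizeof(unsigned)];
    std::memcpy(bytes_a, &a, sizeof a);
    std::memcpy(bytes_b, &b, sizeof b);
    return std::memcmp(bytes_a, bytes_b, sizeof bytes_a) == 0;
}
```

The implication only runs one way: identical representations imply equal values, but equal values do not imply identical representations.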

[atomics.types.operations]/23:

The memcpy and memcmp semantics of the compare-and-exchange operations may result in failed comparisons for values that compare equal with operator== if the underlying type has padding bits, trap bits, or alternate representations of the same value. Thus, compare_exchange_strong should be used with extreme care. On the other hand, compare_exchange_weak should converge rapidly.
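A minimal sketch of the loop that note has in mind, using a hypothetical helper `fetch_or_via_cas`. When `compare_exchange_weak` fails, it overwrites `expected` with the contents actually stored in the atomic, so the next iteration compares identical representations. That is why the weak form converges even when padding bits differ at first.

```cpp
#include <atomic>

// Hypothetical helper: atomically ORs `mask` into `target` with a
// compare-and-exchange loop and returns the value that was replaced.
inline unsigned fetch_or_via_cas(std::atomic<unsigned>& target, unsigned mask) {
    unsigned expected = target.load();
    while (!target.compare_exchange_weak(expected, expected | mask)) {
        // On failure `expected` now holds the stored contents, so the
        // desired value is recomputed from it on the next iteration.
    }
    return expected;
}
```

For `unsigned` the standard library already provides `fetch_or`; the loop is written out only to show the retry pattern.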

Side note: the two large quotes are descriptive, not normative--from a normative viewpoint, padding bits have almost no meaning; almost anything that could expose padding bits or their values involves implementation-defined or undefined behavior. The only normative quote here is the one that basically says: "padding bits have no effect."

Jerry Coffin
  • In your first quote, there is: "**no arithmetic operation on valid values can generate a trap representation**". I wonder if a bitwise operation is considered as an "arithmetic operation" or as an "operation on raw bytes". – RalphS Jan 19 '18 at 16:55
  • @RalphS: I don't think there's a definition in the standard to say so, but they're defined in terms of arithmetic (e.g., "the value of the result is E1 × 2^E2"). – Jerry Coffin Jan 19 '18 at 17:54
  • Sadly it's only true for the shifts. Now that I have a better understanding of the issue, I would like to change the question to: "*Does C++14 guarantee that bitwise operations on valid unsigned int cannot generate trap representations?*". The answer being basically "*No. But it is ok to assume it, because only a fool would write a compiler that does not respect that principle.*" Is it ok to change the question? (I am new here.) The question in its current wording is a bit useless for those who want to do math on unsigned int values. – RalphS Jan 19 '18 at 22:08
  • On the other hand, the question in its current wording could be useful to someone who wants to perform a bitwise XOR on huge portions of raw memory ([XOR cipher](https://en.wikipedia.org/wiki/XOR_cipher)). Using the type `unsigned char` is mandatory in that case (there can't be padding bits in `unsigned char` representation). – RalphS Jan 19 '18 at 22:39
0

If an implementation specifies a storage format for integer types that includes padding bits, it may write anything it likes to such bits when an object is written, and may impose any requirement it sees fit on the values such bits must hold, behaving in arbitrary fashion if that requirement isn't met, subject to two constraints:

  1. If any write to an object of such type has yielded a particular bit pattern, that bit pattern must be acceptable, and must yield the same value, when read from any object of that type.

  2. If all of the bits of an integer-type object are zero, the object must be regarded as valid and must read zero.

If an implementation ignores padding bits on reads, bitwise operators may affect them or not in any manner the implementation sees fit. If an implementation were to trap when the total number of "1" bits in a multi-byte integer is odd, but always write a padding bit value that made the total parity be even, bitwise operators would be required to compute the parity bit based upon the data bits and write it appropriately.
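The parity scenario can be modelled in ordinary code. The sketch below is purely illustrative: `ParityWord` is a hypothetical 8-bit storage format with 7 value bits and 1 padding bit that keeps the total number of 1 bits even, and a thrown exception stands in for the hardware trap. It does not describe any real implementation.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical storage format: 7 value bits plus 1 padding (parity) bit.
class ParityWord {
public:
    explicit ParityWord(unsigned value) : raw_(encode(value)) {}

    // Reading checks parity and "traps" (here: throws) on an odd bit count.
    unsigned value() const {
        if (popcount(raw_) % 2 != 0) {
            throw std::runtime_error("trap representation");
        }
        return raw_ & 0x7Fu;
    }

    // Exposes the stored bit pattern, padding bit included.
    std::uint8_t raw() const { return raw_; }

    // Bitwise AND works on the value bits, then recomputes the padding bit.
    friend ParityWord operator&(ParityWord a, ParityWord b) {
        return ParityWord(a.value() & b.value());
    }

private:
    static unsigned popcount(unsigned x) {
        unsigned count = 0;
        for (; x != 0; x >>= 1) {
            count += x & 1u;
        }
        return count;
    }

    static std::uint8_t encode(unsigned value) {
        const unsigned bits = value & 0x7Fu;
        const unsigned parity = popcount(bits) % 2;  // 1 if value bits are odd
        return static_cast<std::uint8_t>(bits | (parity << 7));
    }

    std::uint8_t raw_;
};
```

With these definitions, `ParityWord(3)` is stored as 0x03 and `ParityWord(1)` as 0x81. ANDing the raw patterns would give 0x01, which has odd parity and would trap on read, whereas `operator&` stores 0x81.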

supercat
  • Is this something from the standard, or somewhere else? Cite your source. – 1201ProgramAlarm Jan 19 '18 at 00:42
  • I think I get it. Correct me if I am wrong, but I think the C++ standard is not meant to be exhaustive. A compiler can be 100% compliant with the standard and still not make any sense. And there is no point in trying to write code that will work properly with such a compiler. So in a way, a correct implementation of C++ needs two things: respect the requirements described in the standard **and** respect the rules of common sense. One of these rules is to make sure the bitwise operators work as everyone expects, without interference from the padding bits. – RalphS Jan 19 '18 at 03:07
  • @RalphS: Unfortunately, while the C and C++ Standards were originally written on the presumption that compiler writers can be relied upon to exercise common sense, and it wasn't necessary to mandate behaviors that would obviously make sense on certain platforms, compiler writers have increasingly regarded the lack of mandates as an indication that programmers are not entitled to expect such behaviors and compiler writers should feel no obligation to support them. Unfortunately, the semantics of C and C++ are getting eroded in the name of "optimization". – supercat Jan 19 '18 at 07:02
  • @1201ProgramAlarm: For primitive objects and Plain Old Data Structures (PODS), the Standard requires that copying all the bytes of a value will copy the value. Thus, if storing the value 42 into an `unsigned short` writes some sequence of bits into its storage (possibly including padding bits), copying that sequence of bits into another `unsigned short` must make it hold the value 42. Types including integer types are allowed to have trap representations, which are sequences of bits which do not represent valid values. – supercat Jan 19 '18 at 07:05