Using memcpy to copy an int into a char array and then printing its members: undefined behaviour?

Question

Consider the following code:

int i = 1;
char c[sizeof (i)];
memcpy(c, &i, sizeof (i));
cout << static_cast<int>(c[0]);

Please ignore whether this is good code. I know the output depends on the endianness of the system. This is only an academic question.

Is this code:

Undefined behaviour
Implementation-defined behaviour
Well-defined behaviour
Something else

Is http://stackoverflow.com/q/12612488/560648 not sufficient? — Lightness Races in Orbit, Mar 20 '15 at 18:06
Read [basic.types]/p2. `c` will hold the object representation of `i`, the value of which is determined by the system byte order. The value of `c[0]` is *indeterminate* but the program is well formed. — David G, Mar 20 '15 at 18:06
@LightnessRacesinOrbit Sorry but I don't understand the relevance of that question. — Neil Kirk, Mar 20 '15 at 18:09
I wrote this kind of code a lot, simply to work around byte streams with a known representation but unknown alignment. — user3528438, Mar 20 '15 at 18:38

AnT stands with Russia · Answer 1 · 2015-03-20T18:18:37.440

6

The language does not say that doing this is immediately undefined behavior. It simply says that the representation of c[0] might end up being an invalid (trap) representation, in which case the behavior is indeed undefined. But when c[0] is not a trap representation, the behavior is implementation-defined.

If you use an unsigned char array, a trap representation becomes impossible and the behavior becomes purely implementation-defined.

edited Mar 20 '15 at 18:18

answered Mar 20 '15 at 18:09


Would a trap representation be possible if `i` were `unsigned`? – Neil Kirk Mar 20 '15 at 18:17
@Neil Kirk: No. `unsigned char` has no trap representations. – AnT stands with Russia Mar 20 '15 at 18:19
I meant `unsigned int` on the `i`, but are you saying if `i` is `signed int` and the array is `unsigned char`, there'd be no trap representation? – Neil Kirk Mar 20 '15 at 18:22
Even for bare `char`, there's guaranteed not to be a trap representation. – Ben Voigt Mar 20 '15 at 18:29
@Ben Voigt: Why? Wording in 3.9.1/1 seems to deliberately single out unsigned character types as having no trap representations: "For unsigned character types, all possible bit patterns of the value representation represent numbers. These requirements do not hold for other types." – AnT stands with Russia Mar 20 '15 at 19:07
@Neil Kirk: Whether `i` itself is signed or unsigned makes no difference in this context, at least from a formal point of view. – AnT stands with Russia Mar 20 '15 at 19:09
@AnT: The requirements of 3.9p2 cannot be met unless a single `char` object holds as many unique values as `unsigned char`, and `unsigned char` has no padding bits -- every physical representation in `unsigned char` is a separate value. Therefore the same must also be true for `char`. – Ben Voigt Mar 20 '15 at 19:36

Ben Voigt · Accepted Answer · 2015-03-20T18:35:38.163

The rule you are looking for is 3.9p4:

The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object is the set of bits that hold the value of type T. For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values.

So if you use unsigned char, you do get implementation-defined behavior (any conforming implementation must give you a guarantee on what that behavior is).

Reading through char is also legal, but then the values are unspecified. You are however guaranteed that using unqualified char will preserve the value (therefore bare char cannot have trap representations or padding bits), according to 3.9p2:

For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.

("unspecified" values are a bit weaker than "implementation-defined" values -- the semantics are the same but the platform is not required to document what the values are.)

"but then the values are unspecified" - where does it say that? — M.M, Dec 30 '15 at 03:58
"therefore bare char cannot have trap representations or padding bits" - that doesn't follow from the 3.9p2 quote. There might be trap representations but they cannot be produced via such a copy because the original object would have been a trap if it had them. (e.g. parity bits) — M.M, Dec 30 '15 at 04:00
char is allowed to alias ANY type without causing a trap, therefore no legal content of memory is a trap with char. That doesn't rule out ECC, but ECC bits are not "padding". — Ben Voigt, Dec 30 '15 at 19:05
Also see my comment to AnT's answer which uses another approach to prove char has no padding bits. — Ben Voigt, Dec 30 '15 at 19:06

score 1 · Answer 3 · answered Mar 20 '15 at 18:14

It is clearly implementation defined behaviour.

The internal representation of an int is not defined by the standard (implementations can choose little endian, big endian or something else), so it cannot be well-defined behaviour: the result is allowed to differ between architectures.

On a given system (architecture, compiler and possibly its configuration) the behaviour is fully determined: on a little-endian machine you will get 1, on a big-endian machine 0. So it is implementation-defined behaviour.
