
Are underlying bytes of an object allowed to change, if the value itself is not changed?

So, for example, can this code-snippet print "differ"?

int a = 0;
char b[sizeof(int)];

memcpy(b, &a, sizeof(int));
if (memcmp(b, &a, sizeof(int))) {
    printf("differ\n");
}

Here's the question that made me ask this: "Is delete allowed to modify its parameter?". Check out the comments below that question, for example this one from Johannes Schaub:

What rule forbids changing the internal bits of an int? As far as I know, the implementation is even allowed to make int a = 0; /* test bits of 'a' now */; /* test bits of 'a' now */ have two different bits each time

geza
  • What do you mean by "value" though? There are many objects that would be considered equal, but are almost never byte equivalent. For example two `std::string` or `std::vector` might contain the same letters or numbers, but will likely have different dynamically allocated buffers, so will not `memcmp` the same (the buffer from `data()` would, but not the objects themselves) even without "unexpected" corner cases like padding, `mutable` members, or "denormalised" values. – Fire Lancer Aug 23 '17 at 12:03
  • @FireLancer: first, let's consider simple types, like an int, or a pointer. I mean by value is the value you read/write, when you use it "normally" (not memcpy, etc.), like "a = 42;", or "b = c;" – geza Aug 23 '17 at 12:17
  • Then no, an `int` etc. with the same value has the same byte representation, because changing any bit of its memory would change its value. Likewise with pointers, changing any bit changes the address. But you could, say, change the byte representation of a `bool` and still have it be equal to `true`, because true is basically `bool != 0`, and even an 8-bit bool has 255 representations for that. – Fire Lancer Aug 23 '17 at 12:29
  • @FireLancer : You assume that an int contains no padding bits. This question is tagged `language-lawyer`, and the language specifically allows ints to contain padding bits. – Martin Bonner supports Monica Aug 23 '17 at 12:31
  • Hmm, I'd have to look in detail, can `int` contain padding? It can be different sizes, but I've never seen one with actual padding. – Fire Lancer Aug 23 '17 at 12:32
  • Whereas you don't expect padding for `int`, it is different for `bool`. – Jarod42 Aug 23 '17 at 12:52
  • @Martin, where in the standard does it say padding bits are allowed? The only two mentions I can find (C++11) are in bitfields and the atomic operations library, which also mentions trap bits. But the fundamental types section seems to indicate ints cannot have padding bits. I miss the (relative) simplicity of the C standard :-) – paxdiablo Aug 23 '17 at 13:24
  • The padding of ints is probably not an issue (always 4-byte aligned), but consider struct Bla { bool m_b; int m_n; }; The sizeof would probably be 8 bytes, where 3 bytes are used for alignment. These bytes are not used for data, so I am not sure if one can use memcmp in this case. – gast128 Aug 23 '17 at 13:43
  • @paxdiablo : You are right. It's a lot less explicit in C++; I must have been reading the C standard. However it is present. The last sentence of 3.9p4 in [n4296](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf) says "For trivially copyable types, the value representation is **a** set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values." (my emphasis). It doesn't say that all the bits in the object representation have to be part of the value representation. – Martin Bonner supports Monica Aug 23 '17 at 14:02

1 Answer


Generally, memcpy and memcmp work strictly on bytes, so the bytes compared immediately after the copy cannot differ.

One reading of the (C++11) standard seems to indicate it may be possible for an int to differ from another (according to memcmp) that you've just assigned it from, if integers are allowed to have padding bits which have no effect on the value.

It would seem to be feasible as per your code with an int and similarly-sized char buffer:

int a = 0;
char b[sizeof(int)];
memcpy(b, &a, sizeof(int));

for the padding bits (if any) in a to change in such a way that the underlying value does not change. That could cause a memcmp to report a difference.

That particular reading can be found in C++11 3.9.1 Fundamental types:

For character types, all bits of the object representation participate in the value representation. For unsigned character types, all possible bit patterns of the value representation represent numbers. These requirements do not hold for other types.

That allows for the possibility of padding bits within non-character types and there's nothing in the standard explicitly preventing those bits from changing at any time.
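To see whether a given implementation actually uses padding bits in a type, you can compare the size of the object representation with the number of value bits reported by std::numeric_limits. This is only a rough sketch: padding_bits is a hypothetical helper name, not something from the standard, and it only tells you about the implementation you compile with, not what the standard permits in general.

```cpp
#include <climits>
#include <limits>

// Hypothetical helper: counts the bits of T's object representation that are
// neither value bits nor the sign bit, i.e. the padding bits.
// Intended for arithmetic types for which numeric_limits is specialized.
template <typename T>
constexpr int padding_bits() {
    return static_cast<int>(sizeof(T) * CHAR_BIT)   // bits in the object representation
         - std::numeric_limits<T>::digits           // value bits (sign excluded)
         - (std::numeric_limits<T>::is_signed ? 1 : 0);  // sign bit, if any
}
```

For the character types the result has to be zero, which matches the quoted paragraph. For int it is zero on the mainstream implementations I'm aware of, but that is an observation about those compilers rather than a guarantee.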

However, that same section lumps the character types and the signed or unsigned integers into an "integral type" category and states that the:

representations of integral types shall define values by use of a pure binary numeration system. (footnote 49) [Example: this International Standard permits 2’s complement, 1’s complement and signed magnitude representations for integral types. —end example ]

Footnote 49 states:

A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral power of 2, except perhaps for the bit with the highest position. (Adapted from the American National Dictionary for Information Processing Systems.)

That doesn't seem to leave the possibility open for padding bits in these types at all, because it very specifically calls out successive bits and powers of two, with the only exception specifically mentioned being the high bit (used for deciding sign for the three possible encodings) (a).

So I suspect that a memcmp cannot report a difference when it immediately follows a memcpy over the same memory blocks and size.
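The round trip in question can be wrapped up as a small check. This is a minimal sketch, and round_trip_matches is a hypothetical helper name introduced only for illustration; it shows what is being asked, it does not prove anything about what the standard guarantees.

```cpp
#include <cstring>

// Hypothetical helper: copies the object representation of `value` into a
// byte buffer and immediately compares the buffer against the source object.
// Returns true when memcmp reports the two blocks as identical.
bool round_trip_matches(int value) {
    unsigned char bytes[sizeof(int)];
    std::memcpy(bytes, &value, sizeof(int));
    return std::memcmp(bytes, &value, sizeof(int)) == 0;
}
```

On an implementation without padding bits in int, this returns true for every value, since every bit of the object representation is then part of the value.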

That's totally irrelevant in the question you link to, of course, since there's an intervening operation, delete, which is free to change the underlying bit pattern. That situation is no different to:

int a = 0;
char b[sizeof(int)];
memcpy(b, &a, sizeof(int));
a = 42; // intervening operation

after which a memcmp would be pretty much guaranteed to consider the two memory blocks as different.
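For completeness, here is that intervening-operation case as a self-contained function. Again this is just a sketch with a hypothetical helper name (differs_after_assignment); since 0 and 42 are different values, their value bits must differ, so the comparison reports a difference regardless of any padding.

```cpp
#include <cstring>

// Hypothetical helper: snapshots the bytes of `a`, assigns a different value
// to `a`, then compares the snapshot against the modified object.
// Returns true when memcmp reports that the two blocks differ.
bool differs_after_assignment() {
    int a = 0;
    unsigned char b[sizeof(int)];
    std::memcpy(b, &a, sizeof(int));
    a = 42;  // intervening operation
    return std::memcmp(b, &a, sizeof(int)) != 0;
}
```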


(a) Annoyingly, there is one potential reading allowing for padding bits while still satisfying the "successive" bits and powers-of-two mentioned above - that's if the padding bits are at the low end of the underlying bit pattern (furthest from the sign). If that were allowed then, yes, memcmp immediately after memcpy could report a difference.

paxdiablo
  • *"What rule forbids changing the internal bits of an int? As far as I know, the implementation is even allowed to make `int a = 0; /* test bits of 'a' now */; /* test bits of 'a' now*/` have two different bits each time. – Johannes Schaub - litb"*. Even so, I don't think any sane implementation does something like that. – Jarod42 Aug 23 '17 at 12:07
  • @Jarod42: thanks, I think I'll add this comment into the question itself – geza Aug 23 '17 at 12:13
  • @Jarod, as per `C++11 3.9.1 Fundamental types`: "For character types, *all* bits of the object representation participate in the value representation." Hence there can be no padding. – paxdiablo Aug 23 '17 at 12:18
  • @geza, no, I originally misunderstood the context, thinking that the `memcpy` was between two `char` buffers - I have now adjusted my answer. With one being an `int`, there is very much the possibility that it could change its underlying bit pattern at any time provided that doesn't affect the value. This would cause a `memcmp` to report the blocks are different. – paxdiablo Aug 23 '17 at 12:31
  • @paxdiablo: thanks! Can you give me the relevant part of the standard which says that this is allowed? So far, I assumed that padding bytes in structs don't change. Now you even say that the byte representation of any (non-byte-based) type can change. So, for example, if I calculate a hash (which operates on bytes) for an int array, then calculate it again, the two could differ. That's quite unexpected. – geza Aug 23 '17 at 12:47
  • Actually, I think I've changed my mind yet again. See the update, C++11 seems to tightly limit the possible encoding of integral types in a way that prevents padding bits. I may be wrong (it wouldn't be the first time) but that's my current reading of the standard. – paxdiablo Aug 23 '17 at 13:15
  • @paxdiablo: okay :) "delete, which is free to change the underlying bit pattern". Why is it so? – geza Aug 23 '17 at 14:27
  • @geza, I believe Bjarne intended at some point that deleting a pointer would also set the pointer to null so that it couldn't be used again. I've never seen an implementation that actually *does* this. – paxdiablo Aug 23 '17 at 14:33
  • @paxdiablo: I don't want to discuss that question here :) but, where does the standard say so? Bjarne said it, it's okay :), but I haven't found anything about it in the standard. – geza Aug 23 '17 at 14:35
  • @geza, you could read the standard in one of two modes: 1/ anything not explicitly allowed is forbidden; or 2/ anything not explicitly forbidden is allowed :-) – paxdiablo Aug 23 '17 at 14:38
  • I would question your reasoning that there cannot be padding in an integer's memory representation except for chars. The padding could be at the end of the representation, could it not? Could an integer not have 20 bits and size 3, with the highest 4 bits being padding? This does not seem to change the answer, since memcpy and memcmp both deal with chars, which are not allowed to have padding. – PaulR Aug 23 '17 at 14:54
  • Furthermore I read in the C11 standard: "6.2.6.2 Integer types 1 For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter) (...) For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; " – PaulR Aug 23 '17 at 14:56
  • @paxdiablo I don't see anywhere in the standard where it actually defines what value is retrieved by aliasing (or memcpying) one byte of an `int` – M.M Aug 23 '17 at 21:30
  • @PaulR, that's the C standard you quote, this is C++. And for your statement about padding at the end of the memory block, I believe that's covered by my footnote a. – paxdiablo Aug 24 '17 at 00:37
  • @paxdiablo: Ah, you are right of course, I got carried away looking up the memcpy and memcmp library definitions, sorry. – PaulR Aug 24 '17 at 14:20
  • @paxdiablo: This answer seems to imply your footnote is indeed the correct interpretation: https://stackoverflow.com/a/26241722 – PaulR Aug 24 '17 at 14:26