While searching for a way to combine sizeof(double) chars into a double, I read in several posts that using std::memcpy is the recommended way:

char bytes[sizeof(double)];
// fill array
double d;
std::memcpy(&d, bytes, sizeof(double));

However, I wonder why further usage of d can be defined behavior.

If it were not a double, but a complex class object, accessing it surely would not be defined either, would it? So why should it be defined for a double?

Edit: To make my problem clear, I want to specify my goal: I would like to find a way to combine several chars into a double and then use this double without causing undefined behavior. I do not expect the value of the double to be specified. I deem this impossible anyway, since the standard does not even say anything about the size of double, let alone its bit layout. However, I do require d to hold some valid (i.e. 'accessible') double value.

Reizo
  • Because `double` is *TriviallyCopyable* - related question: https://stackoverflow.com/questions/29777492/why-would-the-behavior-of-stdmemcpy-be-undefined-for-objects-that-are-not-triv – UnholySheep Oct 02 '19 at 22:01
  • related: https://stackoverflow.com/questions/51300626/is-stdmemcpy-between-different-trivially-copyable-types-undefined-behavior – geza Oct 02 '19 at 22:40
  • There is no type-punning here. `d` is properly constructed, accessing it is not type-punning. `reinterpret_cast`ing `bytes` to `double`, and accessing it as `double` would be type-punning. – geza Oct 02 '19 at 22:51
  • @geza fair point, I corrected the title and tags. – Reizo Oct 02 '19 at 23:02
  • The answer depends on what `// fill array` is. – L. F. Oct 03 '19 at 05:32
  • @L.F. bytes are read from theoretically arbitrary file content. In practice it contains an IEEE-754 double but of course you can never guarantee that and that's my concern. – Reizo Oct 03 '19 at 08:57

3 Answers

Why does type-punning using std::memcpy not cause undefined behaviour?

Because the language says so (latest draft):

[basic.types]

For any object (other than a potentially-overlapping subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes ([intro.memory]) making up the object can be copied into an array of char, unsigned char, or std​::​byte ([cstddef.syn]). If the content of that array is copied back into the object, the object shall subsequently hold its original value.

Note however the condition on that rule. Your code can potentially have undefined behaviour, but not (unless some other rule says so) in case the copied value was originally copied from another double, or in practice, if the value could have been copied from a double.
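
The round trip that the quoted rule talks about can be sketched as follows. This is only a minimal illustration: the helper name `round_trip` is made up for this example, and it assumes the bytes really do come from a `double`, which is exactly the case the rule covers.

```cpp
#include <cstring>   // std::memcpy

// Hypothetical helper (not from the original post): copies a double out to
// a byte array and back into a second double object.
double round_trip(double original) {
    unsigned char bytes[sizeof(double)];
    std::memcpy(bytes, &original, sizeof(double));   // object -> bytes

    double restored;
    std::memcpy(&restored, bytes, sizeof(double));   // bytes -> object
    return restored;   // the bytes originated from a double, so this is fine
}
```

Bytes read from an arbitrary file do not come with that guarantee, which is the condition the paragraph above warns about.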

If it was not a double, but a complex class object, accessing it surely would not be defined either, would it?

Depends on what you mean by complexity. The conditions where this applies are in the quoted rule.
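
Whether a given type meets the condition can be checked at compile time with `std::is_trivially_copyable`. The sketch below uses two made-up types, `Point` and `Named`, purely as illustrations; they do not appear in the question.

```cpp
#include <string>
#include <type_traits>

// Hypothetical example types, introduced only for this illustration.
struct Point { double x; double y; };             // only trivial members
struct Named { std::string name; double value; }; // std::string copy is non-trivial

static_assert(std::is_trivially_copyable<double>::value,
              "double may be copied with memcpy");
static_assert(std::is_trivially_copyable<Point>::value,
              "Point may be copied with memcpy");
static_assert(!std::is_trivially_copyable<Named>::value,
              "Named must not be copied with memcpy");
```

So a class being "complex" in the sense of having several members does not matter; what matters is whether its copy operations and destructor are trivial.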

eerorika
  • Congrats on knowing where to find the relevant part of the standard. – Mark Ransom Oct 02 '19 at 22:08
  • So, as long as I can guarantee that the bytes in `bytes` were originally copied from another `double` (on an equivalent platform), I can use `d`. But a very common use case for this is to type-pun (potentially corrupted) raw bytes from a file or network connection. Then I practically cannot make that guarantee. So accessing `d` may not be valid in this scenario? – Reizo Oct 02 '19 at 22:21
  • @Reizo It wouldn't be valid indeed. At least not in general. In practice, you may probably assume IEEE-754 conformance, in which case you only need to take care of dealing with signaling NaN values. – eerorika Oct 02 '19 at 22:26
  • @eerorika What does 'wouldn't be valid' mean precisely? Will accessing `d` cause undefined behavior (i.e. be illegal) or will the (still legally) accessed value of `d` be unspecified (while still being _some_ legal `double`)? Please also see my question edit. – Reizo Oct 02 '19 at 22:40
  • @Reizo The C++ Standard simply doesn't specify anything about copying an arbitrary set of bytes over an object of an (even trivially copyable) type. UB is possible. In practice, there should be no problem unless the type has trap representations (such as the signaling NaN). – eerorika Oct 03 '19 at 00:21

Type punning is forbidden because the idea of it makes a mockery of the C++ object model. A piece of memory stores an object, and if you start accessing it as if it stored some other object, then what does that even mean? If you can just willy-nilly read from memory as an int, write to it as a float, and read from it later as a short, then what does it even mean to have an object exist?

Copying bytes between trivially copyable objects is just another way of setting that object's value. Indeed, that is what it logically means for an object to be "trivially copyable": that the meaning of that object is solely defined by the sequence of bytes that make up its object representation (this is not the case for complex objects). But the sanctity of what memory belongs to which objects is preserved. There is no "punning"; there is just copying data around.
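
The distinction can be sketched in code. The helper name `double_from_bytes` is invented for this illustration, and the punning variant is left commented out because it is the form described above as forbidden.

```cpp
#include <cstring>   // std::memcpy

// Hypothetical helper: sets the value of a real double object by copying
// bytes into it. The buffer stays a buffer; d stays a double.
double double_from_bytes(const unsigned char* bytes) {
    double d;                                 // a double object exists here
    std::memcpy(&d, bytes, sizeof(double));   // copy data into that object
    return d;
}

// Shown only for contrast: this reads the buffer as if a double object
// lived there, which it does not, so it is type punning.
//   double d = *reinterpret_cast<const double*>(bytes);
```

In the first form no memory is ever accessed as a type other than the one of the object stored there, which is why it is copying rather than punning.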

Nicol Bolas
  • If one were to specify a language "C+=2" which was just like C++ except that the creation of a valid pointer of any trivially-copyable type would implicitly create an object of that type at that address initialized with the contents of the associated storage, would there be anything one could do in C++ which couldn't be done the same way, using the same code, in "C++2"? Why does the "existence" of trivially-copyable objects have to mean anything? – supercat Oct 03 '19 at 05:34
  • Your answer makes me feel that you think that "trivially copyable" means that the value of an object of trivially copyable type is uniquely determined by the values of the bits in its value representation. But this is not true. – Language Lawyer Oct 03 '19 at 10:02
  • @LanguageLawyer: "*But this is not true.*" How not? How would you write a type which is trivially copyable such that whatever that object means to "have a value" is not determined entirely by its bit pattern? And I'm not talking about the behavior of `operator==`. Copying an object creates an equivalent object; if copying the bit pattern copies the object (which is what trivially copyable says), then copying the bit pattern creates an equivalent object, and therefore "equivalent object" is determined entirely by its bit pattern, regardless of what `operator==` may do. – Nicol Bolas Oct 03 '19 at 13:36
  • @NicolBolas: One could have a type `Woozle` whose bit pattern represents an index into a table of the values of all `Woozle` objects that have ever been created. This could be useful if e.g. `Woozle` values were large, but one knew that no more than 255 different values would ever be used within a single execution. – supercat Oct 03 '19 at 18:39
  • @supercat: And how would your object *ensure* that this bit pattern "represents an index into a table of values of all `Woozle` objects that have ever been created?" There's a difference between having a convention for what a value means and having the object actually *enforce that*. `unique_ptr` *enforces* its ownership semantics, by making its pointer private, by being move-only, and by having non-trivial special members that implement ownership. How would you write a `Woozle` that can *enforce* this in its copy constructor without that copy constructor being non-trivial? – Nicol Bolas Oct 03 '19 at 18:57
  • @NicolBolas [`bit_cast`](http://eel.is/c++draft/bit.cast) _Returns_ says _"Each bit of the value representation of the result is equal to the corresponding bit in the object representation of `from`... If there are multiple such values, which value is produced is unspecified."_ and `bit_cast` is defined only for trivially copyable types. There are fundamental types for which a bit pattern can mean different values. You can't not know them. – Language Lawyer Oct 04 '19 at 08:44
  • @LanguageLawyer: "*There are fundamental types for which a bit pattern can mean different values.*" I think you got that backwards; there are fundamental types for which different bit patterns can represent the same value (positive/negative zero in IEEE-754, for example). But the reverse is not possible. – Nicol Bolas Oct 04 '19 at 13:29
  • A pointer past the end of an object could be bitwise identical to a pointer to an object. But these values are different. – Language Lawyer Oct 04 '19 at 15:04
  • @NicolBolas even simpler: a pointer to an object of class type with a non-static data member of reference type created on top of an object of the same type and pointer to the old object are 2 different values, but highly likely these pointers are bitwise identical. – Language Lawyer Oct 09 '19 at 23:10

There is a special exception in the standard for memcpy to/from a buffer of bytes, because certain operations would be impossible if there weren't a well-defined way of doing that.

You can definitely get undefined behavior if you copy from one type, to bytes, then to another type.

Mark Ransom