#include <cstdlib>
#include <cstring>
#include <iostream>
#include <memory>   // std::allocator

// C++03 only.
int main()
{
    std::allocator<unsigned char> alloc;

    double d = 8;
    unsigned char* buf = alloc.allocate(sizeof(double));
    std::memcpy(buf, &d, sizeof(double));

    double extract;
    std::memcpy(&extract, buf, sizeof(double));

    std::cout << extract << '\n';

    alloc.deallocate(buf, sizeof(double));
}

I've created storage for an array of bytes that I never brought to life by initializing it, so its lifetime never started. In C++03 that is basically impossible to do. The correct way would be placement new, but for arrays placement new adds a memory offset whose size you cannot know.

But I never need to read the data in this storage directly. All the copying of objects in and out is done through std::memcpy calls.

My doubt is about the second std::memcpy call. In terms of value access, is std::memcpy considered to read the value of each element of the buf array, so that the array must be alive?

NOTE: It has been suggested that this question is a possible duplicate of Can memcpy be used for type punning?. That other question is about C, and this question is not about type punning either, although it is related (I originally wrote a wrong question title because I misunderstood what type punning actually means). I'm asking here about the precise semantics of the source object passed to memcpy: must that object be alive, or is that not actually required?

ABu
  • It is technically probably UB (not sure about the exact C++03 wording at the moment), but practically will probably work as expected on all compilers. As you note there is no alternative to create arrays without a direct allocating `new`. This issue was only resolved recently in the C++20 draft. However, why do you not placement-new a `double` into `buf`, before using it? – walnut Mar 14 '20 at 20:20
  • There is also no type-punning here. Do you intend to read `buf` through the `unsigned char*` pointer? – walnut Mar 14 '20 at 20:24
  • @walnut No, never. That is exactly what I'm trying to achieve: skip object lifetimes and alignment requirements by doing all reads and writes with memcpy. The idea is to create a buffer for serialization purposes with minimum waste of space. – ABu Mar 14 '20 at 20:38
  • @walnut Why do you think it is UB? I think `memcpy` works in terms of storage, not objects, and besides, memcpy doesn't actually need to "read" the stored value, just copy it blindly, so I think it's not UB, but I'm not sure either. – ABu Mar 14 '20 at 20:39
  • After reading the standards again, it might be fine. `memcpy` is defined in the C standard to copy from one object to another, but both C99 and C++03 define an object as a region of storage, rather than just "occupying" a region of storage as in newer C++ standards. The question would come down to pedantic interpretation of the standards. I am not sure what the outcome would be. Pretty sure though that this is intended to have defined behavior (and definitively has in C++20). – walnut Mar 14 '20 at 20:54
  • @walnut So starting from C++11 that implementation could become UB? In what sense? – ABu Mar 14 '20 at 20:56
  • The questions are whether 1. `memcpy` on memory that does not contain any object technically has a defined behavior by the standard wording and 2. whether the guarantee that memcpying POD objects into an array of `unsigned char[]` of correct size and back into an object of same type results in the same value also applies when copying to allocated storage not containing any object. Both of these questions have no practical relevance though, because even if the standard text says that either is not defined, it would be considered a standard defect, I am pretty sure. – walnut Mar 14 '20 at 21:06
  • If you are really interested in these technical details of the standard wording, I would suggest adding the `language-lawyer` tag. If you want a practical answer, then it would probably be that it will work as expected but will be unnecessarily complicated given that you can reuse storage by placement-new (except for array types, as you noted). – walnut Mar 14 '20 at 21:10
  • *"there is no alternative to create arrays without a direct allocating `new`. This issue was only resolved recently in the C++20 draft"* Can you link to more information on what this means? I'm curious but I don't know what you're referring to. – jtbandes Mar 14 '20 at 21:31
  • @jtbandes Array placement-new requires an overhead of *unspecified* size in the allocated space. Therefore there is no way to allocate memory for it correctly (and technically it is UB, because a program has UB if it has UB in any realization of unspecified behavior). See e.g. [this question](https://stackoverflow.com/questions/8720425). The only alternative is to placement-new adjacent individual objects, but pointer arithmetic is only defined on arrays, so you cannot reach beyond the first element. The latter issue was resolved in the C++20 draft: https://github.com/cplusplus/papers/issues/106. – walnut Mar 14 '20 at 21:41
  • @HongOoi That question is about C, not C++, and this question here is also not about actual type punning. OP is only copying object representations of the same type, not from one type to another. – walnut Mar 14 '20 at 21:44
  • Hmm. So if [\[basic.compound\]/3](https://timsong-cpp.github.io/cppwp/n4659/basic.compound#3) says every pointer value points at an object or function, or past the end of an object, or is null, or is invalid, what do allocation functions return? – aschepler Mar 14 '20 at 21:55
  • @aschepler I have asked that myself before and apparently no one really knows: https://stackoverflow.com/questions/58574379/are-pointers-to-allocated-memory-outside-objects-lifetime-invalid-pointers – walnut Mar 14 '20 at 21:57
  • How can you serialize something without accessing its contents? – user207421 Mar 14 '20 at 21:58
  • @aschepler I would say it returns a pointer to an object that has not been initialized yet (its lifetime has not started yet). – ABu Mar 14 '20 at 21:58
  • @user207421 I've changed the title AGAIN, making it longer and longer to try to be precise. I don't know to what extent, semantically speaking, memcpy reads object values or rather just copies the contents of the passed storage. – ABu Mar 14 '20 at 22:02
  • @Peregring-lk [intro.object/1](https://timsong-cpp.github.io/cppwp/n4659/intro.object#1) lists the operations creating objects exclusively and the allocation functions are not in that list, so they cannot create objects. That has been changed only with the current C++20 draft changes I linked above in https://github.com/cplusplus/papers/issues/106. – walnut Mar 14 '20 at 22:02
  • The point is that `memcpy()` accesses the contents of the source, and the subsequent serialization via `<<` accesses the contents of `memcpy()`'s target. No free lunch here. – user207421 Mar 14 '20 at 22:22

1 Answer


The std::allocator<unsigned char>::allocate is specified as calling ::operator new (C++03 lib.allocator.members/3). So this question is substantially similar to "constructing" a trivially-copyable object with memcpy , although without the attempt to alias a value afterwards.

If we replaced the call to memcpy with a char assignment loop, `unsigned char *p = (unsigned char *)&d; for (int i = 0; i < sizeof d; ++i) buf[i] = p[i];`, then it would definitely be undefined behaviour, since the assignment operator only has defined behaviour when the left-hand side refers to an object that exists. See this answer for more detail.

However for the memcpy version, the question is: is memcpy the same as this char assignment loop, or something else?

The C++03 standard only defines memcpy by deferring to ISO C90, which says:

The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1.

But it is unclear how to interpret this, since C has a different object model to C++. In C "object" means storage, whereas in C++ "object" and "storage" mean different things, and in the code in this question there is storage with no objects.

The answer by Shafik Yaghmour therefore describes the situation as "unspecified", although I think "not specified" or "unclear" would be better descriptions, since the term "unspecified" has a specific meaning in C++.

Footnote: nothing substantial changed on this topic as of C++17, but in C++20 this will be well-defined (see the accepted proposal for details).

M.M
  • C++03 still, in contrast to later C++ iterations, defines "*An object is a region of storage.*" in [intro.object], as C does, although it continues to say that objects can only be created by means which do not include allocation functions. So it becomes extra confusing whether the pointer points to an object and whether the definition of `memcpy` applies. – walnut Mar 14 '20 at 23:08
  • @walnut in light of that contradiction it seems reasonable to disregard that sentence (any other choice of what to disregard would be worse) – M.M Mar 14 '20 at 23:14
  • perhaps we could also consider that the C++ standard provides no guarantee that bytes of storage containing no objects retain their "values" over time, so no guarantee that the reading back of data produces the original bytes – M.M Mar 14 '20 at 23:15
  • @M.M C++03 quote: `[basic.types]3 For any object (other than a base-class subobject) of POD type T, whether or not the object holds a valid value of type T, the underlying bytes making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value`. The problem is: what is meant by "object" in this particular sentence? Must the object be alive, or is a pointer to storage enough? – ABu Mar 15 '20 at 15:00
  • @Peregring-lk in C++ , storage with no objects is clearly not an array of char or unsigned char, since those are objects – M.M Mar 16 '20 at 00:51