
Disclaimer: This is trying to drill down on a larger problem, so please don't get hung up on whether the example makes any sense in practice.

And, yes, if you want to copy objects, please use / provide the copy-constructor. (But note how even the example does not copy a whole object; it tries to blit some memory over a few adjacent(Q.2) integers.)


Given a C++ Standard Layout struct, can I use memcpy to write to multiple (adjacent) sub-objects at once?

Complete example: ( https://ideone.com/1lP2Gd https://ideone.com/YXspBk)

#include <vector>
#include <iostream>
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <string.h>

struct MyStandardLayout {
    char mem_a;
    int16_t num_1;
    int32_t num_2;
    int64_t num_3;
    char mem_z;

    MyStandardLayout()
    : mem_a('a')
    , num_1(1 + (1 << 14))
    , num_2(1 + (1 << 30))
    , num_3(1LL + (1LL << 62))
    , mem_z('z')
    { }

    void print() const {
        std::cout << 
            "MySL Obj: " <<
            mem_a << " / " <<
            num_1 << " / " <<
            num_2 << " / " <<
            num_3 << " / " <<
            mem_z << "\n";
    }
};

void ZeroInts(MyStandardLayout* pObj) {
    const size_t first = offsetof(MyStandardLayout, num_1);
    const size_t third = offsetof(MyStandardLayout, num_3);
    std::cout << "ofs(1st) =  " << first << "\n";
    std::cout << "ofs(3rd) =  " << third << "\n";
    assert(third > first);
    const size_t delta = third - first;
    std::cout << "delta =  " << delta << "\n";
    const size_t sizeAll = delta + sizeof(MyStandardLayout::num_3);
    std::cout << "sizeAll =  " << sizeAll << "\n";

    std::vector<char> buf( sizeAll, 0 );
    memcpy(&pObj->num_1, &buf[0], sizeAll);
}

int main()
{
    MyStandardLayout obj;
    obj.print();
    ZeroInts(&obj);
    obj.print();

    return 0;
}

Given the wording in the C++ Standard:

9.2 Class Members

...

13 Nonstatic data members of a (non-union) class with the same access control (Clause 11) are allocated so that later members have higher addresses within a class object. (...) Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other; (...)

I would conclude that it is guaranteed that num_1 to num_3 have increasing addresses and are adjacent modulo padding.
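
A minimal sketch of how that conclusion could be checked at compile time, assuming a stand-in struct `Layout` (a hypothetical name) with the same members as MyStandardLayout. The helpers `span_bytes` and `padding_bytes` are also illustrative names, not part of the original example. It only measures the layout; it says nothing about whether writing across the span with memcpy is defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in with the same data members as MyStandardLayout.
struct Layout {
    char mem_a;
    std::int16_t num_1;
    std::int32_t num_2;
    std::int64_t num_3;
    char mem_z;
};

// offsetof is only unconditionally supported for standard-layout types.
static_assert(std::is_standard_layout<Layout>::value,
              "Layout must be standard layout");

// Members with the same access control get increasing addresses.
static_assert(offsetof(Layout, num_1) < offsetof(Layout, num_2),
              "num_1 precedes num_2");
static_assert(offsetof(Layout, num_2) < offsetof(Layout, num_3),
              "num_2 precedes num_3");

// Bytes from the start of num_1 to the end of num_3, padding included.
constexpr std::size_t span_bytes() {
    return offsetof(Layout, num_3) + sizeof(std::int64_t)
         - offsetof(Layout, num_1);
}

// Padding bytes inside that span (the span minus the three member sizes).
constexpr std::size_t padding_bytes() {
    return span_bytes()
         - (sizeof(std::int16_t) + sizeof(std::int32_t) + sizeof(std::int64_t));
}
```

Because everything here is computed from offsetof rather than from pointer subtraction, the size calculation itself does not depend on any pointer-arithmetic rules.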

For the above example to be fully defined, I see these requirements, and I am not sure that they hold:

  • memcpy must be allowed to write to multiple "memory objects" in this way at once, i.e. specifically

    • Calling memcpy with the target address of num_1 and a size that is larger than the size of the num_1 "object" is legal. (Given that num_1 is not part of an array.) ("Is memcpy(&a + 1, &b + 1, 0) defined in C11?" seems a good related question, but doesn't quite fit.)
    • The C++ (14) Standard, AFAICT, defers the description of memcpy to the C99 Standard, and that one states:

    7.21.2.1 The memcpy function

    2 The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1.

    So for me the question here wrt. this is whether the target range we have here can be considered "an object" according to the C or C++ Standard. Note: A (part of an) array of chars, declared and defined as such, certainly can be assumed to count as "an object" for the purposes of memcpy because I'm pretty sure I'm allowed to copy from one part of a char array to another part of (another) char array.

    So then the question would be if it is legal to reinterpret the memory range of the three members as a "conceptual"(?) char array.

  • Calculating sizeAll is legal, that is usage of offsetof is legal as shown.

  • Writing to the padding in between the members is legal.

Do these properties hold? Have I missed anything else?

Martin Ba
  • Don't use `memcpy()` or `std::copy()` for such situations. Provide copy constructors instead. – πάντα ῥεῖ Aug 18 '16 at 20:26
  • @πάνταῥεῖ - You know, I *did* include a disclaimer, *and* the question contains the [lang-lawyer] tag. This is a site for SW professionals. I don't need to include the "don't do this at home, kids" sign, or do I?? :-P – Martin Ba Aug 18 '16 at 20:28
  • The calculation of `delta` is undefined, as the pointers do not point into the same array. (Section 5.7, paragraph 6, last sentence.) – molbdnilo Aug 18 '16 at 20:30
  • @MartinBa _"Do these properties hold?"_ It never was appropriate with any standard, and that didn't change. Your tagging doesn't help. – πάντα ῥεῖ Aug 18 '16 at 20:31
  • @MartinBa I already referred to the C++11 standard. – molbdnilo Aug 18 '16 at 20:40
  • @MartinBa: What you wrote before was not "controversial"; it was not allowed by the standard. – Nicol Bolas Aug 18 '16 at 21:01

3 Answers


§8.5

(6.2) — if T is a (possibly cv-qualified) non-union class type, each non-static data member and each base-class subobject is zero-initialized and padding is initialized to zero bits;

Now the standard does not actually say that these zero-bits will be writeable, but I can't think of an architecture that has this level of granularity on memory access permissions (nor would we want one to).

So I would say that in practice this re-writing zeros will always be safe, even if not specifically declared so by the Powers that Be.

Richard Hodges
  • Nice catch! I'm quite surprised the Standard explicitly defines the value of the padding. Grepping the PDF doesn't yield any further insight on the reason for me though. – Martin Ba Aug 18 '16 at 20:58
  • "*Now the standard does not actually say that these zero-bits will be writeable*" Sure it does. It's an inevitable outgrowth of being able to `memcpy` between trivially copyable types. – Nicol Bolas Aug 18 '16 at 21:00
  • @NicolBolas ah yes, you're right of course. This now begs the question of whether it is "defined behaviour" if any of these padding bits are given a non-zero value (for example as a result of memcpy). – Richard Hodges Aug 18 '16 at 21:04
  • @RichardHodges: `memcpy` is hardly the only way to give non-zero values to such bytes. Default initialization from raw memory allocations can do it too. Not unless you think that doing `new C` will blank out *just* the padding. Nothing in the standard *requires* that padding bytes have any particular value. – Nicol Bolas Aug 18 '16 at 21:07
  • @NicolBolas - I would agree in practice of course, but I'm not sure from a Std POV: The requirements of `memcpy` wrt. TC types do not imply anything about the padding, so either before or after, the padding value is not defined by `§3.9 : 2, 3`. It talks about "original value" and "same value" - I'm not sure that padding can be considered part of the value. (Again, theoretically speaking.) – Martin Ba Aug 18 '16 at 21:07

is legal to reinterpret the memory range of the three members as a "conceptual"(?) char array

No, arbitrary subsets of members of objects are not themselves an object of any kind. If you can't take the sizeof something, it's not a thing. Similarly, as suggested by the link you provided, if you can't identify the thing to std::is_standard_layout, it's not a thing.

Analogous would be

size_t n = (char*)&num_3 - (char*)&num_1;

It would compile, but it's UB: subtracted pointers must belong to the same object.

That said, I think you're in safe territory even if the standard isn't explicit. If MyStandardLayout is a standard layout, it stands to reason that a subset of it also is, even if it has no name and is not an identifiable type of its own.

But I wouldn't do it. Assignment is absolutely safe, and potentially faster than memcpy. If the subset is meaningful and has many members, I would consider making it an explicit struct, and using assignment instead of memcpy, taking advantage of the default member-wise copy assignment operator supplied by the compiler.
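
A minimal sketch of that suggestion, assuming hypothetical names `Nums`, `Record` and `ZeroNums` (none of them appear in the question). The three integers become a named sub-object, and resetting them is a plain assignment from a value-initialized temporary.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical explicit struct for the subset of members.
struct Nums {
    std::int16_t num_1;
    std::int32_t num_2;
    std::int64_t num_3;
};

// Hypothetical outer type that holds the subset as a real sub-object.
struct Record {
    char mem_a;
    Nums nums;
    char mem_z;
};

// Resets only the three integers; mem_a and mem_z are left untouched.
inline void ZeroNums(Record& r) {
    r.nums = Nums{};  // value-initialized temporary: all members are zero
}
```

The assignment uses the implicitly generated copy assignment operator of `Nums`, so no assumption about padding or about what counts as "an object" for memcpy is needed.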

James K. Lowden
  • Thanks. Good take on that. See my too-long answer on my considerations wrt. this issue. – Martin Ba Aug 18 '16 at 22:58
  • "is a standard layout, it stands to reason that a subset of it also is" - I would say it is actually *mandated* by the rules for SL in the standard that "informal" composite sub-objects of a Standard Layout Type are identical with explicit composite sub-objects. (§9.2/13) Whether anything follows from this is another question though :-) – Martin Ba Aug 18 '16 at 23:04

Putting this as a partial answer wrt. memcpy(&num_1, buf, sizeAll):

Note: James' answer is much more concise and definitive.

I asked:

  • memcpy must be allowed to write to multiple "memory objects" in this way at once, i.e. specifically

    • Calling memcpy with the target address of num_1 and a size that is larger than the size of the num_1 "object" is legal.
    • The C++ (14) Standard, AFAICT, defers the description of memcpy to the C99 Standard, and that one states:

    7.21.2.1 The memcpy function

    2 The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1.

    So for me the question here wrt. this is whether the target range we have here can be considered "an object" according to the C or C++ Standard.

Thinking and searching a bit more, I found in the C Standard:

§ 6.2.6 Representations of types

§ 6.2.6.1 General

2 Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number, order, and encoding of which are either explicitly specified or implementation-defined.

So at least it is implied that "an object" => "contiguous sequence of bytes".

I'm not so bold as to claim that the inverse -- "contiguous sequence of bytes" => "an object" -- holds, but at least "an object" doesn't seem to be defined more strictly here.

Then, as quoted in Q, §9.2/13 of the C++ Standard (and § 1.8/5) seem to guarantee that we do have a contiguous sequence of bytes (including padding).

Then, §3.9/3 says:

3 For any trivially copyable type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a base-class subobject, if the underlying bytes (1.7) making up obj1 are copied into obj2, obj2 shall subsequently hold the same value as obj1. [ Example:

T* t1p;
T* t2p;       
     // provided that t2p points to an initialized object ...         
std::memcpy(t1p, t2p, sizeof(T));  
     // at this point, every subobject of trivially copyable type in *t1p contains        
     // the same value as the corresponding subobject in *t2p

—end example ]

So this explicitly allows the application of memcpy to whole objects of Trivially Copyable types.
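
As a sketch of the case that §3.9/3 does cover, assuming a hypothetical named type `Triple` and helper `CopyBytes` (both illustrative): the copy goes between two complete objects of the same trivially copyable type and uses exactly sizeof(T) bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical named type holding the three integers.
struct Triple {
    std::int16_t num_1;
    std::int32_t num_2;
    std::int64_t num_3;
};

static_assert(std::is_trivially_copyable<Triple>::value,
              "memcpy between objects requires a trivially copyable type");

// Copies the underlying bytes of src into dst, as in the Standard's example.
inline void CopyBytes(Triple& dst, const Triple& src) {
    std::memcpy(&dst, &src, sizeof(Triple));
}
```

This differs from the question's ZeroInts in one respect only: source and destination are each a single object of type `Triple`, not a run of adjacent members.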

In the example, the three members comprise a "trivially copyable sub-object", and indeed I think wrapping them in an actual subobject of distinct type would still mandate exactly the same memory layout for the explicit object as for the three members:

struct MyStandardLayout_Flat {
    char mem_a;
    int16_t num_1;
    int32_t num_2;
    int64_t num_3;
    char mem_z;
};

struct MyStandardLayout_Sub {
    int16_t num_1;
    int32_t num_2;
    int64_t num_3;
};

struct MyStandardLayout_Composite {
    char mem_a;
    // Note that the padding here is different from the padding in MyStandardLayout_Flat, but that doesn't change how num_* are laid out.
    MyStandardLayout_Sub nums;
    char mem_z;
};

The memory layout of nums in _Composite and the three members of _Flat should be laid out identically, because the same basic rules apply.
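
This assumption can be checked rather than taken on faith. The sketch below repeats the three structs and compares the relative offsets of num_1, num_2 and num_3; the helper names (`flat_gap_1_to_2` and so on) are illustrative. One caveat: on an implementation where int16_t has 2-byte and int32_t has 4-byte alignment, num_1 can sit at offset 2 in _Flat but at offset 0 in _Sub, so the gap between num_1 and num_2 need not be the same in both.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct MyStandardLayout_Flat {
    char mem_a;
    std::int16_t num_1;
    std::int32_t num_2;
    std::int64_t num_3;
    char mem_z;
};

struct MyStandardLayout_Sub {
    std::int16_t num_1;
    std::int32_t num_2;
    std::int64_t num_3;
};

// Distance from num_1 to num_2 in each layout.
constexpr std::size_t flat_gap_1_to_2() {
    return offsetof(MyStandardLayout_Flat, num_2)
         - offsetof(MyStandardLayout_Flat, num_1);
}
constexpr std::size_t sub_gap_1_to_2() {
    return offsetof(MyStandardLayout_Sub, num_2)
         - offsetof(MyStandardLayout_Sub, num_1);
}

// Distance from num_2 to num_3 in each layout.
constexpr std::size_t flat_gap_2_to_3() {
    return offsetof(MyStandardLayout_Flat, num_3)
         - offsetof(MyStandardLayout_Flat, num_2);
}
constexpr std::size_t sub_gap_2_to_3() {
    return offsetof(MyStandardLayout_Sub, num_3)
         - offsetof(MyStandardLayout_Sub, num_2);
}

// True only if the three members are spaced identically in both structs.
constexpr bool same_relative_layout() {
    return flat_gap_1_to_2() == sub_gap_1_to_2()
        && flat_gap_2_to_3() == sub_gap_2_to_3();
}
```

Turning `same_relative_layout()` into a static_assert would make a build fail on any implementation where the two layouts diverge.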

So in conclusion, given that the "sub-object" num_1 to num_3 will be represented by a contiguous sequence of bytes equivalent to that of a full Trivially Copyable sub-object, I:

  • have a very, very hard time imagining an implementation or optimizer that breaks this
  • would say it can be read either as:
    • Undefined Behavior, iff we conclude that C++ §3.9/3 implies that only (full) objects of Trivially Copyable type may be treated this way by memcpy, or conclude from C99 §6.2.6.1/2 and the general specification of memcpy in 7.21.2.1 that the contiguous sequence of the num_* bytes does not comprise a "valid object" for the purposes of memcpy.
    • Defined Behavior, iff we conclude that C++ §3.9/3 does not normatively limit the applicability of memcpy to other types or memory ranges, and conclude that the definition of memcpy (and the term "object") in the C99 Standard allows adjacent variables to be treated as a single object: one contiguous sequence of target bytes.
Martin Ba