6

Section 19.3 presents a string representation in a chapter whose main focus is operator overloading, specifically the special operators [] , ->, and (). It implements copy_from() as an ancillary function as follows:

void String::copy_from(const String &x)
    // make *this a copy of x
{
    if (x.sz <= short_max)
    {
        memcpy(this, &x, sizeof(x);
        ptr = ch;
    }
    else
    {
        ptr = expand(x.ptr, x.sz+1);
        sz = x.sz;
        space = 0;
    }
}

The class interface looks like this:

#ifndef STRING_EXERCISE_H
#define STRING_EXERCISE_H

namespace simple_string
{
    class String;
    char *expand(const char *ptr, int n);
}

class String
{
    public:
        String(); // default constructor x{""}
        explicit String(const char *p); // constructor from C-style string

        String(const String &s); // copy constructor
        String &operator=(const String& s); // copy assignment
        String(String &&s) // move constructor
        String &operator=(String &&s) // move assignement

        ~String() // destructor

        char &operator[](int n); // unchecked element access
        char operator[](int n) const;
        char &at(int n); // checked element access
        char at(int n) const;

        String &operator+=(char c) // add char c to the end

        const char *c_str(); // c-style string access
        const char *c_str() const;

        int size() const; // number of elements
        int capacity() const; // elements plus available space

    private:
        static const short short_max = 15;
        int sz;
        char *ptr;
        union
        {
            int space; // unused allocated space
            char ch[short_max+1]; // leave space for terminating 0
        };

        void check(int n) const; // range check
        void copy_from(const String &x);
        void move_from(String &x);
}

#endif

How can String::copy_from() use memcpy() to copy the class? I thought the class copied had to be trivially copyable (which it is not, because String has user-defined constructors, copy operations, move operations, and destructors).

dav
  • 626
  • 4
  • 21
  • 1
    The class copied from is the exact same class being copied to. You are allowed to take advantage of implementation knowledge inside your own class – Joe Jul 17 '18 at 16:48
  • *If the objects are not TriviallyCopyable, the behavior of memcpy is not specified and may be undefined.* In this case it should be fine as the author takes care of what is copied incorrectly. – NathanOliver Jul 17 '18 at 16:48
  • @NathanOliver Isn't it [outright undefined](https://stackoverflow.com/a/29777728/2752075), regardless of the class contents? – HolyBlackCat Jul 17 '18 at 16:51
  • 2
    Sorry @Joe could you elaborate? I don't understand how that would change the fact that the standard says it's undefined to `memcpy()` non-trivially copyable types. – dav Jul 17 '18 at 16:55
  • @NathanOliver if "the author takes care of what is copied incorrectly" does that mean this code is not valid? – dav Jul 17 '18 at 16:55
  • @HolyBlackCat Does that post say it is undefined behavior? The very first part of the answer is*"Why would the behavior of std::memcpy itself be undefined when used with non-TriviallyCopyable objects?" It's not!* – NathanOliver Jul 17 '18 at 17:20
  • @NathanOliver Fair point. But still, the lifetime of destination object ends after you memcpy into it (unless it's trivially copyable), and using it after that without reconstructing it with placement-new is UB, and OP's code has no placement-new... – HolyBlackCat Jul 17 '18 at 17:23
  • @NathanOliver Now read the remainder of that answer, which says that using the target object after the `memcpy` is undefined. – interjay Jul 17 '18 at 17:24
  • @interjay I saw that but it was missing the quote to actually say that reusing the storage ended the lifetime. Looking at the standard I'd have to say tat this is UB. – NathanOliver Jul 17 '18 at 17:29
  • @NathanOliver A comment there refers to [basic.life]/1.4 which says "The lifetime of an object o of type T ends when [...] the storage which the object occupies is released, or is reused by an object that is not nested within o". Although it isn't 100% clear what "reusing" means exactly. – interjay Jul 17 '18 at 17:34
  • @HolyBlackCat "_the lifetime of destination object ends after you memcpy into it (unless it's trivially copyable),_" Unless the object is 100% made from trivially copyable members (no padding)! – curiousguy Jul 17 '18 at 18:06

2 Answers2

5

How can String::copy_from() use memcpy() to copy the class?

int, char, and the anonymous union are all trivially copyable. So while you cannot perform a memcpy of a String, you can perform a memcpy of its members. All of them, all at once. The technically correct code for this would be:

memcpy(&this->sz, &x.sz, sizeof(x));

That copies the range of memory of the storage for the members of this object.That is guaranteed by the rules of standard layout types. For standard layout types, the members are stored in definition order. So if you start with the first, and cover the range of all objects, that should copy the members.

However, the standard also says that the first member subobject of a standard layout type is required to have the same address as the object itself:

If a standard-layout class object has any non-static data members, its address is the same as the address of its first non-static data member.

That means &this->sz must be the same address as this, and &x.sz must be the same address as &x.

So just cut out the middle-man:

memcpy(this, &x, sizeof(x));

This is only allowed because of the rules of standard layout types.


A bigger issue is that copy_from never checks for self assignment. memcpy doesn't work with overlapping memory ranges. Maybe operator= and similar functions check for this already.

Nicol Bolas
  • 449,505
  • 63
  • 781
  • 982
  • Do you think this is a good way to copy the members of a class? – dav Jul 17 '18 at 23:48
  • I agree that the "trick" works (see my bashed answer below). But is it really necessary? he could have assigned the size, reset the pointer and memcpy the union. Is that worst than memcopying everything to later reassign the ptr anyway? – alfC Jul 18 '18 at 06:37
  • 3
    @alfC: Necessary? No. Simple and manageable? Yes. C++ is a low-level language, so doing low-level things is not unreasonable. Assigning the variables requires naming all of them (so you'd have to remember to change it if you add new ones or change names). – Nicol Bolas Jul 18 '18 at 13:26
-1

I think what you are missing is that code is showing something called the short-string-optimization that allows the string to be stored in some buffer memory that is part of the object.

So, yes, for short strings the whole class can be memcopied because that is all that there is for the identity of the object. I wouldn't have done it this way, but the concept is that for short string you can copy element by element the contents of the class. (and reassign the ptr).

I would have done it this way and avoid the discussion, and I don't think is less efficient.

if (x.sz <= short_max)
{
    sz = x.sz;
    ptr = ch;
    memcpy(&ch, &x.ch, sizeof(ch));
}
alfC
  • 14,261
  • 4
  • 67
  • 118
  • 3
    [This post](https://stackoverflow.com/a/29777728/2752075) says it's UB... – HolyBlackCat Jul 17 '18 at 16:53
  • 1
    @HolyBlackCat, I would say that there is a "runtime" definition of what trivially copyable means. If `x.sz <= short_max` then (the instance) is trivially copyable and that is why he can use `memcpy`. Having said that, I would have done it differently and I am not assessing the quality of the code. I am trying to give a practical answer to the OP that make him understand the what is being done. Also, I don't have the book handy so I don't know exactly how `copy_from` is called, presumably after `clear`ing the original string (if it was long). – alfC Jul 17 '18 at 17:01
  • 2
    I agree that in practice it's probably fine. But formally it's UB. :/ – HolyBlackCat Jul 17 '18 at 17:02
  • I think the UB referes to *unconditionally* memcpying a class. If you know something about the state of the class then it should be fine(r). (Code not executed don't lead to UB IMO.) – alfC Jul 17 '18 at 17:04
  • 1
    Well, the standard doesn't seem to say anything about it being unconditional affecting undefined-ness. – HolyBlackCat Jul 17 '18 at 17:05
  • Even if `x.sz <= short_max`, isn't the class still not trivially copyable, because there are user defined constructors, copy operations, etc. ? – dav Jul 17 '18 at 17:07
  • @DavidTran, I think there are refinements of every concept applied to classes that can be translated to runtime instances. In this case I would say that the class is not trivially copyable, but the instance (inside the conditional block) is trivially copyable. He could have named the union and have the code `if(...){sz = x.sz; ptr=x.ptr; u = x.u /* or memcpy(&x.u, &u, sizeof(...))?*/;}` but then we would be discussing whether it is ok to assign or memcpy unions. I think this would have pleased more (not all?) people. – alfC Jul 17 '18 at 17:13
  • 1
    @DavidTran, I am not a laywer but see the language used in "If the **objects** are not TriviallyCopyable, the behavior of memcpy is not specified and **may be** undefined". (Emphasis mine.) Note that it talks about "object" and not classes, also that the phrase is conditional ("may be") implies some sort of runtime condition for it to be actually UB. – alfC Jul 17 '18 at 17:35
  • 1
    @alfC cppreference which you are quoting is not an official reference. And in any case just being unspecified is enough to make this non-portable at best. – interjay Jul 17 '18 at 17:38
  • @alfC Thanks for specifying. Now I don't understand why Stroustrup would use this seemingly convoluted way of copying members with `memcpy()` instead of copying each member individually. Is it more efficient? – dav Jul 17 '18 at 17:40
  • @DavidTran, I doubt it is more efficient (specially because he later reassigns the `ptr` anyway. Maybe I am too picky but I it is not the first time I see sloppy code coming from Stroustrup. The only reason that I have for this code is that it saves a few lines of code. Maybe looking that the actual implementation of `std::string` in any STL version would tell you a good/better alternative code. I personally never liked the short string optimization. (I think if should be a separate class). – alfC Jul 17 '18 at 17:47
  • From a std POV: there is no std POV. Lifetime is ill defined in C++. "Object" is ill defined. Everything at a fundamental level is ill defined. – curiousguy Jul 17 '18 at 18:09