C++ - String Copy task

Question

In one of the tasks I'm working through to prepare for an exam, I still can't see through how pointers work. I'm at the very beginning of learning this (so far I have only learned Java).

The task asks how many times the string s is copied, and where.

I think the string is copied in t1 because of the pointer to an address, but I'm not sure. I also couldn't figure out what the & symbol after string does.

Here is the code:

#include <string>
using namespace std;
string t1(string z) { return z; }
string *t2(string &z) { return &z; }
string& t3(string *z) { return *z; }
string& t4(string& z) { return z; }
string t5(string &z) { return z; }

int main() {
  string s;
  t1(s);
  t2(s);
  t3(&s);
  t4(s);
  t5(s);
  return 0;
}

So what exactly is your question? Please edit your post and provide some clarification on how you would like us to help you. — Captain Obvlious, Sep 25 '15 at 10:34
This is really broad; what C++ book are you using? If you spend some time reading through it, you'll learn these basics of the language. — Lightness Races in Orbit, Sep 25 '15 at 11:39

score 0 · Answer 1 · edited May 23 '17 at 12:32

I think a good way to analyze this behavior is with a simple class like the one below:

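#include <iostream>
using namespace std;

class Test{
public:
    int id = 0;  // initialized so the constructor never prints an indeterminate value
    Test(){ cout << "Constructor is called!  id:" << id << endl; }
    // Copy constructor: runs whenever a Test is created from another Test.
    Test(const Test &obj){
        id = obj.id;
        cout << "Copy-Constructor is called!  id:" << id << endl;
    }
    ~Test(){ cout << "Destructor is called!  id:" <<  id << endl; }
};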

One thing to pay careful attention to: when an object is passed to a function by value or returned from a function by value, the new object is created by the copy constructor rather than the ordinary constructor. It copies the values from one object to another, and you can define it yourself, as in my sample class. (To convince yourself that it really exists, remove the copy constructor from my class and run your t1: the compiler then generates one silently, so the new objects produce no constructor output, but their destructor output still appears.)

Now, to answer your question, I use this class instead of string. I also assume that the object passed as the function argument has id == 100, and that each function contains z.id = 55; or z->id = 55; before its return statement.

Test t1(Test z) { z.id = 55; return z; }

Calling this function prints the copy-constructor message twice: once with id == 100, when the argument is copied into the parameter z, and once with id == 55, when the return value is created from z. After these constructor calls there are two destructor calls with id == 55, since z.id was changed inside the function.
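The sequence described for t1 can be checked with a small sketch. Traced and trace_t1 are hypothetical names that are not part of the answer; Traced stands in for the Test class and records each call in a log instead of printing it, and it takes the id as a constructor argument purely for convenience.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the Test class: it records each special
// member call in a log instead of printing it.
struct Traced {
    int id;

    static std::vector<std::string>& log() {
        static std::vector<std::string> entries;
        return entries;
    }

    explicit Traced(int i) : id(i) { log().push_back("ctor " + std::to_string(id)); }
    Traced(const Traced& other) : id(other.id) { log().push_back("copy " + std::to_string(id)); }
    ~Traced() { log().push_back("dtor " + std::to_string(id)); }
};

Traced t1(Traced z) { z.id = 55; return z; }

// Calls t1 once with an object whose id is 100 and returns the recorded events.
std::vector<std::string> trace_t1() {
    Traced::log().clear();
    {
        Traced original(100);
        t1(original);                 // result is discarded, as in the question's main
        assert(original.id == 100);   // t1 changed its own copy, not the caller's object
    }
    return Traced::log();
}
```

The returned log reads: ctor 100, copy 100, copy 55, dtor 55, dtor 55, dtor 100. The copy into the parameter and the copy out of it cannot be elided, because z is a function parameter.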

Test *t2(Test &z) { z.id = 55; return &z; }
Test &t4(Test &z) { z.id = 55; return z; }

In these functions there are no constructor or destructor calls, since you are working with references and pointers: no new object is created for either the argument or the return value. (By the way, if you're not sure what the differences between these two are, take a look here.)

Test &t3(Test *z) { z->id = 55; return *z; }

In this one there won't be any new objects either, but the difference is that since the return type is a reference you are allowed to return the object itself (*z instead of z), but if you use a reference

Test t5(Test &z)  { z.id = 55; return z;  }

Finally in this function a new object is created when you reach the return part.

score 0 · Accepted Answer · answered Sep 25 '15 at 12:14

Before I get to the meat, let's cover one slightly special thing about some string classes. Older string implementations (libstdc++'s std::string before GCC 5, for example) worked as a kind of smart pointer to a shared, copy-on-write buffer. This means that:

std::string s1("testing");
std::string s2;

s2 = s1;

Although s2 is a distinct string object, in such an implementation there is still only one string buffer between them after the assignment s2 = s1. That buffer isn't copied; it is shared in a kind of read-only arrangement. If a change is made to the string in s2, a copy is created at that moment so that the two strings point to different buffers. Note that since C++11 the standard's requirements on std::string effectively rule out copy-on-write, so current implementations copy the characters when a string is copied (short strings are usually stored inside the object itself).

Your question is probably not about the buffers themselves but about the string object that manages them. Still, the two are related where copy performance is concerned (std::shared_ptr is similar, in that copying it shares the underlying object). With a copy-on-write implementation, copying a std::string object is much less work than copying the underlying buffer.
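Whether two strings share a buffer after a copy depends on the standard library, so it is worth checking on your own toolchain. The following is a minimal sketch with hypothetical function names. On a C++11-conforming library copy_has_own_buffer() is expected to return true, while a copy-on-write library would return false. The second function shows the part that holds either way: changing the copy never changes the original.

```cpp
#include <cassert>
#include <string>

// Reports whether a copy of a long string owns a separate character buffer.
// The result depends on the standard library implementation.
bool copy_has_own_buffer() {
    const std::string s1(64, 'x');  // long enough to need heap storage in common implementations
    const std::string s2 = s1;
    return s1.data() != s2.data();
}

// Whatever happens internally, changing the copy must not change the original.
std::string original_after_modifying_copy() {
    std::string s1("testing");
    std::string s2;
    s2 = s1;
    s2[0] = 'T';
    assert(s2 == "Testing");
    return s1;
}
```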

That said, there's another point regarding your code sample that deserves addressing, and that's what is done with the return values from these functions (in part because you asked what the & does after the string in two of them).

Repeating with slight expansion:

#include <string>
using namespace std;
string t1(string z) { return z; }
string *t2(string &z) { return &z; }
string& t3(string *z) { return *z; }
string& t4(string& z) { return z; }
string t5(string &z) { return z; }

int main() {
  string s; string x; string *xp;
  x  = t1(s);
  xp = t2(s);
  x  = t3(&s);
  x  = t4(s);
  x  = t5(s);
  return 0;
}

Now, it's important to expand on function t1 for a moment. There's theory and there's the actual result, and they differ in all modern C++ compilers. On an exam I'd expect you to answer according to pure theory, ignoring elided copies, which come into play here.

Consider x = t1(s). In theory s is copied as the parameter to the function, so that z, within the function, is a copy of the caller's s. The return is by value, so in theory a second copy is created to return. Then, in theory, another copy is performed as x is assigned. That may also be what you witness if you trace through the code in a debugger.

However, in all but the most naive compilers the extra copies are elided or, since C++11, replaced by cheap moves, so that x receives a copy of s as if x = s had been written (and most compilers would examine this literal code, realize nothing is done, and emit a program that does nothing but return).

Now, about xp = t2(s). The parameter is a reference to a string (these declarations are read from right to left, so think "reference to a string" even though most people say "string reference"). That means no copy is used by the function; z is the caller's s. The function returns the address of that string, a pointer, which means no copy of s is made. At most we would say a copy of the pointer is returned. This is the same as having written xp = &s;
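A short sketch of the t2 case, assuming a hypothetical helper named modify_through_t2: the returned pointer is the address of the caller's own string, so a change made through it shows up in s.

```cpp
#include <cassert>
#include <string>

std::string* t2(std::string& z) { return &z; }

// Modifies s through the pointer returned by t2 and returns s afterwards.
std::string modify_through_t2() {
    std::string s = "hello";
    std::string* xp = t2(s);  // same as xp = &s; no string is copied
    assert(xp == &s);
    xp->append(" world");     // changes the caller's s
    return s;
}
```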

In x = t3(&s) we have a curious case. The function accepts a pointer, which requires &s to take the address of s, so no copy of s is made at the function call. The function returns a reference to a string (read from right to left as before, though some might say a string reference). Since the return statement dereferences the pointer, the result just refers to s via its address, and no copy is made in the return. This is further supported by the fact that the return type is a reference. References are typically implemented as pointers: a special kind of pointer, but a pointer under the hood, so no copy is made. However, since x is a distinct object, a copy is made when x is assigned from that reference. It resolves to the same thing as having written x = s;

There is another use case this function supports which deserves separate consideration:
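The difference between binding the result of t3 to a reference and assigning it to a separate string can be sketched as follows. demo_t3 is a hypothetical helper, and the reference variable ref is added only for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

std::string& t3(std::string* z) { return *z; }

// Returns the caller's string and the assigned copy after s is modified
// through the reference.
std::pair<std::string, std::string> demo_t3() {
    std::string s = "abc";
    std::string x;

    std::string& ref = t3(&s);  // no copy: ref is another name for s
    x = t3(&s);                 // one copy: x is a separate object

    assert(&ref == &s);
    assert(&x != &s);

    ref += "!";                 // changes s, but not the copy held by x
    return std::make_pair(s, x);
}
```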

string xr( t3( &s ) );

In this case the reference returned from t3 is used to initialize xr. It's similar to string xr( s );. So far, not a revelation. However, compare how the returned string is used with t3 against t2 and t1.

t1(s).length();
t2(s)->length();
t3(&s).length();

Here, the return value of each function is used to call a member function of string. The call with t1 copies s into the function, then copies again to return a temporary string, which is then destroyed (its destructor is called), a point you haven't really addressed in your question.

The calls with t2 and t3, however, use s itself, without any copy implied. In the t2 case the call is made through a pointer: it is like having written (oddly) (&s)->length(), whereas the t3 case is the same as having written s.length().

t4 is exactly the same as t3. It differs only in how the call is made and in the fact that a nullptr might be passed to t3 (dereferencing it is undefined behavior, typically a crash), which can't happen with t4.

t5 differs from t4 (and t3) only in that a copy is implied by the return by value. What is returned is like t1 and behaves like t1; the only difference from t1 is that t5 does not create a copy for use within the function body, it just creates a copy for the return.
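Because t3 dereferences whatever pointer it is given, a caller that cannot rule out a null pointer has to check first. One possible guard is sketched below; length_or_zero is a hypothetical helper, not something from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

std::string& t3(std::string* z) { return *z; }  // undefined behavior if z is null
std::string& t4(std::string& z) { return z; }   // z always refers to an existing string

// Hypothetical caller-side guard: only go through t3 when the pointer is non-null.
std::size_t length_or_zero(std::string* p) {
    if (p == nullptr) {
        return 0;
    }
    std::string& ref = t3(p);
    assert(&ref == p);  // the reference names the pointed-to string itself
    return ref.length();
}
```

With t4 no such check is needed at the call site, since a reference parameter has to be bound to an actual string.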

Assuming the example code above, with the following appended to main after the call to t5:

string a, b;

// t1 is like having written:

a = s;
b = a;
x = b;

// t5 is like having written:

b = s;
x = b;

Meaning, the first copy made by t1 is eliminated by the fact that t5 takes a reference instead of a value.

In modern C++ we generally ignore the theoretical performance implications in cases like t1, t4 or t5. We're more concerned with why a reference is used instead of a copy: the side effect of using a reference is that changes made to the string within the function t5 are made to the caller's s, whereas t1 works on a copy and therefore the caller's s is not changed. That is an important component of your question.

Theory always makes a copy wherever the code implies one, as detailed above, but in practice copies are elided (avoided) by optimization. For the literal code of t1, for example, the optimizer can remove all implied copies, so that no copies are performed. However, if a change were made to z within the function body of t1, that changes things. The compiler then realizes that changing z would change s unless a copy is made, so the one copy implied by the pass-by-value parameter of t1 is created to avoid that side effect, while the copy implied by the return by value is still elided.
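The side effect is easy to see once the functions actually modify their parameter. In this sketch t1_mod and t5_mod are hypothetical variants of t1 and t5 that append a character before returning.

```cpp
#include <cassert>
#include <string>

// Hypothetical variants of t1 and t5 that modify their parameter.
std::string t1_mod(std::string z)  { z += "!"; return z; }  // z is a copy of the argument
std::string t5_mod(std::string& z) { z += "!"; return z; }  // z is the caller's string

// Returns the caller's string after a call to t1_mod.
std::string caller_string_after_t1() {
    std::string s = "hi";
    std::string result = t1_mod(s);
    assert(result == "hi!");
    return s;  // unchanged: t1_mod worked on its own copy
}

// Returns the caller's string after a call to t5_mod.
std::string caller_string_after_t5() {
    std::string s = "hi";
    std::string result = t5_mod(s);
    assert(result == "hi!");
    return s;  // changed: t5_mod modified s through the reference
}
```

Both functions return the same result, "hi!", but only the by-reference version leaves the caller's s modified.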

score 0 · Answer 3 · answered Sep 25 '15 at 12:19

I will only say whether at least one copy is necessary, because the exact number can vary with copy elision by the compiler:

t1 : you take a value and return a value (a new copy, different from the passed string): COPY
t2 : you take a reference and return a pointer: NO COPY
t3 : you take a pointer and return a reference: NO COPY - but could crash if the pointer is null
t4 : you take a reference and return a reference: NO COPY
t5 : you take a reference and return a value: COPY

If there were no optimisation, t1 would need 2 copies: one to create the parameter from the original string and another to create the returned copy in the caller's scope. With elision, only one may happen.

t5 only needs a single copy, to create the returned copy in the caller's scope.
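The list can be checked by counting copy-constructor calls. This is a sketch under stated assumptions: Counted, copies_during and check_copy_counts are hypothetical names, Counted replaces string so that the copies become observable, and each result is discarded as in the question's main. Since Counted declares no move constructor, returning by value uses the copy constructor.

```cpp
#include <cassert>

// Hypothetical helper type: counts how often its copy constructor runs.
struct Counted {
    static int copies;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
};
int Counted::copies = 0;

// The five functions from the question, with Counted in place of string.
Counted  t1(Counted z)  { return z; }
Counted* t2(Counted& z) { return &z; }
Counted& t3(Counted* z) { return *z; }
Counted& t4(Counted& z) { return z; }
Counted  t5(Counted& z) { return z; }

// Runs one call and reports how many copy-constructor calls it caused.
template <typename Call>
int copies_during(Call call) {
    Counted::copies = 0;
    call();
    return Counted::copies;
}

// Checks the counts for calls whose result is discarded.
void check_copy_counts() {
    Counted s;
    assert(copies_during([&] { t1(s); })  == 2);  // parameter + return value
    assert(copies_during([&] { t2(s); })  == 0);  // reference in, pointer out
    assert(copies_during([&] { t3(&s); }) == 0);  // pointer in, reference out
    assert(copies_during([&] { t4(s); })  == 0);  // reference in, reference out
    assert(copies_during([&] { t5(s); })  == 1);  // return value only
}
```

For t1 the count stays at 2 even with optimization turned on, because the language does not allow eliding a copy into or out of a function parameter. With a type that has a move constructor, such as std::string, the second of those becomes a move.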
