2

In C++, is it good practice to initialize a variable by passing a reference to it into an "initialization" function? Or, to put it another way, is it good practice to write functions that behave this way (i.e. update variables created somewhere else)? In my intro programming class (taught in Java), we were taught to write methods like this as static and to give them explicit return values. But I've noticed from looking at a few samples that some C++ programmers declare their variables with no explicit initialization, hand them off to some function, then proceed to use them in the program. Are there any advantages or drawbacks to either style? (I'm excluding purely OO stuff like member functions and variables from this question - this isn't just about using methods to update an object's state. I've seen this done outside of classes in C++.)

I wrote a few quick lines of code to illustrate what I mean. The first function, genName(), is the style I'm familiar with. The second, gen_name(), is the kind I'm curious about.

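#include <iostream>
#include <string>

using namespace std ;

// Familiar style: build the value and return it.
string genName() {
    string s = "Jack" ;
    return s ;
}

// Style in question: fill in a variable owned by the caller.
void gen_name(string & s) {
    s = "Jill" ;
}

int main(int argc, const char * argv[]) {

    string name1 = genName() ;

    string name2 ;
    gen_name(name2) ;

    cout << name1 << endl ;
    cout << name2 << endl ;

    return 0;
}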
AdamJames

3 Answers

2

The initialization-by-reference style used to be popular for initializing complex data types in C++98, which had no move constructors and predated ubiquitous support for return value optimization (RVO).

For example, a function that creates and returns a large vector would be frowned upon because it would in effect create a temporary vector, which (without a compiler that reliably implements RVO) would be copied into the target vector along with all of its elements. This unnecessary allocation and copying led some programmers and style guides to recommend the initialization-by-reference style everywhere. Modern C++ addresses this complaint with move constructors and std::move, so the initialization-by-reference pattern can be retired.
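As a rough illustration of the two styles for a large container, here is a minimal sketch. The names make_sequence and fill_sequence are hypothetical and exist only for this example. In C++11 and later, the local vector in make_sequence is either constructed directly in the caller's object (copy elision) or moved, so the elements are not copied one by one.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: builds a large vector and returns it by value.
// On return, the copy is elided or the vector is moved; the elements
// themselves are not copied.
std::vector<int> make_sequence(std::size_t n) {
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 0);  // fill with 0, 1, ..., n-1
    return v;
}

// The older out-parameter style, shown only for comparison.
void fill_sequence(std::vector<int>& out, std::size_t n) {
    out.resize(n);
    std::iota(out.begin(), out.end(), 0);
}
```

With the first form the caller can write `const std::vector<int> seq = make_sequence(1000);`, which the out-parameter form does not allow.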

JoeG
user4815162342
  • This makes it sound like pass by reference is a good idea before C++11, which in most cases it isn't. – juanchopanza Dec 07 '13 at 08:18
  • Constructors have been a pain coming from Java. It's weird having to write a new constructor just to assign an object to a different variable. It should be the compiler's job to figure out how to do it. – AdamJames Dec 07 '13 at 08:24
  • @juanchopanza The answer doesn't claim that all pass-by-reference was preferred pre-C++11, only those where copying is a problem, which is typically for code that returns (potentially) large containers. – user4815162342 Dec 07 '13 at 08:25
  • It is not preferred even then. It's been a long time since RVO has worked on popular compilers such as g++, VS and clang. – juanchopanza Dec 07 '13 at 08:28
  • @juanchopanza Whether RVO worked would depend on optimization level and sometimes details of the compiler's [analysis of a concrete function](http://en.wikipedia.org/wiki/Return_value_optimization#Compiler_support). Because of this (and presumably support of legacy compilers) relying on RVO was in my experience often discouraged. – user4815162342 Dec 07 '13 at 13:05
1

The second option used to be popular because copying objects such as std::string, std::map, etc. is expensive. Copying them means not only deep-copying the elements but also performing heap allocations, which can be costly.

Having said that, with C++11 a lot of this goes away thanks to move semantics, and it allows us to do a few things that we couldn't do before.

For example, this is useful if you want name to be a const object:

const std::string name = []() {
  std::string name;
  /* Fill in name. */
  return name;
}();

However, do note that initialization by reference is still useful in some cases. For example, consider the following code:

for (int i = 0; i < N; ++i) {
  const std::string name = gen_name(i);
  /* Use name here. */
}  // for

Although it is nice to add const when we know we won't be modifying name, in terms of performance the following would be faster:

std::string name;
for (int i = 0; i < N; ++i) {
  gen_name(i, name);
  /* Use name here. */
}  // for

EDIT:

I point out that initialization by reference may be preferred in some cases because sometimes we can reuse, across iterations of a loop, a resource we acquired earlier. In the above example, rather than constructing a new std::string instance on every iteration, which would lead to a heap allocation each time, we can do a single heap allocation at the beginning and keep reusing the same space.
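To make the reuse concrete, here is a minimal sketch of what an out-parameter gen_name could look like. The body of gen_name and the helper total_length are assumptions made for illustration, not code from the answer. Note also that for short names many standard library implementations use a small-string optimization, so the by-value version may not allocate either; the difference matters more for longer strings.

```cpp
#include <cstddef>
#include <string>

// Hypothetical out-parameter version of gen_name: overwrites `out`.
void gen_name(int i, std::string& out) {
    out.clear();  // size becomes 0; implementations typically keep the buffer
    out += "name_";
    out += std::to_string(i);
}

// Hypothetical caller: one buffer is reused for every iteration.
std::size_t total_length(int n) {
    std::string name;  // declared once, outside the loop
    std::size_t total = 0;
    for (int i = 0; i < n; ++i) {
        gen_name(i, name);
        total += name.size();
    }
    return total;
}
```

The cost of this style is that name can no longer be const and stays in scope after the loop.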

mpark
  • What's the difference between the two parts of your answer? One part says "it's good", the other part says "it's bad", but the circumstances in both seem the same. – anatolyg Dec 07 '13 at 08:44
  • I edited the answer to include the explanation for that. I hope it helps :) – mpark Dec 07 '13 at 09:08
  • The point of the lambda hack is using RVO to avoid creating a copy? Otherwise it's not obvious why the lambda is needed instead of the simpler `const std::string const_name = name;`. – user4815162342 Dec 09 '13 at 13:04
  • @user4815162342 well, I don't want to put the logic for building the string inside of top-level function if I'm only going to use the function once. I'd also rather not build up the string and assign it like so: `const std::string const_name = name;` because I don't want `name` and `const_name` both in scope. – mpark Dec 09 '13 at 18:06
0

Passing by reference makes the code less readable - and code is read more often than it is written.

The main reason the pattern shows up in C++ is that in Java every object is handled through a reference, so returning a result is cheap. In C++ a function may return an entire struct by value, which can mean a minor copying overhead.

But there are advantages to reference parameters:

Multiple results, which would otherwise need an extra result type as a container for the several result values.

void divideAndRemainder(int p, int q, int& d, int& r);

In-out parameters, where the arguments are both the inputs and the results.

void swapVariables(int& a, int& b);

Aliasing, where the same code fills in either one field or another variable.

struct link {
    struct link* next;
    int value;
};

// Ordered list insert, not possible like this in Java:
// Read "struct link*&", but for clarity I use explicit dereferencing here:
// *list.
void insert(struct link** list, int value) {
    while (*list && value < (*list)->value) {
        list = &(*list)->next;
    }
    struct link* next = *list;
    *list = new struct link();
    (*list)->value = value;
    (*list)->next = next;
}
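As a minimal sketch of the "multiple results" case, here is one possible body for divideAndRemainder next to the std::tuple alternative suggested in the comments. Both bodies are assumptions for illustration, and both assume a non-zero divisor.

```cpp
#include <tuple>

// Out-parameter style: quotient and remainder are written through references.
// Assumes q != 0.
void divideAndRemainder(int p, int q, int& d, int& r) {
    d = p / q;
    r = p % q;
}

// Alternative: return both results by value in a std::tuple.
// Assumes y != 0.
std::tuple<int, int> divide(int x, int y) {
    return std::make_tuple(x / y, x % y);
}
```

The tuple version can be unpacked at the call site with `std::tie(quotient, rem) = divide(x, y);`, at the cost of results that are identified by position rather than by name.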

(Mind - I am now an ingrained Java programmer.)

Joop Eggen
  • Under normal circumstances, there is no copy made when returning by value, so there is no overhead to mitigate in the first place. – juanchopanza Dec 07 '13 at 08:39
  • @juanchopanza: though I was fully aware of that, thanks for clarifying, as my formulation "minor copying overhead" is too imprecise. – Joop Eggen Dec 07 '13 at 08:54
  • Giving it to you for the "multiple results" angle, which I'd never considered. The first time I ever tried to write an OO program for homework I wondered why there aren't functions that can return multiple values. I realized later that was a stupid thought. I could see the updating multiple things at once idea coming in useful though. – AdamJames Dec 07 '13 at 09:07
  • My preferred way to handle multiple results in C++11 is to return a `std::tuple<>` and assign it to a `std::tie()`, like so: `std::tuple<int, int> divide(int x, int y);` then do: `std::tie(quotient, remainder) = divide(x, y);`. – mpark Dec 07 '13 at 09:14