why does builtin assignment return a non-const reference instead of a const reference in C++?

Question

(note the original question title had "instead of an rvalue" rather than "instead of a const reference". One of the answers below is in response to the old title. This was fixed for clarity)

One common construct in C and C++ is for chained assignments, e.g.

    int j, k;
    j = k = 1;

The second = is performed first, with the expression k=1 having the side effect that k is set to 1, while the value of the expression itself is 1.

However, one construct that is legal in C++ (but not in C) is the following, which is valid for all base types:

    int j, k=2;
    (j=k) = 1;

Here, the expression j=k has the side effect of setting j to 2, and the expression itself yields a reference to j, which the outer assignment then sets to 1. As I understand it, this is because the expression j=k returns a non-const int&, i.e. generally speaking an lvalue.

This convention is usually also recommended for user-defined types, as explained in "Item 10: Have assignment operators return a (non-const) reference to *this" in Meyers' Effective C++ (parenthetical addition mine). That section of the book does not attempt to explain why the reference is a non-const one, or even note the non-constness in passing.

Of course, this certainly adds functionality, but the statement (j=k) = 1; seems awkward to say the least.

If the convention were to instead have builtin assignment return const references, then custom classes would also use this convention, and the original chained construction allowed in C would still work, without any extraneous copies or moves. For example, the following runs correctly:

    #include <iostream>
    using std::cout;

    struct X{
      int k;
      X(int k): k(k){}
      // the first const goes against convention
      const X& operator=(const X& x){
        k = x.k;
        return *this;
      }
    };

    int main(){
      X x(1), y(2), z(3);
      x = y = z;
      cout << x.k << '\n'; // prints 3
    }

with the advantage that all three (C builtins, C++ builtins, and C++ custom types) are consistent in not allowing idioms like (j=k) = 1.

Was the addition of this idiom between C and C++ intentional? And if so, what type of situation would justify its use? In other words, what non-spurious benefit does this expansion in functionality ever provide?

I think that `(j=k) = 1;` is a consequence of generalizing the rules, rather than an intentional feature. I couldn't imagine seeing code like that used in any practical way. There might be some benefit to `operator=` returning lvalues in that you can pass the result of an assignment by reference to a function, but I don't have any specific examples of that offhand. — Silvio Mayolo, Aug 10 '17 at 18:39
Are you okay with a because the standard says so answer or are you looking for why the standard says so? — NathanOliver, Aug 10 '17 at 18:51
NathanOliver, I can't parse what you're asking; can you rephrase? — xdavidliu, Aug 10 '17 at 19:01
@xdavidliu Are you okay with an answer that says the reason it is allowed is that the standard says it is, or are you looking for an answer that says why that section of the standard says it has to return an lvalue reference? — NathanOliver, Aug 10 '17 at 19:03
ah okay, in this case I would definitely like to know *why* the standard says so. Since the feature is present for C++ but not C, I am guessing it is a conscious decision, therefore there should be some situation where it is justified. — xdavidliu, Aug 10 '17 at 19:05
The title of your question is misleading. The constness of the returned reference and whether it returns an lvalue or an rvalue are different questions. — kraskevich, Aug 10 '17 at 19:11
@kraskevich I'm vaguely aware that there are exceptions to the naive definition of lvalues and rvalues, but I thought those exceptions don't really apply to builtins, e.g. for builtins like int and double, lvalue is completely synonymous with non-const reference, e.g. "something that you can put on the left side of an assignment"? — xdavidliu, Aug 10 '17 at 19:15
The more I look at it, the more it looks like the reason they differ is that C has no other choice but to return an rvalue. There are no references in C, so C can't return an lvalue reference like C++ can. And returning a reference is generally performance-neutral or a gain. — NathanOliver, Aug 10 '17 at 19:15
@xdavidliu "The left side of the assignment" is a bad analogy. I prefer the following simplification: if it has a name, it's an lvalue. If it doesn't, it's an rvalue. — kraskevich, Aug 10 '17 at 19:31
@kraskevich The first half of that is correct, but [a lot of things](http://en.cppreference.com/w/cpp/language/value_category#lvalue) (like pre-increment, dereference (with `*`, `->`,`.*`, or `->*`), subscripting, several function call results, comma operators, ternary expressions, and (as in this question) assignment) are lvalues without names. The always-correct rule which is about as simple is “something is an lvalue iff you can take its address”. — Daniel H, Aug 10 '17 at 19:41
The current title of the question doesn't make much sense. Const references and lvalues are not the opposite. It's like asking why one would buy meat instead of apples. — kraskevich, Aug 10 '17 at 20:06
@kraskevich I don't think the fact that the two things aren't total opposites greatly takes away from the merit of the question, but just in case other people find this fact glaringly distracting, I'll change it — xdavidliu, Aug 10 '17 at 20:08
I just thought of asking this question today, and here you asked it last week. :-) I find the answer you chose unsatisfying. :-/ — Omnifarious, Aug 14 '17 at 17:19

AnT stands with Russia · Accepted Answer · 2019-03-16T16:48:27.133

5

By design, one fundamental difference between C and C++ is that C is an lvalue-discarding language and C++ is an lvalue-preserving language.

Before C++98, Bjarne had added references to the language in order to make operator overloading possible. And references, in order to be useful, require that the lvalueness of expressions be preserved rather than discarded.

This idea of preserving lvalueness wasn't really formalized until C++98, though. In the discussions preceding the C++98 standard, it was noted and formalized that references require the lvalueness of an expression to be preserved. That is when C++ made one major and purposeful break from C and became an lvalue-preserving language.

C++ strives to preserve the "lvalueness" of any expression result as long as it is possible. It applies to all built-in operators, and it applies to built-in assignment operator as well. Of course, it is not done to enable writing expressions like (a = b) = c, since their behavior would be undefined (at least under the original C++ standard). But because of this property of C++ you can write code like

    int a, b = 42;
    int *p = &(a = b);

How useful it is is a different question, but again, this is just one consequence of lvalue-preserving design of C++ expressions.

As for why it is not a const lvalue... Frankly, I don't see why it should be. As any other lvalue-preserving built-in operator in C++ it just preserves whatever type is given to it.

edited Mar 16 '19 at 16:48

answered Aug 10 '17 at 20:21


interesting. Also, in what situation would `(a=b)` return an lvalue, but `(a=b) = c` not behave in the expected way? The few modern compilers that I have tried this on seem to work without a hitch. Does the original C++ standard not require evaluating `(a=b)` as something that can be assigned to? – xdavidliu Aug 10 '17 at 20:26
@xdavidliu: Under the "classic" C++98 specification, there's no sequencing in `(a = b) = c`. Under those old rules, this expression does not really ask the compiler to do `a = b` first and `a = c` second, as it might seem at the first sight. In reality these are just two unsequenced attempts to modify `a`, which makes the behavior undefined. – AnT stands with Russia Aug 10 '17 at 20:31
However, the truth is that the original spec is OK in an lvalue-discarding context (in C), but makes no sense (is defective) in an lvalue-preserving context (in C++). This is essentially what triggered the complete redesign of the C++ sequencing model in C++11. And in modern C++ `(a = b) = c` is well-defined. This is why you see the compilers behave "as expected". – AnT stands with Russia Aug 10 '17 at 20:32
"As for why it is not a const lvalue... Frankly, I don't see why it should be." if I'm not mistaken, a const reference (which is what was suggested by the question) would still be an rvalue, not an lvalue – xdavidliu Aug 10 '17 at 20:32
@xdavidliu: That is false. A "classic" reference is always an lvalue, regardless of whether it is `const` or not. – AnT stands with Russia Aug 10 '17 at 20:33
I may be misunderstanding something here, but going by the definition of an lvalue as "something that we can take the address of", if we try `X *p = &(y = z);` in the example at the end of the question, the code doesn't compile, because it is trying to take the address of returned const reference, and hence `(y = z)` is *not* an lvalue – xdavidliu Aug 10 '17 at 20:37
@xdavidliu: Yes, lvalue is something that we can take the address of (not 100% precise, but good enough). Your code with `X` does not compile simply because it violates basic const-correctness. You need `const X *p = &(y = z);`. This will compile perfectly fine. The error has nothing to do with const reference's allegedly "not being an lvalue". `(y = z)` *is* an lvalue. In your case it just happens to be a const-qualified one. – AnT stands with Russia Aug 10 '17 at 21:03
Why was the decision made to make C++ an lvalue-preserving language if not to enable the creation of very confusing expressions involving assignment? And when does that decision date from? I've been using C++ since cfront, and I've never heard of this. – Omnifarious Aug 15 '17 at 00:05
The decision was made because C++ needed complete fundamental reworking of how it handled lvalues: the language wanted to introduce a new concept - *references* (whose introduction was in turn triggered by *overloadable operators*). This allowed C++ to handle, pass and return lvalues *directly* (instead of doing it through pointers). This is what resulted in major redesign of built-in operators in C++, making them completely different from their superficial C counterparts. The decision dates from the very first C++ standard - C++98, i.e. C++ language was like that from day one. – AnT stands with Russia Aug 15 '17 at 07:50
Although one might say that the original specification was incomplete because it introduced serious defects in sequencing (lvalue-preservation is incompatible with C-style sequencing). It was not until C++11 that these defects were finally fixed. How come you never heard about it - I don't know. – AnT stands with Russia Aug 15 '17 at 07:53
@AnT - I don't know either, probably because the world wasn't as connected a place and I wasn't much of a regular in `comp.lang.c++` when the C++98 standard was worked on. But now I have heard of it. Thank you. That's the answer I really wanted to this question, and it makes a lot of sense. I did know that references were a required addition to make operator overloading work. And putting it in that context makes a whole lot of sense. Again, thank you. I'm tempted to edit your comment into your main answer. :-) – Omnifarious Aug 23 '17 at 22:48
Disagree with the first paragraph - it's not a particularly fundamental difference, we are only talking about the value category of the result of the assignment and conditional operators. This almost never comes up in real coding – M.M Aug 23 '17 at 23:11
@M.M - Why also conditional? That seems rather odd. – Omnifarious Aug 23 '17 at 23:22
@Omnifarious It's quite useful to have the conditional be able to give an lvalue, e.g. `(foo ? a : b) = 5;`, or `func(foo ? a : b)` where `func` takes argument by non-const reference. This isn't really new functionality, because you could have written `*(foo ? &a : &b) = 5;` but it seems tidy – M.M Aug 23 '17 at 23:29
@M.M: Well, it also includes comma operator and prefix increment/decrement. And even though it might not have looked fundamental at first, the contradictions between the original C approach to sequencing and C++ lvalue-preserving operators is what triggered the global redesign of C++ sequencing system. Which is quite a fundamental change. – AnT stands with Russia Aug 25 '17 at 23:01
I thought the C++11 sequencing change was about threading support (C11 followed the same model) – M.M Aug 26 '17 at 11:17

kraskevich · Answer 2 · 2017-08-10T19:26:01.053

1

I'll answer the question in the title.

Let's assume that it returned an rvalue reference. It wouldn't be possible to return a reference to a newly assigned object this way (because it's an lvalue). If it's not possible to return a reference to a newly assigned object, one needs to create a copy. That would be terribly inefficient for heavy objects, for instance containers.

Consider an example of a class similar to std::vector.

With the current return type, the assignment works this way (I'm not using templates and copy-and-swap idiom deliberately to keep the code as simple as possible):

    class vector {
    public:
        vector& operator=(const vector& other) {
            // Do some heavy internal copying here.
            // No copy here: I just effectively return this.
            return *this;
        }
    };

Let's assume that it returned an rvalue:

    class vector {
    public:
        vector operator=(const vector& other) {
            // Do some heavy stuff here to update this.
            // A copy must happen here again.
            return *this;
        }
    };

You might think about returning an rvalue reference, but that wouldn't work either: you can't just move *this (otherwise, a chain of assignments a = b = c would ruin b), so a second copy will also be required to return it.

The question in the body of your post is different: returning a const vector& is indeed possible without any of the complications shown above, so it looks more like a convention to me.

Note: the title of the question refers to built-ins, while my answer covers custom classes. I believe that it's about consistency. It would be quite surprising if it acted differently for built-in and custom types.

edited Aug 10 '17 at 19:26

answered Aug 10 '17 at 19:23


okay, but the title of the question referred to builtins, not user-defined objects? – xdavidliu Aug 10 '17 at 19:25
@xdavidliu I believe it's about consistency. I wouldn't want built-ins and custom types to behave differently. – kraskevich Aug 10 '17 at 19:29
in terms of consistency, the current convention results in C++ builtins being consistent with C++ custom types, both of which are *in*consistent with C. However, if the convention were to return const references, then all three would be consistent, since all three would not allow expressions like `(j = k) = 3`. – xdavidliu Aug 10 '17 at 19:43
@kraskevich I included in the original question an example of a const ref working, as you noted. In regards to consistency, why not have all 3 consistent (C builtins == C++ builtins == C++ custom) instead of just 2 (C builtins != C++ builtins == C++ custom)? – xdavidliu Aug 10 '17 at 19:57
@xdavidliu Why should it be consistent with C? C is not a subset of C++. It's a completely different language. They have nothing to do with each other. You wouldn't expect it to be consistent with, say, Java, would you? – kraskevich Aug 10 '17 at 20:03
in terms of built-in types like ints, doubles, arrays, pointers, etc. much of the semantics of C directly carry over to C++, since C++ was originally built on top of C, and not on top of Java or any other language. C++ is a very different language with tons of new features of course, but compatibility with C arguably was of non-zero concern during the design of C++. Any change of idiom should have a compelling reason. The difference here only seems to allow unappealing idioms as the one I provided, which doesn't seem to constitute a compelling reason for the difference, hence my question. – xdavidliu Aug 10 '17 at 20:11
@xdavidliu as AnT pointed out, why bother to make `operator =` a special case in this way? The difference is largely inconsequential anyway. It enables you to write C++ programs that can't be compiled in C, but that's pretty easy to do anyway. – Omnifarious Aug 23 '17 at 23:02

score 0 · Answer 3 · answered Aug 23 '17 at 23:26

Built-in operators don't "return" anything, let alone "return a reference".

Expressions are characterized mainly by two things:

- their type
- their value category.

For example k + 1 has type int and value category "prvalue", but k = 1 has type int and value category "lvalue". An lvalue is an expression that designates a memory location, and the location designated by k = 1 is the same location that was allocated by the declaration int k;.

The C Standard only has value categories "lvalue" and "not lvalue". In C k = 1 has type int and category "not lvalue".

You seem to be suggesting that k = 1 should have type const int and value category lvalue. Perhaps it could; the language would be slightly different. It would outlaw confusing code but perhaps outlaw useful code too. This is a decision that's hard for a language designer or design committee to evaluate because they can't think of every possible way the language could be used.

They err on the side of not introducing restrictions that might turn out to have a problem nobody foresaw yet. A related example is the question "Should implicitly generated assignment operators be & ref-qualified?".

One possible situation that comes to mind is:

    void foo(int& x);

    int y;
    foo(y = 3);

which would set y to 3 and then invoke foo. This wouldn't be possible under your suggestion. Of course you could argue that y = 3; foo(y); is clearer anyway, but that's a slippery slope: perhaps increment operators shouldn't be allowed inside larger expressions etc. etc.
