
I'm puzzled by this behavior of C++:

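```cpp
#include <cstdio>

struct A {
   virtual void print() const { std::printf("a\n"); }
};

struct B : public A {
   virtual void print() const { std::printf("b\n"); }
};

struct C {
   // Conversion operator: lets a C be converted to a B.
   operator B() { return B(); }
};

void print(const A& a) {
   a.print();  // virtual call: which override runs depends on the dynamic type
}

int main() {
   C c;
   print(c);  // binds const A& to c via C::operator B()
}
```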

So the quiz is: what is the output of the program, a or b? Well, the answer is a. But why?

Code-Apprentice
jj99
  • It prints `b` on my machine. Also, `void main()`? Barf. – Carl Norum Jan 12 '13 at 22:52
  • `main` returns `int`, dude – Lightness Races in Orbit Jan 12 '13 at 22:53
  • @Carl: What compiler? See e.g. http://ideone.com/4W7qIa (which is GCC 4.3.4). GCC 4.5.1 also prints "a", as does VS2010. – Oliver Charlesworth Jan 12 '13 at 22:53
  • I used clang. I guess that's the difference. Standard version? Or undefined behaviour? I don't know C++ that well, I'm afraid. – Carl Norum Jan 12 '13 at 22:54
  • @OliCharlesworth: clang++ 3.1, for example. Although g++ 4.2 and 4.6 do print `a` (and to be honest I cannot find an explanation for it) – David Rodríguez - dribeas Jan 12 '13 at 22:56
  • My output is `b` with Apple clang version 4.1 (Xcode 4.2) on Mac OS X. – Martin R Jan 12 '13 at 22:56
  • Output _should be_ `b` as far as I can tell – dry run, not compiled... – ildjarn Jan 12 '13 at 22:57
  • Ok, this is interesting. We seem to have a bug in at least one compiler, on a pretty trivial test-case. My money is on clang being correct, as I can't see a good reason for `a` being the right answer. – Oliver Charlesworth Jan 12 '13 at 22:58
  • Looks like a bug in the gcc family of compilers... if instead of passing `c` directly you first bind it to a named reference, it works: `C c; A const& x = c; print(x);` prints the expected `b` – David Rodríguez - dribeas Jan 12 '13 at 23:00
  • @DavidRodríguez-dribeas: And VC++... – Oliver Charlesworth Jan 12 '13 at 23:01
  • @Oli : Bugs in VC++? Unheard of..! ;-] – ildjarn Jan 12 '13 at 23:02
  • *Well, the answer is a.* For me, it depends on which compiler I use. With some I get *a*, others *b*. I don't get any warnings no matter how high I crank up the warning options. – David Hammen Jan 12 '13 at 23:02
  • I'm confused that `print(c)` even compiles. I don't see a `print()` function which takes a parameter with a type that `c` could be converted to. Perhaps it should be `print(c())` in order to invoke the `operator()` overload? – Code-Apprentice Jan 12 '13 at 23:03
  • @David Do you *expect* a warning? I don’t, but I expect “b” as the output. – Konrad Rudolph Jan 12 '13 at 23:03
  • @Code-Guru: A `const A&` can be bound to a `B`... – Oliver Charlesworth Jan 12 '13 at 23:04
  • @David It doesn’t on GCC 4.7.2, it still prints “a”. – Konrad Rudolph Jan 12 '13 at 23:04
  • @Konrad - I get "a" with multiple versions of gcc, "b" with multiple versions of clang. – David Hammen Jan 12 '13 at 23:07
  • @Code-Guru - *I don't see a print() function which takes a parameter with a type that c could be converted to.* `C::operator B()` "converts" a `C` to a `B`. – David Hammen Jan 12 '13 at 23:08
  • @DavidHammen So the `operator()` is called implicitly? That makes sense now that I think about it. – Code-Apprentice Jan 12 '13 at 23:10
  • @Code-Guru: That is **not** `operator()`, but a *conversion operator to B*. Note the difference: `B operator();` vs `operator B()`. – David Rodríguez - dribeas Jan 12 '13 at 23:13
  • @DavidRodríguez-dribeas Thanks for clearing up my confusion. I haven't done C++ in a while and had forgotten about this syntax. With that cleared up, I agree with everyone that says that the output should be `b`. – Code-Apprentice Jan 12 '13 at 23:15
  • There is slicing happening by a _copy-initialization_ resulting in `"a"`, which if elided results in `"b"`. This is covered by _[5.8.3]/5_ but I can't figure it out... – K-ballo Jan 12 '13 at 23:17
  • @K-ballo : There is no slicing here because the only temporary is bound to a const-reference. – ildjarn Jan 12 '13 at 23:19
  • It prints "b" if you change the prototype of `operator B()` to: `operator const B & ()` – StackHeapCollision Jan 12 '13 at 23:19
  • The gcc family thinks slicing is involved, clang doesn't. You can see this by adding a public default constructor and protected copy constructor to struct A. Now the code won't compile with gcc, but will with clang. – David Hammen Jan 12 '13 at 23:20
  • @ildjarn: The standard reads _"Otherwise, a temporary of type “cv1 T1” is created and initialized from the initializer expression using the rules for a non-reference copy-initialization. The reference is then bound to the temporary."_... would you help me parse that? – K-ballo Jan 12 '13 at 23:21
  • @K-ballo : `T1` in this context is `B`, not `A` – so the `B` temporary is bound to an `A const&`. – ildjarn Jan 12 '13 at 23:26
  • @ildjarn: `T1` is `A`, that paragraph starts with _"A reference to type “cv1 T1” is initialized by an expression of type “cv2 T2” as follows"_ – K-ballo Jan 12 '13 at 23:28
  • @K-ballo - Re *This is covered by [5.8.3]/5* You meant [8.5.3]/5. Section 5.8.3 is about shift operators, 8.5.3, about references. If it helps any, C++11 is even more verbose than C++03 is. Unfortunately, that extra verbosity comes at the expense of decreased comprehensibility. – David Hammen Jan 12 '13 at 23:59
  • So I'm dying to know. Declare another `print` of the form `void print(const A* p) { p->print(); }` and invoke *it* from `void print(const A& a) { print(&a); }`. I'm particularly interested in the compilers that exhibit "a" in the original question. Mine doesn't (clang, of course). I get "b". – WhozCraig Jan 12 '13 at 23:59
  • @DavidHammen: Oh right, it's _[8.5.3]/5_, too late to edit my old comment now :( – K-ballo Jan 13 '13 at 00:03
  • @WhozCraig it prints "a" again (VC++) – user673679 Jan 13 '13 at 00:38
  • @K-ballo, if slicing were involved it would be a double conversion which isn't allowed for automatic conversions. – Mark Ransom Jan 13 '13 at 01:10
  • @MarkRansom: Double conversions are allowed; it's user-defined conversions that are not allowed to take place more than once. The second conversion would be a standard conversion per _[8.5.3]/5_ – K-ballo Jan 13 '13 at 01:13
  • @DavidRodríguez-dribeas: what is the difference between B operator(); vs operator B(); ? – Destructor Sep 07 '15 at 15:09
  • @jj99: excellent question. I like it. – Destructor Sep 07 '15 at 15:10

1 Answer


The problem here is a bug / misfeature / hole in the C++03 standard, with different compilers trying to patch over the problem in different ways. (This problem no longer exists in the C++11 standard.)

Section 8.5.3/5 of both standards specifies how a reference is initialized. Here's the C++03 version (the list numbering is mine):

A reference to type cv1 T1 is initialized by an expression of type cv2 T2 as follows:

  1. If the initializer expression

    1. is an lvalue (but is not a bit-field), and “cv1 T1” is reference-compatible with “cv2 T2,” or
    2. has a class type (i.e., T2 is a class type) and can be implicitly converted to an lvalue of type cv3 T3, where cv1 T1 is reference-compatible with cv3 T3

    then the reference is bound directly to the initializer expression lvalue in the first case, and the reference is bound to the lvalue result of the conversion in the second case.

  2. Otherwise, the reference shall be to a non-volatile const type (i.e., cv1 shall be const).

  3. If the initializer expression is an rvalue, with T2 a class type, and cv1 T1 is reference-compatible with cv2 T2, the reference is bound in one of the following ways (the choice is implementation-defined):

    1. The reference is bound to the object represented by the rvalue (see 3.10) or to a sub-object within that object.
    2. A temporary of type cv1 T2 [sic] is created, and a constructor is called to copy the entire rvalue object into the temporary. The reference is bound to the temporary or to a sub-object within the temporary.

    The constructor that would be used to make the copy shall be callable whether or not the copy is actually done.

  4. Otherwise, a temporary of type cv1 T1 is created and initialized from the initializer expression using the rules for a non-reference copy initialization (8.5). The reference is then bound to the temporary.

There are three types involved in the question at hand:

  • The type of the reference to be created. The standards (both versions) denote this type as T1. In this case, it is struct A.
  • The type of the initializer expression. The standards denote this type as T2. In this case, the initializer expression is the variable c, so T2 is struct C. Note that because struct A is not reference-compatible with struct C, it's not possible to directly bind the reference to c. An intermediate is needed.
  • The type of the intermediate. The standards denote this type as T3. In this case, this is struct B. Note that applying the conversion operator C::operator B() to c will convert the lvalue c to an rvalue.
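The relationships the three bullets rely on can be checked at compile time. This is a minimal sketch, assuming C++11 `<type_traits>`; the structs mirror the question's, with a hypothetical `tag()` member standing in for `print()` so nothing has to be printed:

```cpp
#include <type_traits>
#include <utility>

struct A { virtual ~A() = default; virtual char tag() const { return 'a'; } };
struct B : A { char tag() const override { return 'b'; } };
struct C { operator B() { return B(); } };

// T1 = A (referenced type), T2 = C (initializer type), T3 = B (intermediate).

// B derives from A, so "const A" is reference-compatible with B.
static_assert(std::is_base_of<A, B>::value, "A should be a base of B");

// C is unrelated to A, so a const A& cannot bind to c directly.
static_assert(!std::is_base_of<A, C>::value, "C should not derive from A");

// C reaches B only through the user-defined conversion C::operator B().
static_assert(std::is_convertible<C, B>::value, "C should convert to B");

// operator B() returns by value, so the call is an rvalue of type B, not an lvalue.
static_assert(std::is_same<decltype(std::declval<C&>().operator B()), B>::value,
              "operator B() should yield a non-reference B");
```

The last assertion is the one that matters for the next step: because `decltype` of the call is plain `B` rather than `B&`, the conversion does not produce an lvalue.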

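The initializations I labeled 1.1 and 3 are out because struct A is not reference-compatible with struct C, so the conversion operator C::operator B() must be used. Option 1.2 is out because that conversion operator returns an rvalue, not an lvalue. All that is left is option 4: create a temporary of type cv1 T1 (here, const A) and copy-initialize it from c. That copy-initialization calls C::operator B() to produce a B temporary and then copies only the A part of it into the A temporary (slicing), so the virtual call inside print resolves to A::print and the program prints "a". Strict compliance with the 2003 version of the standard therefore forces the creation of two temporaries for this problem, even though one would suffice.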
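The two competing readings can be spelled out explicitly, so that neither depends on how a particular compiler resolves `print(c)`. This is a sketch assuming C++17 (guaranteed copy elision, which makes the copy counts exact); `a_copies`, `Outcome`, `option4_reading()` and `direct_binding()` are hypothetical names, and `tag()` stands in for `print()`. `const A tmp = c;` is a named stand-in for the temporary option 4 creates, and the direct case binds to `c.operator B()` rather than to `c` so its meaning does not hinge on the disputed rule:

```cpp
// Hypothetical instrumentation: counts how often A's copy constructor runs.
static int a_copies = 0;

struct A {
    A() = default;
    A(const A&) { ++a_copies; }  // user-declared copy; also suppresses the implicit move
    virtual ~A() = default;
    virtual char tag() const { return 'a'; }
};

struct B : A {
    char tag() const override { return 'b'; }
};

struct C {
    operator B() { return B(); }
};

struct Outcome {
    char tag;    // which override the virtual call reached
    int copies;  // how many A copies were made on the way
};

// C++03 option 4: an object of type const A is copy-initialized from c.
// C::operator B() yields a B, and only its A subobject is copied (slicing).
Outcome option4_reading() {
    a_copies = 0;
    C c;
    const A tmp = c;    // the second temporary, of type A
    const A& r = tmp;
    return {r.tag(), a_copies};
}

// Direct binding: the reference binds to the A subobject of the B temporary.
Outcome direct_binding() {
    a_copies = 0;
    C c;
    const A& r = c.operator B();  // lifetime of the B temporary is extended; no A copy
    return {r.tag(), a_copies};
}
```

Under the option 4 reading, one A copy is made and the B part is lost, which gives the "a" result. Binding to the B temporary makes no A copy and keeps the dynamic type B, which gives the "b" result.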

The 2011 version of the standard fixes the problem by replacing option 3 with

  • If the initializer expression

    • is an xvalue, class prvalue, array prvalue or function lvalue and cv1 T1 is reference-compatible with cv2 T2, or
    • has a class type (i.e., T2 is a class type), where T1 is not reference-related to T2, and can be implicitly converted to an xvalue, class prvalue, or function lvalue of type cv3 T3, where cv1 T1 is reference-compatible with cv3 T3,

    then the reference is bound to the value of the initializer expression in the first case and to the result of the conversion in the second case (or, in either case, to an appropriate base class subobject). In the second case, if the reference is an rvalue reference and the second standard conversion sequence of the user-defined conversion sequence includes an lvalue-to-rvalue conversion, the program is ill-formed.

It appears that the gcc family of compilers chose strict compliance over intent (avoid creating unnecessary temporaries), while the other compilers that print "b" chose intent and the corrections to the standard. Choosing strict compliance isn't necessarily commendable; there are other bugs/misfeatures in the 2003 version of the standard (e.g., std::set) where the gcc family chose sanity over strict compliance.
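Code that has to behave the same on all of these compilers can sidestep the disputed rule by converting to B explicitly before the call. This is a sketch of one possible workaround, not something the answer proposes; `which()` (standing in for the question's `print()`), `tag()`, and the two helper names are hypothetical. With a B rvalue as the initializer, C++03 option 3 either binds directly or copies the entire B object, and C++11 binds directly, so no A-only temporary is ever created:

```cpp
struct A { virtual ~A() = default; virtual char tag() const { return 'a'; } };
struct B : A { char tag() const override { return 'b'; } };
struct C { operator B() { return B(); } };

// Stand-in for the question's print(const A&): reports which override is reached.
char which(const A& a) { return a.tag(); }

// Explicit conversion first: the argument is a B rvalue, so any copy made is of B.
char call_with_explicit_conversion() {
    C c;
    return which(B(c));
}

// Alternative: keep a named B and pass it as an lvalue (option 1.1).
char call_with_named_b() {
    C c;
    B b = c;  // copy-initialization through C::operator B()
    return which(b);
}
```

Both helpers give `b` because the reference always ends up bound to (the A subobject of) an object whose dynamic type is B.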

David Hammen
  • +1: However, I could not understand what you meant by `(avoid creating unnecessary temporaries)`. It seems to me that the `or, in either case, to an appropriate base class subobject` part allows gcc to print "a" and not "b". – Jesse Good Jan 13 '13 at 22:45
  • @JesseGood - A gcc-compiled version of the program creates two temporaries, one of type `B` with the call to `C::operator B()` and another of type `A` by copy construction. That second temporary is completely unnecessary. Because creating unnecessary temporaries can kill an application performance-wise, avoiding the creation of them is important. For example, Eigen uses expression templates for lazy evaluation to avoid unnecessary temporaries. The standard committee has been hunting down places where the standard mandates unnecessary temporaries for quite some time. – David Hammen Jan 14 '13 at 08:50
  • I see now. gcc still follows C++03 rules. I found a related [defect report](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1287) and if the proposed resolution goes through, then gcc would be correct once again (although it is currently under review and it seems not everyone agrees on what to do). – Jesse Good Jan 14 '13 at 22:10