
Can someone with a deeper understanding of the C++ standard than me please elaborate on this?

This is my example program

#include <string>
#include <iostream>

int main(int argc, char* argv[]) {
    const std::string message("hello world");
    std::cout << std::hex << (void*)message.c_str() << std::endl;
    const std::string& toPrint = (argc > 0) ? message : "";
    std::cout << std::hex << (void*)toPrint.c_str() << std::endl;
    return 0;
}

On one machine it does this:

# g++ --version && g++ str_test.cpp && ./a.out                  
g++ (Debian 4.7.2-5) 4.7.2
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

0x9851014
0x9851014

message and toPrint seem to refer to the same instance as I would expect. However, on another machine, this happens:

# g++ --version && g++ str_test.cpp && ./a.out 
g++ (Ubuntu 5.2.1-22ubuntu2) 5.2.1 20151010
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

0x7ffeb9ab4ac0
0x7ffeb9ab4ae0

Here it looks like the compiler constructed a copy of message for toPrint to point at.

What behavior is correct according to the C++ standard? Or is it undefined in general?

MBober
  • Could you please reduce your code by removing (? :) operator and check again? – Sergei Nikulov Jan 14 '16 at 09:22
  • @SergeiNikulov I suspect that the conditional operator is essential to the observed behaviour. However, I wouldn't mind if MBober printed `argc` to demonstrate that the condition is in fact always true. – eerorika Jan 14 '16 at 09:24
  • @SergeiNikulov user2079303 That's right. This is just a simple example. I need to know the behavior of the ? operator and needed a condition that is always true but cannot be optimized out. But FYI, when I make this a simple assignment, the code behaves on GCC 5 the same as GCC 4. – MBober Jan 14 '16 at 09:28
  • @VioletGiraffe argc is always at least 1. – MBober Jan 14 '16 at 09:29

2 Answers


You are being confused by libstdc++'s copy-on-write string sharing (the GNU C++ standard library, not glibc, implements std::string). Change your test program to:

#include <string>
#include <iostream>

int main(int argc, char* argv[]) {
    const std::string message("hello world");
    std::cout << std::hex << (void*)&message << std::endl;
    const std::string& toPrint = (argc > 0) ? message : "";
    std::cout << std::hex << (void*)&toPrint << std::endl;
    return 0;
}

In other words, print the address of the string object rather than the address of the contained text. Both platforms will then print two different addresses.

Since C++11 the standard has forbidden copy-on-write for std::string (although I don't understand exactly how). Before that it was legal, but not mandatory. (Current thinking is that the 'small string optimization' does better than COW, particularly in a multithreaded world.)
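
The difference between the two kinds of address can be sketched as follows. This is only an illustration, not part of the original program: the helper names `same_object` and `same_buffer` are made up for this example. A copy is always a distinct object, whereas whether it shares its character buffer with the original depends on the library's string implementation, so only the first property can be relied on.

```cpp
#include <string>

// Hypothetical helper: true if both references name the same std::string object.
bool same_object(const std::string& a, const std::string& b) {
    return &a == &b;
}

// Hypothetical helper: true if both strings expose the same character buffer.
// For a copy, this may be true with a copy-on-write implementation and is
// false with an implementation that gives every string its own storage.
bool same_buffer(const std::string& a, const std::string& b) {
    return a.c_str() == b.c_str();
}
```

Calling `same_object(message, copy)` on a string and its copy yields false on both platforms from the question; `same_buffer(message, copy)` is the check whose result differs between them.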


Martin Bonner explains why the address could be the same even for a copy of the string.

To explain why the expectation "message and toPrint seem to refer to the same instance as I would expect" is misguided, I shall quote the standard.

Let's first explore which conversion is needed (I suppose that is not the question here, but it is included for completeness). Ignore the leading "Otherwise" in the quote; it refers to the preceding case of void-type expressions.

[expr.cond]/3 Otherwise, if the second and third operand have different types and either has (possibly cv-qualified) class type, or if both are glvalues of the same value category and the same type except for cv-qualification, an attempt is made to convert each of those operands to the type of the other. The process for determining whether an operand expression E1 of type T1 can be converted to match an operand expression E2 of type T2 is defined as follows:

  • If E2 is an lvalue: E1 can be converted to match E2 if E1 can be implicitly converted to the type "lvalue reference to T2", subject to the constraint that in the conversion the reference must bind directly to an lvalue. (an lvalue reference to std::string cannot bind directly to a string literal)
  • If E2 is an xvalue: E1 can be converted to match E2 if E1 can be implicitly converted to the type "rvalue reference to T2", subject to the constraint that the reference must bind directly. (no xvalues here)
  • If E2 is an rvalue or if neither of the conversions above can be done and at least one of the operands has (possibly cv-qualified) class type:
    • if E1 and E2 have class type, and the underlying class types are the same or one is a base class of the other: E1 can be converted to match E2 if the class of T2 is the same type as, or a base class of, the class of T1, and the cv-qualification of T2 is the same cv-qualification as, or a greater cv-qualification than, the cv-qualification of T1. If the conversion is applied, E1 is changed to a prvalue of type T2 by copy-initializing a temporary of type T2 from E1 and using that temporary as the converted operand. (string literal has no class type)
    • Otherwise (i.e., if E1 or E2 has a nonclass type, or if they both have class types but the underlying classes are not either the same or one a base class of the other): E1 can be converted to match E2 if E1 can be implicitly converted to the type that expression E2 would have if E2 were converted to a prvalue (or the type it has, if E2 is a prvalue). (this applies)

The final bullet covers this case. The string literal has a nonclass type and it can be converted to match a std::string prvalue.
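
The direction of the conversion can be checked at compile time. The following is a small sketch using `std::is_convertible` from `<type_traits>`; it only tests implicit convertibility and is not a restatement of the exact [expr.cond] rules. The literal `""` is modelled by its type, an lvalue of `const char[1]`.

```cpp
#include <string>
#include <type_traits>

// The string literal "" (an lvalue of type const char[1]) converts to std::string.
static_assert(std::is_convertible<const char (&)[1], std::string>::value,
              "a string literal can be converted to std::string");

// The other direction is not possible: a std::string converts neither to the
// literal's array type nor to the pointer type the literal decays to.
static_assert(!std::is_convertible<const std::string&, const char (&)[1]>::value,
              "std::string cannot be converted to const char(&)[1]");
static_assert(!std::is_convertible<const std::string&, const char*>::value,
              "std::string cannot be converted to const char*");
```

Since only one of the two conversions exists, there is no ambiguity, and the literal is the operand that gets converted.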

Now, let's explore how the conversion affects the result.

4 If the second and third operands are glvalues of the same value category and have the same type (they are not), the result is of that type and value category and it is a bit-field if the second or the third operand is a bit-field, or if both are bit-fields.

5 Otherwise, the result is a prvalue. If the second and third operands do not have the same type, and either has (possibly cv-qualified) class type, overload resolution is used to determine the conversions (if any) to be applied to the operands (13.3.1.2, 13.6). If the overload resolution fails, the program is ill-formed. Otherwise, the conversions thus determined are applied, and the converted operands are used in place of the original operands for the remainder of this section.

So, the result is a prvalue! It is not an lvalue. How do you get a prvalue from an lvalue?

6 Lvalue-to-rvalue (4.1), array-to-pointer (4.2), and function-to-pointer (4.3) standard conversions are performed on the second and third operands. After those conversions, one of the following shall hold:

  • The second and third operands have the same type; the result is of that type. If the operands have class type (they do after the conversion), the result is a prvalue temporary of the result type, which is copy-initialized from either the second operand or the third operand depending on the value of the first operand.

So we know that the result will be copy-initialized from the selected operand. Even though we initialize a reference, and the selected operand of the conditional is an lvalue of the same class type, the reference will be bound to a temporary that was copied from the operand.
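
The value category of the conditional expression can also be inspected with `decltype`, which yields a non-reference type for a prvalue. This is a minimal sketch; the alias name `ConditionalType` is made up for this example, and `std::declval` is used only to name a `const std::string` lvalue in an unevaluated context.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Type of `cond ? message : ""`, where message is a const std::string lvalue.
using ConditionalType =
    decltype(true ? std::declval<const std::string&>() : "");

// decltype gives a reference type for lvalues and xvalues, so a
// non-reference type means the expression is a prvalue.
static_assert(!std::is_reference<ConditionalType>::value,
              "the conditional expression is a prvalue");

// Ignoring cv-qualification, that prvalue is a std::string.
static_assert(std::is_same<std::remove_cv<ConditionalType>::type,
                           std::string>::value,
              "the result has class type std::string");
```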

If you had used another lvalue of type const std::string as the third operand, the conditional expression would have been an lvalue, and the reference would simply have been bound to the selected operand rather than to a temporary prvalue.
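
Both cases can be compared at run time by looking at object addresses rather than `c_str()`. The sketch below is an illustration under the rules quoted above; the function names and the static `empty` string are hypothetical and do not appear in the question.

```cpp
#include <string>

// Third operand is a string literal: the conditional expression is a prvalue,
// so `ref` binds to a temporary copy and never to `s` itself.
bool binds_to_operand_with_literal(const std::string& s, bool cond) {
    const std::string& ref = cond ? s : "";
    return &ref == &s;
}

// Third operand is an lvalue of the same type: the conditional expression is
// an lvalue, so `ref` binds directly to whichever operand is selected.
bool binds_to_operand_with_lvalue(const std::string& s, bool cond) {
    static const std::string empty;  // hypothetical named fallback
    const std::string& ref = cond ? s : empty;
    return &ref == &s;
}
```

With the condition true, the first function returns false and the second returns true, which matches the behaviour described in this answer regardless of how the library stores the characters.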

eerorika