
I was debugging a program when I came across code I had mistakenly typed, similar to the following:

//Original (wrong)
std::string first("Hello");
std::string second = first + second;

//Instead of this (correct)
std::string first("Hello");
std::string second = first + something_else;

Obviously I wasn't trying to do this (I can't think why anyone would want to), but it got me thinking. The original doesn't look like it should work, and I would assume its behavior is undefined. Indeed, this was the source of my problem.

To make the problem more general, consider the following:

SomeType a;
SomeType b = a + b;

Is the behavior undefined simply because b is not yet initialized (see this answer)?

If the behavior is undefined, then my real question is, why?

Is this only undefined for certain standard containers, like std::string, or is this undefined in a more general sense (STL classes, user-defined classes, PODs, fundamental types)?

What part of the standard applies to this?

Assume this is C++11, if necessary.

Kaiged
  • possible duplicate of [Construct object with itself as reference?](http://stackoverflow.com/questions/4368361/construct-object-with-itself-as-reference) – Bo Persson Apr 28 '12 at 00:43
  • @Bo Persson - I feel that is a very related question (and useful in helping to answer this), but not quite a duplicate. It is asking why this is syntactically allowed, I'm asking in what cases it is UB. Thanks for the reference, though. It helped me understand it in a different way. – Kaiged May 01 '12 at 15:04

3 Answers

6

The C++11 standard has this to say about the scope of a newly declared name:

3.3.2 Point of declaration [basic.scope.pdecl]

The point of declaration for a name is immediately after its complete declarator (Clause 8) and before its initializer (if any), except as noted below. [ Example:

int x = 12;
{ int x = x; }

Here the second x is initialized with its own (indeterminate) value. — end example ]

There is similar wording in prior C++ standards.

Off the top of my head, one rationale I can think of is that the name could be used in an initializer expression that takes the address of the object.

Michael Burr
  • Another rationale is that you can use the name for type deduction, e.g. `FunkyComponent c = objectManager.get_component(objectID, c);` - although I'm now wondering whether that's actually defined or not. – Stuart Golodetz Apr 28 '12 at 00:31
  • @StuartGolodetz: It would be much better to pass the address, if type deduction is the goal. But if you just want to avoid typing the typename twice, then `auto c = objectManager.get_component(objectID);` would be better. – Ben Voigt Apr 28 '12 at 00:34
  • @BenVoigt: Yes, very true about `auto` in C++11 :) This was a trick I found myself using a while back (specifically here: https://github.com/sgolodetz/hesperus2/blob/master/source/engine/core/hesp/objects/base/ObjectManager.tpp) - I guess it's no longer appropriate. That said, why would it be better to pass the address out of interest? – Stuart Golodetz Apr 28 '12 at 01:28
  • @StuartGolodetz: So you don't make a copy of an uninitialized object. – Ben Voigt Apr 28 '12 at 03:25
  • @BenVoigt: Ah ok I see what you mean. I was using a const reference parameter when I did this, which also avoids the copy. – Stuart Golodetz Apr 28 '12 at 08:02
1

Reading an uninitialized variable can lead to undefined behavior.

The standard says this:

Initializers [dcl.init]

.......

If no initializer is specified for an object, the object is default-initialized; if no initialization is performed, an object with automatic or dynamic storage duration has indeterminate value.

David Heffernan
  • It does seem self-evident to me in the case of fundamental types, but perhaps I don't understand when it comes to classes, constructors, etc. At what point is the constructor for `b` called? Is it called when attempting to use the `+` operator, or after? Or is it just irrelevant because it is _undefined_? – Kaiged Apr 28 '12 at 00:13
  • The initialization occurs after `first + second` is evaluated and therefore `second` is not initialized. – David Heffernan Apr 28 '12 at 00:19
  • Then with that, it _is_ evident. – Kaiged Apr 28 '12 at 00:24
  • *indeterminate value* isn't the same as UB, though. – Ben Voigt Apr 28 '12 at 00:27
  • @BenVoigt I thought that reading an indeterminate value results in UB. If it doesn't, what is the correct terminology. – David Heffernan Apr 28 '12 at 00:29
  • @David: *indeterminate value* is the correct term. The Standard also says that "operations involving indeterminate values may cause undefined behavior", but this is definitely not true for all types (as your answer claimed), for example fixed-width integral types are specified as two's-complement encoding and all possible values are legal. – Ben Voigt Apr 28 '12 at 00:33
  • @David: Reading an uninitialized `uint16_t` is guaranteed not to crash your program. `rand()` returns an indeterminate value, after all. – Ben Voigt Apr 28 '12 at 00:36
  • @Ben Surely there must be terminology for reading an uninitialized `int`. – David Heffernan Apr 28 '12 at 00:37
  • @David: "unspecified behavior" (1.3.25) – Ben Voigt Apr 28 '12 at 00:39
  • @Ben: "Reading an uninitialized uint16_t is guaranteed not to crash your program" - so a debugging implementation that uses some sort of clever tracking of which memory locations have been initialized, and barfs when you read uninitialized memory as `uint16_t`, is non-conforming? But the standard doesn't actually say what "indeterminate" means. IMO this is a defect, but evidently not in the opinion of the committee since it's unchanged in C++11. – Steve Jessop Apr 28 '12 at 01:02
  • Btw, I don't think `rand` returns an indeterminate value, I think it returns a sequence of values consistent from a given seed (a bit like implementation-defined), but needn't be documented (so not formally implementation-defined). Supposing a hypothetical implementation on which `int` has trap representations, an object of indeterminate value by reason of being uninitialized may hold a trap representation, whereas `rand` must not return a trap representation. So whatever "indeterminate value" means, the return value of `rand` isn't it :-) – Steve Jessop Apr 28 '12 at 01:07
  • @Steve: I mentioned the fixed-width integral types specifically because they can't have trap representations. – Ben Voigt Apr 28 '12 at 03:24
  • @Ben: my point is that you state that the English phrase "indeterminate value", which is not defined in the standard, means in effect the same as the more standard-like phrase "unspecified bytes in the object representation" would mean. You seem to imply that the standard guarantees the object otherwise behaves as if it has some particular object representation, and all that is open to doubt is which one. I'm not certain that's the case. For example, after `uint16_t a;` I don't see anything in the standard to guarantee that `a + a` is even or that `a == a`, as opposed to both being UB. Do you? – Steve Jessop Apr 28 '12 at 12:35
  • For a realistic example, consider `uint16_t a; uint16_t c = a; uint16_t b = 1; uint16_t d = a; std::cout << b; a = 2; std::cout << a << c << d;`. I would say that fact `a` has "indeterminate value" when used to initialize `c` means the whole thing is UB. In particular an implementation could reasonably use the same CPU register for `b` and `a`, and hence you could in fact observe `c != d`. I'm interested whether you agree, and if not what in the standard forbids an indeterminate value from changing over time in a way that could interact badly with optimizations and perhaps even crash. – Steve Jessop Apr 28 '12 at 12:41
  • @Steve: So you think the value of a variable is allowed to change arbitrarily? We have two reads from the same variable without an intervening write, surely both reads return the same value. There's no race condition here, no conflicting operations that would make the behavior undefined. It's like Schroedinger's Cat... you don't know whether it's alive or dead until you check. But if you check a second time, you observe the same state as the first. By what rule does the standard allow your proposed optimization of overlapping `a` and `b`? – Ben Voigt Apr 28 '12 at 19:04
  • @Ben: In situations where UB has been triggered, yes of course the value of a variable can change arbitrarily. 4.1 says that an lvalue-to-rvalue conversion on an uninitialized object has UB (no mention that it's only if it has a trap value), and that's the closest the standard comes to saying what "indeterminate value" actually means. C99 does define "indeterminate value", to mean either an unspecified value or a trap representation, so I think you could probably argue your case in C. In C++, because of 4.1 uninitialized objects don't seem to have a value, "indeterminate" or otherwise. – Steve Jessop Apr 29 '12 at 20:08
0

The why is simple: because the syntax is sugar. What looks like simple assignment is, in fact, copy construction: the right-hand side of the expression is evaluated first and then passed to the copy constructor of the object on the left-hand side.

SomeType b = a + b;

is actually

SomeType b(a + b /*wat?*/);

Part of the motivation behind this is RVO. Consider instead the case of

SomeType a, b;
SomeType c = a + b;

c can actually be forwarded as the temp object that a.operator+(b) uses to construct the return value.

SomeType SomeType::operator+(const SomeType& rhs) const
{
    SomeType temp(*this); // RVO will employ `c` here instead of a 4th object.
    ...
    return temp; // yeah, let's not and say we did.
}

Note that you can take your own address:

intptr_t i = (intptr_t)&i;
void* ptr = &ptr;

http://ideone.com/GUJyio

kfsone