Construct object with itself as reference?

Question

I just realised that this program compiles and runs (gcc version 4.4.5 / Ubuntu):

#include <iostream>
using namespace std;

class Test
{
public:
  // copy constructor
  Test(const Test& other);
};

Test::Test(const Test& other)
{
  if (this == &other)
    cout << "copying myself" << endl;
  else
    cout << "copying something else" << endl;
}

int main(int argc, char** argv)
{
  Test a(a);              // compiles, runs and prints "copying myself"
  Test *b = new Test(*b); // compiles, runs and prints "copying something else"
}

I wonder why on earth this even compiles. I assume that (just as in Java) arguments are evaluated before the method / constructor is called, so I suspect that this case must be covered by some "special case" in the language specification?

Questions:

Could someone explain this (preferably by referring to the specification)?
What is the rationale for allowing this?
Is it standard C++ or is it gcc-specific?

EDIT 1: I just realised that I can even write int i = i;

EDIT 2: Even with -Wall and -pedantic the compiler doesn't complain about Test a(a);.

EDIT 3: If I add a method

Test method(Test& t)
{
  cout << "in some" << endl;
  return t;
}

I can even do Test a(method(a)); without any warnings.

Related: http://stackoverflow.com/questions/3892098/ctor-initializer-self-initialization-causes-crash — wkl, Dec 06 '10 at 16:22
On calling a method with an object of "doubtful integrity", that's very much allowed. You can do `obj *ptr = NULL; ptr->someFunc();` and, as long as `obj::someFunc()` doesn't access member variables, it'll run happily. That's even used in some contexts, making objects NULL-resistant by encapsulating all public member functions in `if (!this) return /* some error */ ;` — FrankH., Dec 06 '10 at 18:27
@FrankH: I don't think that's clear. See http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232, and in particular the note at the end, "there are other contexts in which lvalues can occur, such as the left operand of . or .*, which should also be restricted". Nowhere does the standard lay out the conditions you describe, for example AFAIK it draws no distinction between calling a virtual or non-virtual member function on a null pointer. So I think you must rely on implementation-specific information to do as you describe, and I would be wary the optimizer might assume `this != 0`. — Steve Jessop, Dec 06 '10 at 21:11
I guess in summary, there's a difference between "runs happily" (i.e. "works on my machine"), and "is very much allowed" (i.e. "is guaranteed by the standard"). — Steve Jessop, Dec 06 '10 at 21:12

score 8 · Accepted Answer · answered Dec 06 '10 at 16:46

The reason this "is allowed" is that the rules say an identifier's scope starts immediately after its declarator, before any initializer. In the case

int i = i;

the RHS i is "after" the LHS i so i is in scope. This is not always bad:

void *p = (void*)&p; // p contains its own address

because a variable can be addressed without its value being used. In the case of the OP's copy constructor no error can be given easily, since binding a reference to a variable does not require the variable to be initialised: it is equivalent to taking the address of a variable. A legitimate constructor could be:

struct List { List *next; List(List &n) { next = &n; } };

where you see the argument is merely addressed; its value isn't used. In this case a self-reference could actually make sense: the tail of a list is given by a self-reference. Indeed, if you change the type of "next" to a reference, there's little choice, since you can't easily use NULL as you might for a pointer.

As usual, the question is backwards. The question is not why an initialisation of a variable can refer to itself, the question is why it can't refer forward. [In Felix, this is possible]. In particular, for types as opposed to variables, the lack of ability to forward reference is extremely broken, since it prevents recursive types being defined other than by using incomplete types, which is enough in C, but not in C++ due to the existence of templates.

score 3 · Answer 2 · answered Dec 06 '10 at 16:15

I have no idea how this relates to the specification, but this is how I see it:

When you do Test a(a); it allocates space for a on the stack. Therefore the location of a in memory is known to the compiler at the start of main. When the constructor is called (the memory is of course allocated before that), the correct this pointer is passed to it because it's known.

When you do Test *b = new Test(*b);, you need to think of it as two steps. First the object is allocated and constructed, and then the pointer to it is assigned to b. The reason you get the message you get is that you're essentially passing in an uninitialized pointer to the constructor, and then comparing it with the actual this pointer of the object (which will eventually get assigned to b, but not before the constructor exits).

answered Dec 06 '10 at 16:15

Matti Virkkunen


Gee, what was the down-vote for? Would you mind pointing out what I was wrong about? – Matti Virkkunen Dec 06 '10 at 16:20
Ok. Thanks. You made it clear to me why it is possible to compile this. Do you know what the rationale is for allowing this type of program? – aioobe Dec 06 '10 at 16:28
@aioobe: As I said, I can't say anything about the specification as I haven't read it, but `Test a(a)` makes sense in a weird, perverted way. It's just passing in a pointer to `a`, which has been allocated at that point (but not constructed). And as far as I can see, the pointer can only be passed to the constructor itself, which does no damage as the constructor already has the pointer anyway. As for the second example, it's just a classic case of using an uninitialized pointer. – Matti Virkkunen Dec 06 '10 at 16:30
Correct me if I'm wrong, but don't you mean "reference" where you've written "pointer"? – aioobe Dec 06 '10 at 16:32
On further thought, you could likely pass a pointer to an unconstructed object to something else besides the constructor by doing `Test a(SomethingElse(a));`. I wonder if that compiles, and what the specification says about unconstructed objects. – Matti Virkkunen Dec 06 '10 at 16:32
@aioobe: A reference is essentially a pointer. I'm more of a C programmer so I might refer to them using the wrong terminology at times. Please try to understand. – Matti Virkkunen Dec 06 '10 at 16:34
Ah. Interesting point. Sure, it compiles and runs without warnings (even with `-Wall` and `-pedantic`: http://ideone.com/XDQLc ) – aioobe Dec 06 '10 at 16:37

score 3 · Answer 3 · answered Dec 06 '10 at 16:39

The second one where you use new is actually easier to understand; what you're invoking there is exactly the same as:

Test *b;
b = new Test(*b);

and you're actually performing an invalid dereference. Try to add a << &other << to your cout lines in the constructor, and make that

Test *b = (Test *)0xF00D1E44BADD1E5;

to see that you're passing through whatever value a pointer on the stack has been given. If not explicitly initialized, that's undefined. But even if you don't initialize it with some sort of (in)sane default, it'll be different from the return value of new, as you found out.

For the first, think of it as an in-place new. Test a is a local variable not a pointer, it lives on the stack and therefore its memory location is always well defined - this is very much unlike a pointer, Test *b which, unless explicitly initialized to some valid location, will be dangling.

If you write your first instantiation like:

Test a(*(&a));

it becomes clearer what you're invoking there.

I don't know a way to make the compiler disallow (or even warn about) this sort of self-initialization-from-nowhere through the copy constructor.

Ok. Thanks. You made it clear to me why it is possible to compile this. Do you know what the rationale is for allowing this type of program? — aioobe, Dec 06 '10 at 16:41
@aioobe: See Yttril's answer below, it gives a usecase. The contents of `other` might make no sense in such a situation but its address might well be meaningful. Another such situation would be to initialize one of your objects through `Test C(*(Test*)NULL);` - whether that makes sense or not depends on your use case. — FrankH., Dec 06 '10 at 17:02

Steve Jessop · Answer 4 · 2010-12-06T21:15:10.610

The first case is (perhaps) covered by 3.8/6:

before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any lvalue which refers to the original object may be used but only in limited ways. Such an lvalue refers to allocated storage (3.7.3.2), and using the properties of the lvalue which do not depend on its value is well-defined.

Since all you're using of a (and other, which is bound to a) before the start of its lifetime is the address, I think you're good: read the rest of that paragraph for the detailed rules.

Beware though that 8.3.2/4 says, "A reference shall be initialized to refer to a valid object or function." There is some question (as a defect report on the standard) what "valid" means in this context, so possibly you can't bind the parameter other to the unconstructed (and hence, "invalid"?) a.

So, I'm uncertain what the standard actually says here - I can use an lvalue, but not bind it to a reference, perhaps, in which case a isn't good, while passing a pointer to a would be OK as long as it's only used in the ways permitted by 3.8/5.

In the case of b, you're using the value before it's initialized (because you dereference it, and also because even if you got that far, &other would be the value of b). This clearly is not good.

As ever in C++, it compiles because it's not a breach of language constraints, and the standard doesn't explicitly require a diagnostic. Imagine the contortions the spec would have to go through in order to mandate a diagnostic when an object is invalidly used in its own initialization, and imagine the data flow analysis that a compiler might have to do to identify complex cases (it may not even be possible at compile time, if the pointer is smuggled through an externally-defined function). Easier to leave it as undefined behavior, unless anyone has any really good suggestions for new spec language ;-)

score 2 · Answer 5 · answered Dec 06 '10 at 16:13

If you crank your warning levels up, your compiler will probably warn you about using uninitialized stuff. UB doesn't require a diagnostic, so many things that are "obviously" wrong may still compile.

answered Dec 06 '10 at 16:13

etarion


$ g++ -Wall -pedantic -o tst tst.cpp
tst.cpp: In function ‘int main(int, char**)’:
tst.cpp:21: warning: ‘i’ is used uninitialized in this function
tst.cpp:23: warning: ‘b’ is used uninitialized in this function
– FrankH. Dec 06 '10 at 16:18
http://ideone.com/NaJVP compiles without warnings when I compile with `-Wall` and `-pedantic` – aioobe Dec 06 '10 at 16:18
(was just to confirm etarion, and I had added your `int i = i;` line) – FrankH. Dec 06 '10 at 16:18
@FrankH, right, so even with `-Wall -pedantic` the `Test a(a);` goes through without warnings. – aioobe Dec 06 '10 at 16:22
aioobe, yes, see my answer below. I can explain this behaviour but not solve it for the `a(a)` case. – FrankH. Dec 06 '10 at 16:42

score 0 · Answer 6 · answered Dec 06 '10 at 16:16

I don't know the spec reference, but I do know that accessing an uninitialized pointer always results in undefined behaviour.

When I compile your code in Visual C++ I get:

test.cpp(20): warning C4700: uninitialized local variable 'b' used

answered Dec 06 '10 at 16:16

Steve Townsend


That line won't even compile in Visual C++ : `test.cpp(19): error C2065: 'a' : undeclared identifier`. – Steve Townsend Dec 06 '10 at 16:26
Aha, then it's even more interesting to know what the language specification says about it. Either it must be compiler dependent, or one of Visual C++ or g++ does not behave according to specification. – aioobe Dec 06 '10 at 16:29
