Why does a recursive constructor call make invalid C# code compile?

Question

After watching the webinar Jon Skeet Inspects ReSharper, I started playing with recursive constructor calls and found that the following code is valid C# (by valid I mean it compiles).

class Foo
{
    int a = null;
    int b = AppDomain.CurrentDomain;
    int c = "string to int";
    int d = NonExistingMethod();
    int e = Invalid<Method>Name<<Indeeed();

    Foo()       :this(0)  { }
    Foo(int v)  :this()   { }
}

As we all probably know, the compiler moves field initialization into the constructors. So if you have a field like int a = 42;, you will get a = 42 in all constructors. But if one constructor calls another constructor of the same class, the initialization code ends up only in the called one.

For example, if a constructor with parameters calls the default constructor, the assignment a = 42 appears only in the default constructor.

To illustrate the second case, the following code:

class Foo
{
    int a = 42;

    Foo() :this(60)  { }
    Foo(int v)       { }
}

Compiles into:

internal class Foo
{
    private int a;

    private Foo()
    {
        this.ctor(60);
    }

    private Foo(int v)
    {
        this.a = 42;
        base.ctor();
    }
}
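The ordering in the decompiled output above, with the field assignment placed before the base constructor call, can also be observed at runtime. The following is a minimal sketch, not part of the original question: `Base`, `Derived`, `ReadField` and `Observed` are hypothetical names introduced only for illustration. The base constructor makes a virtual call that reads the derived field, so it sees whatever the field holds at the moment the base constructor runs.

```csharp
using System;

// Hypothetical base class: its constructor records what the derived
// field contains while the base constructor is still running.
class Base
{
    public int Observed;

    protected Base()
    {
        Observed = ReadField(); // virtual call during base construction
    }

    protected virtual int ReadField() => -1;
}

class Derived : Base
{
    int a = 42; // field initializer

    // Chains to another constructor of the same class:
    // no field initializer code is emitted here.
    public Derived() : this(60) { }

    // Chains (implicitly) to the base constructor:
    // the compiler emits "a = 42" before the base constructor call.
    public Derived(int v) { }

    protected override int ReadField() => a;
}
```

With these definitions, `new Derived().Observed` is 42 rather than 0, which is consistent with the initializer running before the base constructor body.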

So the main issue is that the code given at the start of this question compiles into:

internal class Foo
{
    private int a;
    private int b;
    private int c;
    private int d;
    private int e;

    private Foo()
    {
        this.ctor(0);
    }

    private Foo(int v)
    {
        this.ctor();
    }
}

As you can see, the compiler can't decide where to put the field initialization and, as a result, doesn't put it anywhere. Also note that there are no base constructor calls. Of course, no objects can be created, and you will always end up with a StackOverflowException if you try to create an instance of Foo.

I have two questions:

Why does the compiler allow recursive constructor calls at all?

Why do we observe this compiler behavior for fields initialized within such a class?

Some notes: ReSharper warns you with Possible cyclic constructor calls. Moreover, in Java such constructor calls won't even compile, so the Java compiler is more restrictive in this scenario (Jon mentioned this at the webinar).

This makes these questions more interesting because, with all respect to the Java community, the C# compiler is at least more modern.

This was compiled using C# 4.0 and C# 5.0 compilers and decompiled using dotPeek.

Nice field initializers there: `int a = null; int b = AppDomain.CurrentDomain; int c = "string to int"; int d = NonExistingMethod(); int e = Invalid<Method>Name<<Indeeed();` — Jeppe Stig Nielsen, May 20 '13 at 09:37
I believe this is allowed [due to the same reason](http://stackoverflow.com/a/8762743/11683). — GSerg, May 20 '13 at 10:00
@GSerg, +1. It also even makes this one look like a duplicate question :) — Alex Filipovici, May 20 '13 at 10:03
@IlyaIvanov The fact that initialization code disappears completely does *look* like a bug, but I believe it would only be an actual bug if the resulting code had a chance to actually run and let the program proceed. Otherwise, it's a weird optimization that allows you to fail quicker. — GSerg, May 20 '13 at 10:08
I can actually create an instance of the `Foo` class (the one in the top of your question), with `var fooObj = (Foo)System.Runtime.Serialization.FormatterServices.GetUninitializedObject(typeof(Foo));`. Of course the instance fields are then `0` of type `int`. — Jeppe Stig Nielsen, May 20 '13 at 10:17
I think this is just the principle of 'features don't exist until someone codes them'. No C# compiler dev coded in the ability to detect recursive constructor calls, so they aren't detected. After all, even if you did do it by mistake, you'd discover it immediately and it would be obvious. — Patashu, May 20 '13 at 10:35
The field initialisation gets put into all constructors that call a base constructor. As a consequence, if there is no constructor that calls a base constructor, the field initialisation does not get put anywhere. At least that part makes perfect sense to me. It's not that the compiler cannot figure out where to put it, it's because the compiler notices it *doesn't* have to put it anywhere. — , May 20 '13 at 11:08
_vbc_ has +1 against _csc_ on this issue: [Constructor '' cannot call itself](http://msdn.microsoft.com/en-us/library/sf8ef24e(v=vs.110).aspx) (Error ID: BC30298). — Alex Filipovici, May 21 '13 at 12:40
One could modify the program so as not to bomb the stack with something like e.g. having `public Foo() : this(1/"".Length);` I don't think one could leak a reference to the object under construction, but it's interesting to note that this trick would allow one to define a *type* which inherits an external type which exposes no constructors outside its assembly, and seemingly abides a `new()` constraint. — supercat, May 22 '13 at 21:21
It does not compile after Feb 25, 2015. The error code cs0768 is added for it. https://github.com/dotnet/roslyn/blob/dfce3ae13a509cd7d006b3eed064a1709de722db/src/Compilers/CSharp/Portable/Errors/ErrorCode.cs — Deepak Mishra, Dec 21 '19 at 19:14

score 11 · Accepted Answer · answered May 20 '13 at 19:49

Interesting find.

It appears that there are really only two kinds of instance constructors:

1. An instance constructor which chains another instance constructor of the same type, with the : this( ...) syntax.
2. An instance constructor which chains an instance constructor of the base class. This includes instance constructors where no chaining is specified, since : base() is the default.

(I disregarded the instance constructor of System.Object which is a special case. System.Object has no base class! But System.Object has no fields either.)

The instance field initializers that might be present in the class need to be copied into the beginning of the body of all instance constructors of type 2 above, whereas instance constructors of type 1 do not need the field assignment code.

So apparently there's no need for the C# compiler to analyze the constructors of type 1 to see if there are cycles or not.

Now your example gives a situation where all instance constructors are of type 1. In that situation the field initializer code does not need to be put anywhere. So it is not analyzed very deeply, it seems.

It turns out that when all instance constructors are of type 1, you can even derive from a base class that has no accessible constructor. The base class must be non-sealed, though. For example, if you write a class with only private instance constructors, people can still derive from your class if they make all instance constructors in the derived class be of type 1 above. However, a new object creation expression will never finish, of course. To create instances of the derived class, one would have to "cheat" and use stuff like the System.Runtime.Serialization.FormatterServices.GetUninitializedObject method.

Another example: The System.Globalization.TextInfo class has only an internal instance constructor. But you can still derive from this class in an assembly other than mscorlib.dll with this technique.
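To illustrate the "cheat" mentioned above, here is a minimal sketch of what FormatterServices.GetUninitializedObject does. It uses an ordinary class rather than cyclic constructors, since, according to a comment in this thread, newer compilers reject the cycle with error CS0768. `Sample`, `ConstructorRan` and `UninitializedDemo` are hypothetical names used only for this illustration.

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical class with a field initializer and a constructor.
class Sample
{
    public int A = 42;            // emitted into the constructor by the compiler
    public bool ConstructorRan;

    public Sample()
    {
        ConstructorRan = true;
    }
}

static class UninitializedDemo
{
    public static Sample Create()
    {
        // Newer .NET versions may report this API as obsolete (SYSLIB0050);
        // the pragma only silences that warning for this sketch.
#pragma warning disable SYSLIB0050
        // Allocates the object without running any instance constructor,
        // so the field initializer code never executes either.
        return (Sample)FormatterServices.GetUninitializedObject(typeof(Sample));
#pragma warning restore SYSLIB0050
    }
}
```

An object returned by `UninitializedDemo.Create()` has `A == 0` and `ConstructorRan == false`, because the initializer `A = 42` lives inside the constructor that was skipped.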

Finally, regarding the

Invalid<Method>Name<<Indeeed()

syntax. According to the C# rules, this is to be read as

(Invalid < Method) > (Name << Indeeed())

because the left-shift operator << has higher precedence than both the less-than operator < and the greater-than operator >. The latter two operators have the same precedence, and are therefore evaluated by the left-associative rule. If the types were

MySpecialType Invalid;
int Method;
int Name;
int Indeeed() { ... }

and if the MySpecialType introduced an (MySpecialType, int) overload of the operator <, then the expression

Invalid < Method > Name << Indeeed()

would be legal and meaningful.
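As a rough sketch of how such overloads could look: the implementation below is an assumption made for illustration, not something from the original answer. In particular, the operators return MySpecialType instead of bool, so that the result of `<` can be used as the left operand of `>`, and the Min/Max semantics are arbitrary. C# requires `<` and `>` to be declared as a pair.

```csharp
using System;

// Hypothetical type whose < and > operators return MySpecialType.
struct MySpecialType
{
    public readonly int Value;

    public MySpecialType(int value) { Value = value; }

    // Arbitrary semantics chosen for the sketch: keep the smaller value.
    public static MySpecialType operator <(MySpecialType left, int right)
        => new MySpecialType(Math.Min(left.Value, right));

    // Arbitrary semantics chosen for the sketch: keep the larger value.
    public static MySpecialType operator >(MySpecialType left, int right)
        => new MySpecialType(Math.Max(left.Value, right));
}

static class PrecedenceDemo
{
    static int Indeeed() => 2;

    public static int Evaluate()
    {
        MySpecialType Invalid = new MySpecialType(10);
        int Method = 3;
        int Name = 1;

        // Binds as (Invalid < Method) > (Name << Indeeed()):
        // Min(10, 3) = 3, then 1 << 2 = 4, then Max(3, 4) = 4.
        MySpecialType result = Invalid < Method > Name << Indeeed();
        return result.Value;
    }
}
```

With these definitions `PrecedenceDemo.Evaluate()` returns 4. The other grouping, `((Invalid < Method) > Name) << Indeeed()`, would not compile here, because no `<<` operator is defined for MySpecialType.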

In my opinion, it would be better if the compiler issued a warning in this scenario. For example, it could say unreachable code detected and point to the line and column number of the field initializer that is never translated into IL.

I don't understand... isn't field initialization invoked before the ctor? — Royi Namir, May 21 '13 at 10:17
@RoyiNamir Yes. But if you look at the IL, it works like the asker writes: _"As we all probably know, field initialization is moved into constructor by the compiler."_ What is meant by that is, suppose you write this class in C#: `class Example { int field = 42; internal Example() { /* some code here */ field = 100; } }`, then the IL produced by that puts the `42` assignment into the instance constructor, before everything else, exactly as if you had written: `class Example { int field; internal Example() { field = 42; /* some code here */ field = 100; } }` — Jeppe Stig Nielsen, May 21 '13 at 10:57

Damien_The_Unbeliever · Answer 2 · 2013-05-20T10:29:34.653

I think because the language specification only rules out directly invoking the same constructor that is being defined.

From 10.11.1:

All instance constructors (except those for class object) implicitly include an invocation of another instance constructor immediately before the constructor-body. The constructor to implicitly invoke is determined by the constructor-initializer

...

An instance constructor initializer of the form this(argument-list_opt) causes an instance constructor from the class itself to be invoked ... If an instance constructor declaration includes a constructor initializer that invokes the constructor itself, a compile-time error occurs

That last sentence seems to only preclude direct calling itself as producing a compile time error, e.g.

Foo() : this() {}

is illegal.

I admit though - I can't see a specific reason for allowing it. Of course, at the IL level such constructs are allowed because different instance constructors could be selected at runtime, I believe - so you could have recursion provided it terminates.

I think the other reason it doesn't flag or warn on this is because it has no need to detect this situation. Imagine chasing through hundreds of different constructors, just to see if a cycle does exist - when any attempted usage will quickly (as we know) blow up at runtime, for a fairly edge case.

When it's doing code generation for each constructor, all it considers is constructor-initializer, the field initializers, and the body of the constructor - it doesn't consider any other code:

If constructor-initializer is an instance constructor for the class itself, it doesn't emit the field initializers - it emits the constructor-initializer call and then the body.
If constructor-initializer is an instance constructor for the direct base class, it emits the field initializers, then the constructor-initializer call, and then the body.

In neither case does it need to go looking elsewhere - so it's not a case of it being "unable" to decide where to place the field initializers - it's just following some simple rules that only consider the current constructor.
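A consequence of these two rules is that a field initializer runs exactly once per object, no matter which constructor starts the chain. The sketch below makes that visible with a counter; `Counted`, `Track` and `InitializerRuns` are hypothetical names introduced only for this illustration.

```csharp
using System;

class Counted
{
    // Counts how many times the field initializer has executed.
    public static int InitializerRuns;

    static int Track(int value)
    {
        InitializerRuns++;
        return value;
    }

    int a = Track(42); // field initializer with an observable side effect

    // Chains to this(...): the initializer is not emitted here.
    public Counted() : this(60) { }

    // Chains (implicitly) to the base constructor: the initializer is emitted here.
    public Counted(int v) { }

    public int A => a;
}
```

Creating one object with `new Counted()` and another with `new Counted(5)` increases `Counted.InitializerRuns` by one each time, even though the first call passes through two constructors.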

But what about the fact that it lets lines like this compile: `int e = Invalid<Method>Name<<Indeeed();` — Matthew Watson, May 20 '13 at 08:47
@MatthewWatson It might be interpreted as `int e = Invalid < Method > Name << Indeed();` with binary operators "less than", "greater than", and "left shift". That's syntactically OK, but it would be some really crazy overloads of the operators to make it OK with strong typing. — Jeppe Stig Nielsen, May 20 '13 at 10:11
@JeppeStigNielsen Aye, but it won't compile if you leave the code the same other than removing the recursive constructor code. That's why I think it's a bug. — Matthew Watson, May 20 '13 at 10:20
@MatthewWatson The error cannot be detected at parse time because the class is incomplete. (Maybe your class will define members called `Invalid` etc that will make it valid.) The error is normally detected at code generation, but you found a way to write code that is never generated. You found a sneaky hole in the compiler (a way to write code that will never be compiled), but not a serious one since the offending code is unreachable anyway. — Raymond Chen, May 20 '13 at 13:58

score 2 · Stochastically · Answer 3 · 2013-05-20T09:13:08.570

Your example

class Foo
{
    int a = 42;

    Foo() :this(60)  { }
    Foo(int v)       { }
}

will work fine, in the sense that you can instantiate that Foo object without problems. However, the following would be more like the code that you're asking about

class Foo
{
    int a = 42;

    Foo() :this(60)     { }
    Foo(int v) : this() { }
}

Both that and your code will create a stackoverflow (!), because the recursion never bottoms out. So your code is ignored because it never gets to execute.

In other words, the compiler can't decide where to put the faulty code because it can tell that the recursion never bottoms out. I think this is because it has to put it where it will only be called once, but the recursive nature of the constructors makes that impossible.

Recursion in the sense of a constructor creating instances of itself within the body of the constructor makes sense to me, because e.g. that could be used to instantiate trees where each node points to other nodes. But recursion via the pre-constructors of the sort illustrated by this question can't ever bottom out, so it would make sense for me if that was disallowed.
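A minimal sketch of the kind of recursion in the constructor body described above, which does bottom out. `TreeNode`, `depth` and `Count` are hypothetical names for this illustration: each node creates its children inside the constructor body, and the depth parameter stops the recursion.

```csharp
using System;

class TreeNode
{
    public readonly TreeNode Left;
    public readonly TreeNode Right;

    // Recursion happens in the constructor body, not in a constructor
    // initializer, so a condition can end it.
    public TreeNode(int depth)
    {
        if (depth > 0)
        {
            Left = new TreeNode(depth - 1);
            Right = new TreeNode(depth - 1);
        }
    }

    // Number of nodes in the subtree rooted at this node.
    public int Count()
    {
        int total = 1;
        if (Left != null) total += Left.Count();
        if (Right != null) total += Right.Count();
        return total;
    }
}
```

For example, `new TreeNode(3).Count()` yields 15, the size of a complete binary tree with four levels.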

edited May 20 '13 at 09:13

answered May 20 '13 at 08:28

Yes, I agree, that's why I've created this question. Why can't the compiler decide where to put the initialization logic and, hence, why does it allow recursive calls at all? Is there a reason for this? – Ilya Ivanov May 20 '13 at 08:30
It seems clear to me that the compiler can't decide where to put the faulty code because it can tell that the recursion never bottoms out. Why is that a mystery? – Stochastically May 20 '13 at 08:33
If C# can't decide which method to call, it reports an `ambiguous method call` error; it doesn't skip the method call. If I were the compiler, I would report an error in this scenario too. – Ilya Ivanov May 20 '13 at 08:34
Just edited my answer. The compiler has to put the code where it will only be called once, but that's not possible due to the recursive nature of the code. It's not a question about an ambiguous call, where the compiler can't decide which method to call. – Stochastically May 20 '13 at 08:36
It's a pity that the answers are receiving so many downvotes; I'm not downvoting any of them (just in case). In this scenario it also can't decide where to put the initialization logic. So my main question is **why** allow recursive calls at all? Is there a reason behind this? Maybe I'm missing something – Ilya Ivanov May 20 '13 at 08:44
Recursion within the body of the constructor makes sense to me, so I've re-edited my attempt at an answer :-). – Stochastically May 20 '13 at 09:14
@IlyaIvanov - I think the more pertinent question is - why write a cycle detector to *detect* recursive constructor calls in the compiler? – Damien_The_Unbeliever May 20 '13 at 09:15

score 0 · Answer 4 · edited May 23 '17 at 11:54

I think this is allowed because you can (or could) still catch the exception and do something meaningful with it.

The initialization will never run, and it will almost certainly throw a StackOverflowException. But this can still be the desired behavior, and it didn't always mean the process had to crash.

As explained here https://stackoverflow.com/a/1599236/869482

answered May 22 '13 at 12:31

Jens Timmerman

