Why do local variables require initialization, but fields do not?

Question

If I create a bool within my class, just something like bool check, it defaults to false.

When I create the same bool within my method, bool check (instead of within the class), I get an error "use of unassigned local variable check". Why?

Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackoverflow.com/rooms/80459/discussion-on-question-by-nachime-why-do-local-variables-require-initialization). — Martijn Pieters, Jun 13 '15 at 13:25
The question is vague. Would "because the specification says so" be an acceptable answer? — Eric Lippert, Jun 13 '15 at 15:11
Because that's the way it was done in Java when they copied it. :P — Alvin Thompson, Jun 17 '15 at 04:55

Eric Lippert · Accepted Answer · 2015-06-13T15:34:12.173

Yuval and David's answers are basically correct; summing up:

Use of an unassigned local variable is a likely bug, and this can be detected by the compiler at low cost.
Use of an unassigned field or array element is less likely a bug, and it is harder to detect the condition in the compiler. Therefore the compiler makes no attempt to detect the use of an uninitialized variable for fields, and instead relies upon the initialization to the default value in order to make the program behavior deterministic.
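The contrast summarized above can be seen in a minimal sketch. The names Sample, Check and FieldDefaultDemo are hypothetical, chosen only for illustration; the commented-out line is the one the compiler would reject.

```csharp
using System;

public class Sample
{
    // Never explicitly assigned; the runtime zero-initializes it, so it reads as false.
    private bool check;

    public bool Check { get { return check; } }
}

public static class FieldDefaultDemo
{
    public static void Main()
    {
        Console.WriteLine(new Sample().Check);   // prints False

        bool local;
        // Console.WriteLine(local);   // error CS0165: use of unassigned local variable 'local'
        local = true;
        Console.WriteLine(local);      // prints True
    }
}
```

The compiler may warn that the field is never assigned, but it still compiles, which is exactly the asymmetry the question asks about.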

A commenter to David's answer asks why it is impossible to detect the use of an unassigned field via static analysis; this is the point I want to expand upon in this answer.

First off, for any variable, local or otherwise, it is in practice impossible to determine exactly whether a variable is assigned or unassigned. Consider:

bool x;
if (M()) x = true;
Console.WriteLine(x);

The question "is x assigned?" is equivalent to "does M() return true?" Now, suppose M() returns true if Fermat's Last Theorem is true for all integers less than eleventy gajillion, and false otherwise. In order to determine whether x is definitely assigned, the compiler must essentially produce a proof of Fermat's Last Theorem. The compiler is not that smart.

So what the compiler does instead for locals is implement an algorithm which is fast, and overestimates when a local is not definitely assigned. That is, it has some false positives, where it says "I can't prove that this local is assigned" even though you and I know it is. For example:

bool x;
if (N() * 0 == 0) x = true;
Console.WriteLine(x);

Suppose N() returns an integer. You and I know that N() * 0 will be 0, but the compiler does not know that. (Note: the C# 2.0 compiler did know that, but I removed that optimization, as the specification does not say that the compiler knows that.)

All right, so what do we know so far? It is impractical for locals to get an exact answer, but we can overestimate not-assigned-ness cheaply and get a pretty good result that errs on the side of "make you fix your unclear program". That's good. Why not do the same thing for fields? That is, make a definite assignment checker that overestimates cheaply?

Well, how many ways are there for a local to be initialized? It can be assigned within the text of the method. It can be assigned within a lambda in the text of the method; that lambda might never be invoked, so those assignments are not relevant. Or it can be passed as "out" to another method, at which point we can assume it is assigned when the method returns normally. Those are very clear points at which the local is assigned, and they are right there in the same method that the local is declared. Determining definite assignment for locals requires only local analysis. Methods tend to be short -- far less than a million lines of code in a method -- and so analyzing the entire method is quite quick.
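The three cases just listed can be sketched as follows. This is only an illustration: the helper Fill and the class name are hypothetical, and the commented-out line marks where the compiler would report an error.

```csharp
using System;

public static class LocalAssignmentDemo
{
    // Hypothetical helper: an out parameter must be assigned before a normal return.
    static void Fill(out int value) { value = 42; }

    public static void Main(string[] args)
    {
        int a;
        if (args.Length == 0) a = 1; else a = 2;   // assigned on every path through the method
        Console.WriteLine(a);

        int b;
        Fill(out b);            // treated as definitely assigned once the call returns
        Console.WriteLine(b);   // prints 42

        int c;
        Action assignLater = () => c = 3;   // an assignment inside a lambda does not count
        // Console.WriteLine(c);            // error CS0165: use of unassigned local variable 'c'
        c = 0;                  // now definitely assigned in the method body
        assignLater();
        Console.WriteLine(c);   // prints 3
    }
}
```

Everything the checker needs is inside Main (plus the out modifier on the signature of Fill), which is why the analysis stays local and cheap.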

Now what about fields? Fields can be initialized in a constructor of course. Or a field initializer. Or the constructor can call an instance method that initializes the fields. Or the constructor can call a virtual method that initializes the fields. Or the constructor can call a method in another class, which might be in a library, that initializes the fields. Static fields can be initialized in static constructors. Static fields can be initialized by other static constructors.

Essentially the initializer for a field could be anywhere in the entire program, including inside virtual methods that will be declared in libraries that haven't been written yet:

// Library written by BarCorp
public abstract class Bar
{
    // Derived class is responsible for initializing x.
    protected int x;
    protected abstract void InitializeX(); 
    public void M() 
    { 
       InitializeX();
       Console.WriteLine(x); 
    }
}

Is it an error to compile this library? If yes, how is BarCorp supposed to fix the bug? By assigning a default value to x? But that's what the compiler does already.

Suppose this library is legal. If FooCorp writes

public class Foo : Bar
{
    protected override void InitializeX() { } 
}

is that an error? How is the compiler supposed to figure that out? The only way is to do a whole program analysis that tracks the initialization state of every field on every possible path through the program, including paths that involve choice of virtual methods at runtime. This problem can be arbitrarily hard; it can involve simulated execution of millions of control paths. Analyzing local control flows takes microseconds and depends on the size of the method. Analyzing global control flows can take hours because it depends on the complexity of every method in the program and all the libraries.
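For reference, here is the Bar/Foo pair from above combined into one runnable sketch. The BarFooDemo entry point is a hypothetical addition for illustration; in the scenario described, the two classes would live in separately compiled assemblies.

```csharp
using System;

// Library side: compiled without any knowledge of derived classes.
public abstract class Bar
{
    // Derived class is responsible for initializing x.
    protected int x;
    protected abstract void InitializeX();

    public void M()
    {
        InitializeX();
        Console.WriteLine(x);
    }
}

// Client side: overrides the method but never assigns x.
public class Foo : Bar
{
    protected override void InitializeX() { }
}

public static class BarFooDemo
{
    public static void Main()
    {
        new Foo().M();   // prints 0, the default value of int
    }
}
```

Both halves compile cleanly, and the program behaves deterministically because x starts out at its default value.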

So why not do a cheaper analysis that doesn't have to analyze the whole program, and just overestimates even more severely? Well, propose an algorithm that works that doesn't make it too hard to write a correct program that actually compiles, and the design team can consider it. I don't know of any such algorithm.

Now, the commenter suggests "require that a constructor initialize all fields". That's not a bad idea. In fact, it is such a not-bad idea that C# already has that feature for structs. A struct constructor is required to definitely-assign all fields by the time the ctor returns normally; the default constructor initializes all the fields to their default values.
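A minimal sketch of the struct rule follows, assuming a compiler from the era of this answer; newer C# versions may relax the requirement by filling in defaults automatically. The Point type and its members are hypothetical.

```csharp
using System;

public struct Point
{
    public int X;
    public int Y;

    public Point(int x)
    {
        X = x;
        Y = 0;   // leaving this out was error CS0171 in compilers of that era
    }
}

public static class StructDemo
{
    public static void Main()
    {
        Point p = new Point(5);
        Point d = default(Point);   // every field set to its default value

        Console.WriteLine(p.X + "," + p.Y);   // prints 5,0
        Console.WriteLine(d.X + "," + d.Y);   // prints 0,0
    }
}
```

The check is feasible here because a struct cannot be derived from, so the constructor body is the only place the compiler has to look.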

What about classes? Well, how do you know that a constructor has initialized a field? The ctor could call a virtual method to initialize the fields, and now we are back in the same position we were in before. Structs don't have derived classes; classes might. Is a library containing an abstract class required to contain a constructor that initializes all its fields? How does the abstract class know what values the fields should be initialized to?

John suggests simply prohibiting calling methods in a ctor before the fields are initialized. So, summing up, our options are:

Make common, safe, frequently used programming idioms illegal.
Do an expensive whole-program analysis that makes the compilation take hours in order to look for bugs that probably aren't there.
Rely upon automatic initialization to default values.

The design team chose the third option.

Great answer, as usual. I have a question though: *Why not automatically assign default values to local variables as well?* In other words, why not make `bool x;` be equivalent to `bool x = false;` **even inside a method**? — durron597, Jun 23 '15 at 14:05
@durron597: Because experience has shown that forgetting to assign a value to a local is probably a bug. If it's probably a bug *and* it is cheap and easy to detect, then there is good incentive to make the behavior either illegal or a warning. — Eric Lippert, Jun 23 '15 at 16:06
In the answer below by Yuval, it says local vars are automatically initialised to default values. Why do this automatic initialisation if the developer is forced to initialise the vars anyway? — David Klempfner, Mar 19 '21 at 07:57

score 27 · Answer 2 · edited Sep 22 '22 at 10:26

When I create the same bool within my method, bool check (instead of within the class), I get an error "use of unassigned local variable check". Why?

Because the compiler is trying to prevent you from making a mistake.

Does initializing your variable to false change anything in this particular path of execution? Probably not, considering default(bool) is false anyway, but it is forcing you to be aware that this is happening. The .NET environment prevents you from accessing "garbage memory", since it will initialize any value to its default. But still, imagine this was a reference type, and you'd pass an uninitialized (null) value to a method expecting a non-null one, and get an NRE at runtime. The compiler is simply trying to prevent that, accepting the fact that this may sometimes result in bool b = false statements.

Eric Lippert talks about this in a blog post:

The reason why we want to make this illegal is not, as many people believe, because the local variable is going to be initialized to garbage and we want to protect you from garbage. We do in fact automatically initialize locals to their default values. (Though the C and C++ programming languages do not, and will cheerfully allow you to read garbage from an uninitialized local.) Rather, it is because the existence of such a code path is probably a bug, and we want to throw you in the pit of quality; you should have to work hard to write that bug.

Why doesn't this apply to a class field? Well, I assume the line had to be drawn somewhere, and local variable initialization is a lot easier to diagnose and get right than class field initialization. The compiler could do this, but think of all the possible checks it would need to make (some of them independent of the class code itself) in order to evaluate whether each field in a class is initialized. I am no compiler designer, but I am sure it would be considerably harder, as there are plenty of cases to take into account, and it has to be done in a timely fashion as well. Every feature has to be designed, written, tested and deployed, and the value of implementing this one would not be worth the effort and complexity involved.

edited Sep 22 '22 at 10:26

Charlieface

answered Jun 13 '15 at 08:33

Yuval Itzchakov

" imagine this was a reference type, and you'd pass this uninitialized object to a method expecting an initialized one" Did you mean: "imagine this was a reference type and you were passing the default (null) instead of the reference of an object"? – Deduplicator Jun 13 '15 at 21:05
@Deduplicator Yes. A method expecting a non-null value. Edited that part. Hope it's clearer now. – Yuval Itzchakov Jun 13 '15 at 21:11
I don't think that it is because of the drawn line. Every class is supposed to have a constructor, at least the default constructor. So when you stick with the default constructor you will get default values (quite transparent). When defining a constructor, you are expected to know what you are doing within it and which fields you want initialized in what way, including knowledge of the default values. – Peter Jun 22 '15 at 17:48
On the other hand: a variable within a method may be declared and assigned values in different paths of execution. There may be exceptions that are easy to overlook until you look in the documentation of a framework you use, or even in other parts of the code you do not maintain. This can introduce a very complex path of execution. Hence the compiler's hint. – Peter Jun 22 '15 at 17:51
@Peter I didn't really understand your second comment. Regarding the first, there is no requirement to initialize any fields inside a constructor. It is a common *practice*. The compiler's job isn't to enforce such a practice. You cannot rely on any implementation of a constructor running and say "alright, all fields are good to go". Eric elaborated plenty in his answer on the ways one can initialize a field of a class, and shows how it would take a *very long time* to compute all the logical ways of initialization. – Yuval Itzchakov Jun 22 '15 at 18:32
But that was what I meant. It's not a requirement, nor does the compiler enforce it. Instead the developer who designed that class is responsible for initializing fields. And he should be aware of his responsibilities, as designing and implementing classes means designing and implementing contracts. – Peter Jun 23 '15 at 17:15
To the second comment: Actually I don't know if C# supports RuntimeExceptions, which are exceptions that can be thrown without the necessity to catch them. This really can introduce hidden complexity as the path of execution in the method is unknown. Also, exceptions that have to be caught in a multiple-catch statement can introduce complexity in the method, as they are a sort of goto from a point you will only know about when you know which exception could be thrown where. – Peter Jun 23 '15 at 17:21
This can lead to situations where at a first glance a field in a method may seem to be initialized in any execution path of that method but under certain conditions will be not initialized. The try-with-resources in Java was introduced also because opened resources with operation that could throw exceptions on them could lead to not properly closing them because of the complexity of the execution path. – Peter Jun 23 '15 at 17:21
The link to the blog post is broken. – David Klempfner Mar 19 '21 at 07:51

score 26 · Answer 3 · edited Sep 22 '22 at 10:26

Why do local variables require initialization, but fields do not?

The short answer is that code accessing uninitialised local variables can be detected reliably by the compiler using static analysis, whereas this isn't the case for fields. So the compiler enforces the first case, but not the second.

Why do local variables require initialization?

This is no more than a design decision of the C# language, as explained by Eric Lippert. The CLR and the .NET environment do not require it. VB.NET, for example, will compile just fine with uninitialised local variables, and in reality the CLR initialises all uninitialised variables to default values.

The same could occur with C#, but the language designers chose not to. The reason is that uninitialised variables are a huge source of bugs and so, by mandating initialisation, the compiler helps to cut down on accidental mistakes.

Why don't fields require initialization?

So why doesn't this compulsory explicit initialisation happen with fields within a class? Simply because that explicit initialisation could occur during construction, through a property being called by an object initializer, or even by a method being called long after the event. The compiler cannot use static analysis to determine if every possible path through the code leads to the variable being explicitly initialised before use. Getting it wrong would be annoying, as the developer could be left with valid code that won't compile. So C# doesn't enforce it at all and the CLR is left to automatically initialise fields to a default value if not explicitly set.

What about collection types?

C#'s enforcement of local variable initialisation is limited, which often catches developers out. Consider the following four lines of code:

string str;
var len1 = str.Length;
var array = new string[10];
var len2 = array[0].Length;

The second line of code won't compile, as it's trying to read an uninitialised string variable. The fourth line of code compiles just fine though, as array has been initialised, but only with default values. Since the default value of a string is null, we get an exception at run-time. Anyone who's spent time here on Stack Overflow will know that this explicit/implicit initialisation inconsistency leads to a great many "Why am I getting an “Object reference not set to an instance of an object” error?" questions.
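The run-time half of that example can be reproduced with a small sketch. The class name is hypothetical, and the try/catch is added only so the program reports the exception instead of terminating.

```csharp
using System;

public static class ArrayDefaultDemo
{
    public static void Main()
    {
        var array = new string[10];             // elements are initialised to null
        Console.WriteLine(array[0] == null);    // prints True

        try
        {
            var len2 = array[0].Length;         // compiles, but dereferences null
            Console.WriteLine(len2);
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("NullReferenceException");
        }
    }
}
```

The compiler tracks the assignment of the local array itself, not of each element, so the problem only shows up when the code runs.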

edited Sep 22 '22 at 10:26

Charlieface

answered Jun 13 '15 at 08:29

David Arno

"The compiler cannot use static analysis to determine if every possible path through the code leads to the variable being explicitly initialised before use." I'm not convinced this is true. Can you post an example of a program that is resistant to static analysis? – John Kugelman Jun 13 '15 at 13:17
@JohnKugelman, consider the simple case of `public interface I1 { string str {get;set;} }` and a method `int f(I1 value) { return value.str.Length; }`. If this exists in a library, the compiler cannot know what that library will be linked to, and thus whether the `set` will have been called before the `get`. The backing field might not be explicitly initialised, but it has to compile such code. – David Arno Jun 13 '15 at 13:37
That's true, but I wouldn't expect the error to be generated while compiling `f`. It would be generated when compiling the constructors. If you leave a constructor with a field possibly uninitialized, that'd be an error. There might also have to be restrictions on calling class methods and getters before all fields are initialized. – John Kugelman Jun 13 '15 at 14:00
@JohnKugelman: I will post an answer discussing the issue you raise. – Eric Lippert Jun 13 '15 at 14:40
That's not fair. We're trying to have a disagreement here! – John Kugelman Jun 13 '15 at 15:19
@JohnKugelman the error for a local variable occurs when you try to use it before assigning to it. If you apply the same logic to fields, i.e. not using them before they are assigned, you end up with at least an NP-hard problem. You can of course solve it, but you might end up with a compilation that runs forever, and how long it will take is unknowable until it has completed – Rune FS Jun 16 '15 at 19:41
@RuneFS The same is true for local variables. Correct detection is impossible due to the Halting problem. It's not even NP-hard, it's impossible. For local variables the compiler errs on the side of sometimes warning you about code that you know to be safe. – John Kugelman Jun 16 '15 at 21:13
@JohnKugelman yes, you are right, I was imprecise. Since the scope of analysis is way more limited it's easier to constrain the problem to something that's not NP complete not hard. It's possible to implement a 100% correct algorithm for analysis in both situations, but in some scenarios it would be infeasible to execute (since it would take forever) and it's unknowable, which will take forever, so the compiler can't rule them out – Rune FS Jun 17 '15 at 06:22
"Whereas this isn't the case of fields. " - how can this not be done reliably by the compiler? That is a compiler capability/language design problem, not an algorithmic one. – luis.espinal Jun 22 '15 at 16:21

score 12 · Answer 4 · edited Mar 19 '21 at 06:54

Good answers above, but I thought I'd post a much simpler/shorter answer for people too lazy to read a long one (like me).

Class

class Foo {
    private string Boo;
    public Foo() { /** bla bla bla **/ }
    public string DoSomething() { return Boo; }
}

Field Boo may or may not have been initialized in the constructor. So when the compiler finds return Boo; it cannot tell whether it has been initialized, and it simply reports no error.

Function

public string Foo() {
   string Boo;
   return Boo; // triggers error
}

The { } characters define the scope of a block of code. The compiler walks the branches of these { } blocks keeping track of stuff. It can easily tell that Boo was not initialized. The error is then triggered.

Why does the error exist?

The error was introduced to reduce the number of lines of code required to make source code safe. Without the error the above would look like this.

public string Foo() {
   string Boo;
   /* bla bla bla */
   if(Boo == null) {
      return "";
   }
   return Boo;
}

From the manual:

The C# compiler does not allow the use of uninitialized variables. If the compiler detects the use of a variable that might not have been initialized, it generates compiler error CS0165. For more information, see Fields (C# Programming Guide). Note that this error is generated when the compiler encounters a construct that might result in the use of an unassigned variable, even if your particular code does not. This avoids the necessity of overly-complex rules for definite assignment.

Reference: https://msdn.microsoft.com/en-us/library/4y7h161d.aspx
