
I've noticed some bizarre behavior in my code when accidentally commenting out a line in a function during code review. It was very hard to reproduce but I'll depict a similar example here.

I've got this test class:

public class Test
{
    public void GetOut(out EmailAddress email)
    {
        try
        {
            Foo(email);
        }
        catch
        {
        }
    }

    public void Foo(EmailAddress email)
    {
    }
}

There is no assignment to email in GetOut, which would normally produce a compiler error:

The out parameter 'email' must be assigned to before control leaves the current method

However, if EmailAddress is a struct defined in a separate assembly, no error is reported and everything compiles fine.

public struct EmailAddress
{
    #region Constructors

    public EmailAddress(string email)
        : this(email, string.Empty)
    {
    }

    public EmailAddress(string email, string name)
    {
        this.Email = email;
        this.Name = name;
    }

    #endregion

    #region Properties

    public string Email { get; private set; }
    public string Name { get; private set; }

    #endregion
}

Why doesn't the compiler enforce that email must be assigned to? Why does this code compile if the struct is defined in a separate assembly, but not if the struct is defined in the same assembly?

johnny 5
  • 2
    If you're using a class you have to 'new' up an instance of the object. It's not required for structs. https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/using-structs (search for this text on that page specifically: Unlike classes, structs can be instantiated without using the new operato) – Dortimer Oct 30 '19 at 18:50
  • 1
    As soon as your Dog struct gets a variable it will not compile :) – André Sanson Oct 30 '19 at 18:51
  • In this example, with `struct Dog{}`, all is well. – H H Oct 30 '19 at 18:53
  • _"in the example I saw in code it has members"_ post that or it just didn't happen. – H H Oct 30 '19 at 18:53
  • Also, when you're using an `out` parameter with a class based object, `new` up the object before you pass it to the function: https://www.pluralsight.com/guides/csharp-in-out-ref-parameters – Dortimer Oct 30 '19 at 18:53
  • @AndréSanson That's not true I just cannot depict an example here with a member but I have an example in my code – johnny 5 Oct 30 '19 at 18:59
  • 2
    @johnny5 Then show the example. – André Sanson Oct 30 '19 at 19:05
  • @AndréSanson I can't; the behavior is inconsistent between projects. I can provide a video – johnny 5 Oct 30 '19 at 19:11
  • @AndréSanson I think it's because the struct is in a separate assembly, but I'm still trying to determine the exact reason – johnny 5 Oct 30 '19 at 19:20
  • @AndréSanson the new example depicts it correctly; you must create the struct in a separate referenced assembly – johnny 5 Oct 30 '19 at 19:23
  • 1
    OK, this is interesting. Reproduced with a Core 3 Console app and a .Standard class lib. – H H Oct 30 '19 at 19:37
  • @johnny5 Yes, I see now. The error is only raised when your struct has a property whose type is another struct, for example `public DateTime Date { get; set; }` or `public int Id { get; set; }`. Why this happens is beyond my understanding. – André Sanson Oct 30 '19 at 19:38
  • @AndréSanson thanks, it took me a while to figure out how to reproduce it. I personally have no idea what the rules are for the compiler with this. I was under the impression that all outs require initialization – johnny 5 Oct 30 '19 at 19:43
  • @HenkHolterman is this a bug? – johnny 5 Oct 30 '19 at 19:53
  • I'm not sure, it might be documented somewhere. It's not really a functional bug, just a missing compiler error. The members of EmailAddress will be set to `null` and stay that way. Just like after `new EmailAddress()`. – H H Oct 30 '19 at 20:26
  • @HenkHolterman, Interesting, I'll run some tests later tonight. I'm wondering if there could be a chance of adverse side effect. – johnny 5 Oct 30 '19 at 20:32

1 Answer


TLDR: This is a known bug of long standing. I first wrote about it in 2010:

https://blogs.msdn.microsoft.com/ericlippert/2010/01/18/a-definite-assignment-anomaly/

It is harmless and you can safely ignore it, and congratulate yourself on finding a somewhat obscure bug.

Why doesn't the compiler enforce that Email must be definitely assigned?

Oh, it does, in a fashion. It just has a wrong idea of what condition implies that the variable is definitely assigned, as we shall see.

Why does this code compile if the struct is created in a separate assembly, but it doesn't compile if the struct is defined in the existing assembly?

That's the crux of the bug. The bug is a consequence of the intersection of how the C# compiler does definite assignment checking on structs and how the compiler loads metadata from libraries.

Consider this:

struct Foo 
{ 
  public int x; 
  public int y; 
}
// Yes, public fields are bad, but this is just 
// to illustrate the situation.
void M(out Foo f)
{

OK, at this point what do we know? f is an alias for a variable of type Foo, so the storage has already been allocated and is definitely at least in the state that it came out of the storage allocator. If there was a value placed in the variable by the caller, that value is there.

What do we require? We require that f be definitely assigned at any point where control leaves M normally. So you would expect something like:

void M(out Foo f)
{
  f = new Foo();
}

which sets f.x and f.y to their default values. But what about this?

void M(out Foo f)
{
  f = new Foo();
  f.x = 123;
  f.y = 456;
}

That should also be fine. But, and here is the kicker, why do we need to assign the default values only to blow them away a moment later? C#'s definite assignment checker checks to see if every field is assigned! This is legal:

void M(out Foo f)
{
  f.x = 123;
  f.y = 456;
}

And why should that not be legal? It's a value type. f is a variable, and it already contains a valid value of type Foo, so let's just set the fields, and we're done, right?
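For completeness, here is a minimal sketch of the field-by-field case as a whole program, including the caller side. The `Demo` class and the `Main` method are assumptions added for illustration; only `Foo` and `M` come from the snippets above.

```csharp
using System;

public struct Foo
{
    public int x;
    public int y;
}

// Hypothetical wrapper class, added only so the sketch compiles on its own.
public static class Demo
{
    // Every field of f is assigned before the method returns, so the
    // definite assignment checker treats f as a whole as assigned.
    // Removing either assignment should bring back error CS0177.
    public static void M(out Foo f)
    {
        f.x = 123;
        f.y = 456;
    }

    public static void Main()
    {
        M(out Foo f);
        Console.WriteLine($"{f.x}, {f.y}"); // prints "123, 456"
    }
}
```

Note that `f = new Foo();` never appears: assigning each known field is enough to satisfy the checker.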

Right. So what's the bug?

The bug that you have discovered is: as a cost savings, the C# compiler does not load the metadata for private fields of structs that are in referenced libraries. That metadata can be huge, and it would slow down the compiler for very little win to load it all into memory every time.

And now you should be able to deduce the cause of the bug you've found. When the compiler checks to see if the out parameter is definitely assigned, it compares the number of known fields to the number of fields that were definitely initialized. In your case it only knows about the zero public fields, because the private field metadata was not loaded. The compiler concludes "zero fields required, zero fields initialized, we're good."
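That reasoning can be sketched as a toy model. This is not the compiler's actual code or data structures: `DefiniteAssignmentToy`, `IsDefinitelyAssigned`, and the field labels are all hypothetical names invented for illustration, and the model assumes only that "assigned" means "every field the checker knows about was assigned."

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only; the real checker works on the compiler's own symbol tables.
public static class DefiniteAssignmentToy
{
    // Hypothetical helper: a struct variable counts as definitely assigned
    // when every field the checker knows about has been assigned.
    public static bool IsDefinitelyAssigned(
        IEnumerable<string> knownFields, ISet<string> assignedFields)
    {
        return knownFields.All(field => assignedFields.Contains(field));
    }

    public static void Main()
    {
        // GetOut assigns nothing to its out parameter.
        var assigned = new HashSet<string>();

        // Struct defined in the same project: the private backing fields are known.
        var knownFromSource = new[] { "emailBackingField", "nameBackingField" };

        // Struct in a referenced library: the private fields were never loaded.
        var knownFromMetadata = new string[0];

        Console.WriteLine(IsDefinitelyAssigned(knownFromSource, assigned));   // False
        Console.WriteLine(IsDefinitelyAssigned(knownFromMetadata, assigned)); // True
    }
}
```

The second result is the anomaly in miniature: a check over an empty set of known fields passes vacuously.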

Like I said, this bug has been around for more than a decade and people like you occasionally rediscover it and report it. It's harmless, and it is unlikely to be fixed because fixing it is of almost zero benefit but a large performance cost.

And of course the bug does not repro for private fields of structs that are in source code in your project, because obviously the compiler already has information about the private fields at hand.

Eric Lippert
  • @johnny5: You ought not to get errors. See https://dotnetfiddle.net/ZEKiUk. Can you post a simple repro? – Eric Lippert Oct 30 '19 at 21:10
  • 1
    Thanks for the fiddle it was because I defined x and y as properties instead of members – johnny 5 Oct 30 '19 at 21:15
  • 1
    @johnny5: If you just defined a normal C# 1.0 style property then from the definite assignment checker's perspective, that's a method, not a field. If you defined a C# 3.0+ style automatic property, the compiler knows that there is a private field backing it; the rules for definite assignment of that thing have been tweaked over the years and I do not now recall the exact rules. – Eric Lippert Oct 30 '19 at 21:19
  • If you use a `System.TimeSpan` instead, the errors do come: `error CS0269: Use of unassigned out parameter 'email'` and `error CS0177: The out parameter 'email' must be assigned to before control leaves the current method`. There is only one non-static field of `TimeSpan`, namely `_ticks`. It is `internal` to its assembly mscorlib. Is this assembly special? Same with `System.DateTime`, and its field is `private` – Jeppe Stig Nielsen Mar 17 '20 at 20:10
  • @JeppeStigNielsen: I don't know what's up with that! If you figure it out, please let me know. – Eric Lippert Mar 17 '20 at 21:31
  • The answer to that was already in your blog! The difference is that the field type is a value type. If I use another `struct` from mscorlib, like `System.Collections.DictionaryEntry`, the anomaly is there. If I declare (within a method) a local variable `System.Collections.DictionaryEntry de;` I can "use" it right away. It has two private fields `_key` and `_value` of reference type that are "unassigned" (I supposed they are guaranteed to be `null` references). It is irrelevant that public properties `Key` and `Value` exist with which I _could_ have changed the field values. – Jeppe Stig Nielsen Mar 18 '20 at 10:01
  • @JeppeStigNielsen: That is hilarious. Apparently I need to reread old posts more carefully. I write this stuff down so that I don't have to keep all the trivia in my head! – Eric Lippert Mar 18 '20 at 17:06