I'm building an application which uses relatively large tables to do its work (LR tables, to be precise). As I'm generating code anyway and the table isn't that large, I decided to serialize my table by generating code that uses the C# collection initializer syntax to initialize the table on startup of my generated program:

public static readonly int[,] gotoTable = new int[,]
{
    {
        0,1,0,0,0,0,0,0,0,0,0,0,0,0,(...)
    },
    {
        0,0,4,0,5,6,0,0,0,0,0,7,0,0,(...)
    },
    (...)

Oddly enough, when I generated a table with only a couple hundred thousand entries, the generated application crashed with a StackOverflowException on startup. The C# compiler compiles it just fine; the table generation application also runs just fine. In fact, when I switched to Release mode, the application did start up. An OutOfMemoryException might have made some sense, but even then the table I use is way too small for an OutOfMemoryException.

Code to reproduce this:

Warning: trying the code below in release mode crashed Visual Studio 2010 for me; watch out for losing unsaved work. Additionally, if you generate code for which the compiler generates lots of errors, Visual Studio will hang as well.

//Generation Project, main.cs:
using System.IO;

class Program
{
    static void Main()
    {
        // Writes Tables.cs into the victim project: one constructor call per array element
        using (StreamWriter writer = new StreamWriter("../../../VictimProject/Tables.cs"))
        {
            writer.WriteLine("using System;");
            writer.WriteLine("public static class Tables");
            writer.WriteLine("{");
            writer.WriteLine("    public static readonly Tuple<int>[] bigArray = new Tuple<int>[]");
            writer.WriteLine("    {");
            for (int i = 0; i < 300000; i++)
                writer.WriteLine("        new Tuple<int>(" + i + "),");
            writer.WriteLine("    };");
            writer.WriteLine("}");
        }
    }
}
//Victim Project, main.cs:
using System;

class Program
{
    static void Main()
    {
        for (int i = 0; i < 1234; i++)
        {
            // Preventing the jitter from removing Tables.bigArray
            if (Tables.bigArray[i].Item1 == 10)
                Console.WriteLine("Found it!");
        }
        Console.ReadKey(true);
    }
}

Run the first project to generate the Tables.cs file, and then run the second program to get the StackOverflowException. Note that the above crashes on my computer; it might not on other platforms. Try increasing 300000 if it doesn't.

Using release mode instead of debug mode seems to increase the limit slightly, as my project doesn't crash in release mode. However, the code above crashes in both modes for me.

Using literal ints or strings instead of Tuple<int>s doesn't cause the crash, nor does "new int()" (but that might get converted into a literal 0). Using a struct with a single int field does cause the crash. It seems to be related to using a constructor as initializer.
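Given the observation above that literal ints do not trigger the crash, one possible workaround is to have the generator emit only a flat int[] of constructor arguments and build the objects in a small loop at startup. This is a minimal sketch rather than a tested fix: `rawValues` and `BuildArray` are hypothetical names, and the three literals stand in for the generated values.

```csharp
using System;

public static class Tables
{
    // Generated part: primitive literals only (the case reported above not to crash).
    // Must be declared before bigArray, because static field initializers run in textual order.
    private static readonly int[] rawValues = { 0, 1, 2 /* generated values go here */ };

    // Hand-written part: the objects are built in a loop, so this method
    // stays small no matter how many entries the table has.
    public static readonly Tuple<int>[] bigArray = BuildArray();

    private static Tuple<int>[] BuildArray()
    {
        var result = new Tuple<int>[rawValues.Length];
        for (int i = 0; i < rawValues.Length; i++)
            result[i] = new Tuple<int>(rawValues[i]);
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(Tables.bigArray[2].Item1); // prints 2
    }
}
```

Whether this avoids the crash for a given table size would still need to be checked on the affected machine.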

My guess is that the collection initializer is somehow implemented recursively, which would explain the stack overflow. However, that is a very weird thing to do, as an iterative solution seems a lot simpler and more efficient. The C# compiler itself doesn't have any problems with the program and compiles it very fast (it handles even larger collections well, but it does crash on positively huge collections, as expected).

I guess there's probably some way to write my table directly to a binary file and then link that file, but I haven't looked at that yet.

I guess I have two questions: why does the above happen, and how do I work around it?

Edit: some interesting details after disassembling the .exe:

.maxstack  4
.locals init ([0] class [mscorlib]System.Tuple`1<int32>[] CS$0$0000)
IL_0000:  ldc.i4     0x493e0
IL_0005:  newarr     class [mscorlib]System.Tuple`1<int32>
IL_000a:  stloc.0
IL_000b:  ldloc.0
IL_000c:  ldc.i4.0
IL_000d:  ldc.i4.0
IL_000e:  newobj     instance void class [mscorlib]System.Tuple`1<int32>::.ctor(!0)
IL_0013:  stelem.ref
IL_0014:  ldloc.0
IL_0015:  ldc.i4.1
IL_0016:  ldc.i4.1
IL_0017:  newobj     instance void class [mscorlib]System.Tuple`1<int32>::.ctor(!0)
IL_001c:  stelem.ref
(goes on and on)

This suggests that the jitter indeed crashes with a stack overflow trying to jit this method. Still, it's weird that it does, and in particular, that I get an exception out of it.

Alex ten Brink

2 Answers

why does the above happen

I suspect it may be the JIT crashing. You will be generating an enormous type initializer (.cctor member in IL). Each value is going to be 5 IL instructions. I'm not entirely surprised a member with 1.5 million instructions causes problems...

and how do I work around it?

Include the data as an embedded resource file instead, and load it in the type initializer if you need to. I'm assuming this is generated data - so put data where it belongs, in a binary file rather than as literal code.
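As a rough illustration of the binary-file approach, the sketch below writes a two-dimensional int table to a stream and reads it back. The layout (row count, column count, then the cells in row-major order) and the `TableIO` class are assumptions made for this example, not something from the question.

```csharp
using System;
using System.IO;

public static class TableIO
{
    // Writes: row count, column count, then every cell in row-major order.
    public static void Write(Stream output, int[,] table)
    {
        var writer = new BinaryWriter(output);
        int rows = table.GetLength(0);
        int cols = table.GetLength(1);
        writer.Write(rows);
        writer.Write(cols);
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                writer.Write(table[r, c]);
        writer.Flush(); // the caller owns the stream and closes it
    }

    // Reads a table in the layout produced by Write.
    public static int[,] Read(Stream input)
    {
        var reader = new BinaryReader(input);
        int rows = reader.ReadInt32();
        int cols = reader.ReadInt32();
        var table = new int[rows, cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                table[r, c] = reader.ReadInt32();
        return table;
    }
}

public static class Program
{
    public static void Main()
    {
        // Round trip through memory; a real build would write to a file instead.
        var original = new int[,] { { 0, 1, 0 }, { 0, 0, 4 } };
        var buffer = new MemoryStream();
        TableIO.Write(buffer, original);
        buffer.Position = 0;
        int[,] loaded = TableIO.Read(buffer);
        Console.WriteLine(loaded[1, 2]); // prints 4
    }
}
```

In the generated program, the stream passed to `Read` could come from `typeof(Tables).Assembly.GetManifestResourceStream(...)` once the file is added to the project with the Embedded Resource build action. The resource name depends on the project's default namespace and file name, and the call returns null if the name does not match.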

Kirk Woll
Jon Skeet
  • I'd be surprised to see a native stack overflow crash inside the CLR to be converted to a plain old StackOverflowException. I'd expect the process to fail() or an ExecutionEngineException or a BadProgramException. – usr Apr 12 '12 at 22:41
  • @usr What stack do you think is overflowing when you have `class Program { static void Main() { Main(); } }`? It's not the IL evaluation stack--that is purely virtual. – phoog Apr 12 '12 at 22:54
  • I'd maybe expect the jitted stack to overflow. Not the stack inside of the jitter. The explanation suggested in this answer is entirely possible, but it is not especially likely by any means. Let's not treat this as the truth but as speculation without much evidence. – usr Apr 12 '12 at 23:00

If it tries to pre-push all those onto the stack, that is going to need a mass of stack space, so personally I would indeed expect a stack overflow here, depending on how the compiler does it.

Having done something similar before (something that breaks every tool like Reflector, because the IL is too big), my advice from experience is: do that via serialization, not via C#. In my case I did pretty much exactly that via protobuf-net, i.e.

  • generated the model (without data) as code
  • executed it to populate the model from the database
  • serialized it to a file
  • shipped the file with my deployment
  • deserialized during initialisation
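The serialize and deserialize steps of that list could look roughly like the sketch below. It assumes the protobuf-net NuGet package is referenced; `TableModel`, its members, and the `tables.bin` file name are hypothetical stand-ins, not the model from the answer.

```csharp
using System;
using System.IO;
using ProtoBuf; // from the protobuf-net package (assumed to be referenced)

[ProtoContract]
public class TableModel
{
    // Table cells flattened in row-major order (hypothetical layout).
    [ProtoMember(1, IsPacked = true)]
    public int[] Cells { get; set; }

    [ProtoMember(2)]
    public int Columns { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var model = new TableModel { Cells = new[] { 0, 1, 0, 0, 0, 4 }, Columns = 3 };

        // Build-time step: serialize the populated model to a file that ships with the deployment.
        using (var file = File.Create("tables.bin"))
            Serializer.Serialize(file, model);

        // Startup step: deserialize the model during initialisation.
        TableModel loaded;
        using (var file = File.OpenRead("tables.bin"))
            loaded = Serializer.Deserialize<TableModel>(file);

        Console.WriteLine(loaded.Cells[5]); // prints 4
    }
}
```

In a real deployment the two steps would live in separate programs: the generator writes the file, and the shipped application only reads it.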

But - I seem to recall having this discussion recently; if it was with yourself, then I stand entirely by my previous remarks. The way you are trying to do it is still problematic. The above approach (from direct experience) works very well. As IL? Not so much.

Note: If you absolutely wanted to write the file without the execute step, that is possible too - just trickier.

Kirk Woll
Marc Gravell
  • According to the IL it is not pushing all of that onto the stack. It is storing the elements one-by-one into the array. – usr Apr 12 '12 at 22:40
  • @usr where is it getting the elements from? They are probably stored on the stack. – phoog Apr 12 '12 at 22:52
  • They are stored on the stack, one by one. Constant space is enough. – usr Apr 12 '12 at 23:00
  • @usr I haven't looked at the IL (on mobile) - how many locals does it declare, perchance? – Marc Gravell Apr 12 '12 at 23:19
  • @MarcGravell: I've just used ildasm on the .exe produced, and it declares exactly 1 local variable of type Tuple. Its .maxstack is 4. – Alex ten Brink Apr 13 '12 at 00:16