
I am trying to build a simulation tool in Excel using VSTO, by creating a Visual Studio 2010 Office workbook project. One of the worksheets in this workbook will contain approximately half a million records. Ideally I would like to read all of the records, use them in the simulation, and then output some statistics back to the workbook. So far I have been getting OutOfMemory exceptions when I try to get the whole range and then the cells out of it in one go. Does anyone have ideas on how I can read all of the data, or suggestions for a better approach?

This is my code:

Excel.Range range = Globals.shData.Range["A2:AX500000"];

Array values = (Array)range.Cells.Value;

Dimitris
  • Did you receive the exception by assigning the range to an array? – Zairja Apr 30 '12 at 14:30
  • Yes I did. I got back the Cells.Value property in a System.Array – Dimitris Apr 30 '12 at 14:32
  • Add more memory to the machine? – NotMe Apr 30 '12 at 14:37
  • I would try the solutions offered here: http://stackoverflow.com/questions/4629712/simple-array-cause-exception This may also be relevant: http://stackoverflow.com/questions/1847789/segmentation-fault-on-large-array-sizes – Zairja Apr 30 '12 at 14:44
  • This is also useful in understanding size / memory limits, though in this case it's referring to C++ http://stackoverflow.com/questions/216259/is-there-a-max-array-length-limit-in-c/ – Zairja Apr 30 '12 at 14:45

2 Answers


How about fetching the data in batches and assembling a slightly less memory-heavy model in memory?

var firstRow = 2;
var lastRow = 500000;
var batchSize = 5000;

// Build one A1-style address per batch, then read each batch's values lazily.
var batches = Enumerable
    .Range(0, (int)Math.Ceiling((lastRow - firstRow + 1) / (double)batchSize))
    .Select(x =>
        string.Format(
            "A{0}:AX{1}",
            x * batchSize + firstRow,
            Math.Min((x + 1) * batchSize + firstRow - 1, lastRow)))
    .Select(address => (Array)Globals.shData.Range[address].Cells.Value);

foreach (var batch in batches)
{
    foreach (var item in batch)
    {
        // Re-encode item into your own object collection.
    }
}
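
For the re-encoding step, note that the Value of a multi-cell range comes back as a 1-based, two-dimensional object[,] array. A rough sketch of walking one batch row by row and projecting it into plain objects (SimRow, ToRows and the column mapping are illustrative assumptions, not from the original question):

using System;
using System.Collections.Generic;

// Illustrative sketch only: SimRow and the column mapping are assumptions.
internal class SimRow
{
    public double Age { get; set; }
}

static IEnumerable<SimRow> ToRows(object[,] batch)
{
    // Range values are 1-based in both dimensions.
    for (int r = batch.GetLowerBound(0); r <= batch.GetUpperBound(0); r++)
    {
        yield return new SimRow
        {
            Age = Convert.ToDouble(batch[r, 1])   // e.g. column A holds the age
        };
    }
}

Each batch from the query above can then be cast to object[,] and fed through ToRows, so only one block of rows is materialised at a time.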
spender

This is not an Excel problem but rather a general C# issue. Instead of gathering all the rows in memory, yield the rows and calculate the statistics iteratively.

For example:

using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        var totalOfAllAges = 0D;
        var rows = new ExcelRows();

        // Calculate various statistics one row at a time.
        foreach (var item in rows.GetRow())
        {
            totalOfAllAges += item.Age;
        }

        Console.WriteLine("The total of all ages is {0}", totalOfAllAges);
    }
}

internal class ExcelRows
{
    private double rowCount = 1500000D;
    private double rowIndex = 0D;

    public IEnumerable<ExcelRow> GetRow()
    {
        while (rowIndex < rowCount)
        {
            rowIndex++;
            yield return new ExcelRow() { Age = rowIndex };
        }
    }
}

/// <summary>
/// Represents the next row gathered by VSTO.
/// </summary>
internal class ExcelRow
{
    public double Age { get; set; }
}
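
ExcelRows above just fabricates values to show the shape of the approach; in a real VSTO project the iterator would pull from the worksheet in blocks. A minimal sketch of that, assuming the sheet is exposed as Globals.shData as in the question and that column A holds the age (the block size, row bounds and column mapping are only assumptions):

public IEnumerable<ExcelRow> GetRow()
{
    const int firstRow = 2;
    const int lastRow = 500000;
    const int blockSize = 5000;   // illustrative block size

    for (int start = firstRow; start <= lastRow; start += blockSize)
    {
        int end = Math.Min(start + blockSize - 1, lastRow);

        // Read one block of cells; Value returns a 1-based object[,] array.
        var block = (object[,])Globals.shData.Range["A" + start + ":AX" + end].Cells.Value;

        for (int r = block.GetLowerBound(0); r <= block.GetUpperBound(0); r++)
        {
            // Only the current block is alive at any time, so memory stays bounded.
            yield return new ExcelRow { Age = Convert.ToDouble(block[r, 1]) };
        }
    }
}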
P.Brian.Mackey
  • I am actually thinking of storing the data in a CSV file and loading it with ADO so that I can use a cursor and run the simulation one chunk at a time. – Dimitris Apr 30 '12 at 15:02 (see the sketch after these comments)
  • This has the same net effect as the cursor and avoids all the extra work with another file. – CCondron Aug 26 '12 at 03:50
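
Following up on the CSV idea from the comments, a minimal sketch of streaming the records through a forward-only ADO.NET reader, assuming the ACE OLE DB text provider is installed and that the folder, file name and Age column are as shown (all of these names are illustrative):

using System;
using System.Data.OleDb;

class CsvSimulation
{
    static void Main()
    {
        // Assumption: the CSV sits in C:\data\records.csv with a header row.
        var connectionString =
            @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\data;" +
            "Extended Properties=\"text;HDR=Yes;FMT=Delimited\"";

        var totalOfAllAges = 0D;

        using (var connection = new OleDbConnection(connectionString))
        using (var command = new OleDbCommand("SELECT * FROM [records.csv]", connection))
        {
            connection.Open();

            // ExecuteReader gives a forward-only cursor, so only one row is in memory at a time.
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    totalOfAllAges += Convert.ToDouble(reader["Age"]);
                }
            }
        }

        Console.WriteLine("The total of all ages is {0}", totalOfAllAges);
    }
}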