C# Excel Interop Slow when looping through cells

Question

I am trying to extract all text data from an Excel document in C# and am having performance issues. In the following code I open the Workbook, loop over all worksheets, and loop over all cells in the used range, extracting the text from each cell as I go. The problem is, this takes 14 seconds to execute.

public class ExcelFile
{
    public string Path = @"C:\test.xlsx";
    private Excel.Application xl = new Excel.Application();
    private Excel.Workbook WB;
    public string FullText;
    private Excel.Range rng;
    private Dictionary<string, string> Variables;
    public ExcelFile()
    {
        WB = xl.Workbooks.Open(Path);
        xl.Visible = true;
        foreach (Excel.Worksheet CurrentWS in WB.Worksheets)
        {
            rng = CurrentWS.UsedRange;
            for (int i = 1; i <= rng.Count; i++)
            { FullText += rng.Cells[i].Value; }
        }
        WB.Close(false);
        xl.Quit();
    }
}

Whereas in VBA I would do something like this, which takes ~1 second:

Sub run()
    Dim strText As String
    For Each ws In ActiveWorkbook.Sheets
        For Each c In ws.UsedRange
            strText = strText & c.Text
        Next c
    Next ws
End Sub

Or, even faster (less than 1 second):

Sub RunFast()
    Dim strText As String
    Dim varCells As Variant
    For Each ws In ActiveWorkbook.Sheets
        varCells = ws.UsedRange
        For i = 1 To UBound(varCells, 1)
            For j = 1 To UBound(varCells, 2)
                strText = strText & CStr(varCells(i, j))
            Next j
        Next i
    Next ws
End Sub

Perhaps something is happening in the for loop in C# that I'm not aware of? Is it possible to load a range into an array-type object (as in my last example) to allow iteration over just the values, not the cell objects?

That is not unusual, VBA runs in-process but your first snippet runs out-of-process. Crossing the process boundary is slow. Use arrays so you reduce the number of round-trips or an in-process solution based on OpenXML like EPPlus or ClosedXML. — Hans Passant, Mar 04 '17 at 23:45
Thanks Hans - the question then remains how do I load a range into an array to avoid the round-trip? — pwwolff, Mar 04 '17 at 23:58

score 5 · Answer 1 · answered Mar 05 '17 at 03:46

I use this function. The loops only copy the result into an array that starts at index 0; the real work is done by the single read of range.Value into tmp, which fetches every cell of the range in one call.

public object[,] GetTable(int row, int col, int width, int height)
{
    object[,] arr = new object[height, width];

    Range c1 = (Range)Worksheet.Cells[row + 1, col + 1];
    Range c2 = (Range)Worksheet.Cells[row + height, col + width];
    Range range = Worksheet.get_Range(c1, c2);

    // One interop call fetches every value; the returned array is 1-based.
    object[,] tmp = (object[,])range.Value;

    for (int i = 0; i < height; ++i)
    {
        for (int j = 0; j < width; ++j)
        {
            arr[i, j] = tmp[i + tmp.GetLowerBound(0), j + tmp.GetLowerBound(1)];
        }
    }

    return arr;
}
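
To connect this back to the question (concatenating all text from a sheet), here is a minimal sketch of what could be done with the array once it has been fetched. RangeText and Flatten are hypothetical names, not part of the interop library, and the array in Main is only a stand-in for range.Value so that the example runs without Excel. It also guards against two cases that a bare cast to object[,] does not cover: a single-cell range, where Value is a scalar rather than an array, and empty cells, which come back as null.

```csharp
using System;
using System.Text;

public static class RangeText
{
    // Concatenates whatever Range.Value returned: a 2-D array for a
    // multi-cell range, a scalar for a single cell, or null for an empty cell.
    public static string Flatten(object value)
    {
        if (value == null) return string.Empty;

        var cells = value as object[,];
        if (cells == null) return Convert.ToString(value); // single-cell range

        var sb = new StringBuilder();
        // Arrays returned by interop are 1-based, so ask the array for its bounds.
        for (int r = cells.GetLowerBound(0); r <= cells.GetUpperBound(0); r++)
        {
            for (int c = cells.GetLowerBound(1); c <= cells.GetUpperBound(1); c++)
            {
                if (cells[r, c] != null) sb.Append(cells[r, c]);
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Stand-in for range.Value: a 2 x 2 array whose indices start at 1.
        var fake = (object[,])Array.CreateInstance(
            typeof(object), new[] { 2, 2 }, new[] { 1, 1 });
        fake[1, 1] = "a"; fake[1, 2] = 1.5;
        fake[2, 1] = null; fake[2, 2] = "b";

        Console.WriteLine(Flatten(fake));
        Console.WriteLine(Flatten("single cell"));
    }
}
```

With interop in place, the call inside the worksheet loop would presumably look like RangeText.Flatten((object)CurrentWS.UsedRange.Value), which costs one round-trip per sheet instead of one per cell.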

score 5 · Accepted Answer · answered Mar 05 '17 at 11:39

Excel and C# run in completely different environments. C# runs on the .NET Framework using managed memory, while Excel is a native C++ application that runs in unmanaged memory. Translating data between the two (a process called "marshaling") is extremely expensive in terms of performance.

Tweaking your code isn't going to help. For loops, string construction, etc. are all blazingly fast compared to the marshaling process. The only way you are going to get significantly better performance is to reduce the number of trips that have to cross the interprocess boundary. Extracting data cell by cell is never going to get you the performance you want.

Here are a few options:

1. Write a sub or function in VBA that does everything you want, then call that sub or function via interop. Walkthrough.
2. Use interop to save the worksheet to a temporary file in CSV format, then open the file using C#. You will need to loop through and parse the file to get it into a useful data structure, but this loop will go much faster.
3. Use interop to copy a range of cells to the clipboard, then use C# to read the clipboard directly.
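
As a rough illustration of the clipboard option, the sketch below covers only the C# side: turning the copied text into rows and columns. It assumes the range was copied with Range.Copy() and read back as text with Clipboard.GetText() from System.Windows.Forms (both left out so the example runs without Excel), that the text arrives tab-delimited with one line per row, and that no cell contains a tab or a line break. ClipboardText and ParseTabDelimited are hypothetical names.

```csharp
using System;

public static class ClipboardText
{
    // Splits tab-delimited text into rows and columns.
    public static string[][] ParseTabDelimited(string text)
    {
        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
        int count = lines.Length;
        // A trailing line break leaves one empty entry at the end; drop it.
        if (count > 0 && lines[count - 1].Length == 0) count--;

        var rows = new string[count][];
        for (int i = 0; i < count; i++) rows[i] = lines[i].Split('\t');
        return rows;
    }

    public static void Main()
    {
        // Sample text shaped like a copied 3 x 2 range; the last cell is empty.
        string sample = "Name\tQty\r\nApple\t3\r\nPear\t\r\n";
        foreach (string[] row in ParseTabDelimited(sample))
            Console.WriteLine(string.Join(" | ", row));
    }
}
```

Everything here is plain string handling in managed code, so only the copy itself crosses the process boundary.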

4. (As hinted in other answers) read all the values in one go as an object[,] if you don't need to make changes that retain formatting. 5. Use get_Value() with parameters to get XML that you can edit, including formatting. — Jbjstam, Mar 05 '17 at 13:20

Max Weinzierl · Answer 3 · 2017-03-04T23:49:48.720

One thing which will speed it up is to use a StringBuilder instead of += on the previous string. Strings are immutable in C# and therefore you are creating a ton of extra strings during your process of creating the final string.
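
To see the difference in isolation, without Excel involved, here is a small sketch that builds the same string both ways and times each. ConcatDemo and its method names are made up for this illustration, and the timings depend on the machine, so none are quoted here.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ConcatDemo
{
    public static string WithPlusEquals(string[] parts)
    {
        string result = "";
        foreach (string p in parts) result += p; // allocates a new string each time
        return result;
    }

    public static string WithStringBuilder(string[] parts)
    {
        var sb = new StringBuilder();
        foreach (string p in parts) sb.Append(p); // appends into a growable buffer
        return sb.ToString();
    }

    public static void Main()
    {
        // 20,000 short strings standing in for cell values.
        var parts = new string[20000];
        for (int i = 0; i < parts.Length; i++) parts[i] = "cell" + i;

        var sw = Stopwatch.StartNew();
        string a = WithPlusEquals(parts);
        Console.WriteLine("+=            : " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        string b = WithStringBuilder(parts);
        Console.WriteLine("StringBuilder : " + sw.ElapsedMilliseconds + " ms");

        Console.WriteLine("Same result: " + (a == b));
    }
}
```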

Additionally, you may improve performance by looping over row and column positions instead of a single linear index.

Here is the code changed with a StringBuilder and row, column positional looping:

public class ExcelFile
{
    public string Path = @"C:\test.xlsx";
    private Excel.Application xl = new Excel.Application();
    private Excel.Workbook WB;
    public string FullText;
    private Excel.Range rng;
    private Dictionary<string, string> Variables;
    public ExcelFile()
    {
        StringBuilder sb = new StringBuilder();
        WB = xl.Workbooks.Open(Path);
        xl.Visible = true;

        foreach (Excel.Worksheet CurrentWS in WB.Worksheets)
        {
            rng = CurrentWS.UsedRange;
            for (int i = 1; i <= rng.Rows.Count; i++)
            {
                for (int j = 1; j <=  rng.Columns.Count; j++)
                {
                    sb.Append(rng.Cells[i, j].Value);
                }
            }
        }
        FullText = sb.ToString();
        WB.Close(false);
        xl.Quit();
    }
}

Thanks for the tip - shaved off 2 seconds (down to 12), so it seems to have been a factor - and generally a good thing to keep in mind. — pwwolff, Mar 04 '17 at 23:55

JohnG · Answer 4 · 2017-03-06T08:05:23.840

I sympathize with you, pwwolff. Looping through Excel cells can be expensive. Antonio and Max are both correct, but John Wu's answer sums it up nicely. Using a StringBuilder may speed things up, and reading the used range into an object array is, IMHO, about as fast as you are going to get using interop. I understand there are third-party libraries that may perform better. Looping through each cell via interop will take an unacceptable amount of time if the file is large.

For the tests below I used a workbook with a single sheet whose used range has 11 columns and 100 rows of data. Using an object array implementation this took a little over a second. With 735 rows it took around 40 seconds.

I put 3 buttons on a form with a multi-line text box. The first button uses your posted code. The second button takes the ranges out of the loops. The third button uses an object array approach. Each one is significantly faster than the one before it. I used a text box on the form to output the data; you can use a string as you are, but a StringBuilder would be better if you must have one big string.

Again, if the files are large you may want to consider another implementation. Hope this helps.

private void button1_Click(object sender, EventArgs e) {
  Stopwatch sw = new Stopwatch();
  MessageBox.Show("Start DoExcel...");
  sw.Start();
  DoExcel();
  sw.Stop();
  MessageBox.Show("End DoExcel...Took: " + sw.Elapsed.Seconds + " seconds and " + sw.Elapsed.Milliseconds + " Milliseconds");
}

private void button2_Click(object sender, EventArgs e) {
  MessageBox.Show("Start DoExcel2...");
  Stopwatch sw = new Stopwatch();
  sw.Start();
  DoExcel2();
  sw.Stop();
  MessageBox.Show("End DoExcel2...Took: " + sw.Elapsed.Seconds + " seconds and " + sw.Elapsed.Milliseconds + " Milliseconds");
}

private void button3_Click(object sender, EventArgs e) {
  MessageBox.Show("Start DoExcel3...");
  Stopwatch sw = new Stopwatch();
  sw.Start();
  DoExcel3();
  sw.Stop();
  MessageBox.Show("End DoExcel3...Took: " + sw.Elapsed.Seconds + " seconds and " + sw.Elapsed.Milliseconds + " Milliseconds");
}

// object[,] array implementation
private void DoExcel3() {
  textBox1.Text = "";
  string Path = @"D:\Test\Book1 - Copy.xls";
  Excel.Application xl = new Excel.Application();
  Excel.Workbook WB;
  Excel.Range rng;

  WB = xl.Workbooks.Open(Path);
  xl.Visible = true;
  int totalRows = 0;
  int totalCols = 0;
  foreach (Excel.Worksheet CurrentWS in WB.Worksheets) {
    rng = CurrentWS.UsedRange;
    totalCols = rng.Columns.Count;
    totalRows = rng.Rows.Count;
    object[,] objectArray = (object[,])rng.Cells.Value;
    // The array returned by interop is 1-based, so the last index equals the count.
    for (int row = 1; row <= totalRows; row++) {
      for (int col = 1; col <= totalCols; col++) {
        if (objectArray[row, col] != null)
          textBox1.Text += objectArray[row,col].ToString();
      }
      textBox1.Text += Environment.NewLine;
    }
  }
  WB.Close(false);
  xl.Quit();
  Marshal.ReleaseComObject(WB);
  Marshal.ReleaseComObject(xl);
}

// Range taken out of loops
private void DoExcel2() {
  textBox1.Text = "";
  string Path = @"D:\Test\Book1 - Copy.xls";
  Excel.Application xl = new Excel.Application();
  Excel.Workbook WB;
  Excel.Range rng;

  WB = xl.Workbooks.Open(Path);
  xl.Visible = true;
  int totalRows = 0;
  int totalCols = 0;
  foreach (Excel.Worksheet CurrentWS in WB.Worksheets) {
    rng = CurrentWS.UsedRange;
    totalCols = rng.Columns.Count;
    totalRows = rng.Rows.Count;
    // Excel rows and columns are 1-based, so the last index equals the count.
    for (int row = 1; row <= totalRows; row++) {
      for (int col = 1; col <= totalCols; col++) {
        textBox1.Text += rng.Rows[row].Cells[col].Value;
      }
      textBox1.Text += Environment.NewLine;
    }
  }
  WB.Close(false);
  xl.Quit();
  Marshal.ReleaseComObject(WB);
  Marshal.ReleaseComObject(xl);
}

// original posted code
private void DoExcel() {
  textBox1.Text = "";
  string Path = @"D:\Test\Book1 - Copy.xls";
  Excel.Application xl = new Excel.Application();
  Excel.Workbook WB;
  Excel.Range rng;

  WB = xl.Workbooks.Open(Path);
  xl.Visible = true;
  foreach (Excel.Worksheet CurrentWS in WB.Worksheets) {
    rng = CurrentWS.UsedRange;
    for (int i = 1; i <= rng.Count; i++) {
      textBox1.Text += rng.Cells[i].Value;
    }
  }
  WB.Close(false);
  xl.Quit();
  Marshal.ReleaseComObject(WB);
  Marshal.ReleaseComObject(xl);
}

Thanks for the coherent examples! So now I'm very aware of the problems inherent in using COM libraries... and Marshalling :) — pwwolff, Mar 05 '17 at 14:28
