I am trying to extract all text data from an Excel document in C# and am having performance issues. In the following code I open the Workbook, loop over all worksheets, and loop over all cells in the used range, extracting the text from each cell as I go. The problem is, this takes 14 seconds to execute.
public class ExcelFile
{
public string Path = @"C:\test.xlsx";
private Excel.Application xl = new Excel.Application();
private Excel.Workbook WB;
public string FullText;
private Excel.Range rng;
private Dictionary<string, string> Variables;
public ExcelFile()
{
WB = xl.Workbooks.Open(Path);
xl.Visible = true;
foreach (Excel.Worksheet CurrentWS in WB.Worksheets)
{
rng = CurrentWS.UsedRange;
for (int i = 1; i < rng.Count; i++)
{ FullText += rng.Cells[i].Value; }
}
WB.Close(false);
xl.Quit();
}
}
Whereas in VBA I would do something like this, which takes ~1 second:
Sub run()
Dim strText As String
For Each ws In ActiveWorkbook.Sheets
For Each c In ws.UsedRange
strText = strText & c.Text
Next c
Next ws
End Sub
Or, even faster (less than 1 second):
Sub RunFast()
Dim strText As String
Dim varCells As Variant
For Each ws In ActiveWorkbook.Sheets
varCells = ws.UsedRange
For i = 1 To UBound(varCells, 1)
For j = 1 To UBound(varCells, 2)
strText = strText & CStr(varCells(i, j))
Next j
Next i
Next ws
End Sub
Perhaps something is happening in the for loop in C# that I'm not aware of? Is it possible to load a range into an array-type object (as in my last example) to allow iteration over just the values, not the cell objects?