The documentation says that:
The following code segment is used to read a very large Excel
file using the DOM approach.
and then goes an example. I use it to implement reading a relatively large file with 700K rows. I have this code by now:
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(path, false))
{
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
// no other code
}
When I start my program, I see how quickly - just in five seconds - it runs out of memory (>1G). And the debugger points to this line of code:
SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
So, I need to know whether OpenXML
really helps to read large files. And, if not, what are the alternatives (Interop does not help - I've already checked it).
EDIT
One extra mysterious thing. This code I get by now:
OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
while (reader.Read())
{
if (reader.ElementType == typeof(Row))
{
count++;
}
}
gives me in the count
variable over than a million of rows. However, I do have 14K on the first sheet and 700K on the second sheet. It is very strange. So, my extra question is how to parse only rows with data using SAX
approach. And one final mystery of reading large Excel files on OpenXML. One guy in this thread says that: "Turns out that the worksheets are enumerated backwards for some reason (so the first of my three sheets is actually index 3". So, my final extra question is how to get the sheet you want. At this moment I use this code:
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
But taking into account what the say, I'm not sure that in my case I would actually get the first worksheet.