Introduction
First, let me explain that this is not as much a question as it is a useful tip through my short time using OpenXml. However, all comments are welcome. Also, this is posted as a question because I could not figure out how to post a "suggestion" or "best practice" post.
The code taken below is directly from the MSDN do-it-yourself documentation. MSDN Do-it-yourself
...
// If the cell does not exist, return an empty string:
if (theCell != null)
{
value = theCell.InnerText;
// If the cell represents a numeric value, you are done.
// For dates, this code returns the serialized value that
// represents the date. The code handles strings and Booleans
// individually. For shared strings, the code looks up the
// corresponding value in the shared string table. For Booleans,
// the code converts the value into the words TRUE or FALSE.
if (theCell.DataType != null)
{
switch (theCell.DataType.Value)
{
case CellValues.SharedString:
// For shared strings, look up the value in the shared
// strings table.
var stringTable = wbPart.
GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
// If the shared string table is missing, something is
// wrong. Return the index that you found in the cell.
// Otherwise, look up the correct text in the table.
if (stringTable != null)
{
value = stringTable.SharedStringTable.
ElementAt(int.Parse(value)).InnerText;
}
break;
case CellValues.Boolean:
switch (value)
{
case "0":
value = "FALSE";
break;
default:
value = "TRUE";
break;
}
break;
}
}
...
The Bottleneck
Now, lets look at this line in particular. If you are simply reading a very large excel document and extracting data from the cells in the document, then this line will cause a performance bottleneck since it will iterate the entire SharedStringTable every time you try to get the string value from a cell.
value = stringTable.SharedStringTable.
ElementAt(int.Parse(value)).InnerText;
An Optimized Solution
Insert the following field in your class.
private OpenXmlElement[] sharedStringArray;
And once you have opened your XmlSpreadSheet, initialize the array once. Of course, this approach will use more memory, but I have found that the amount of memory used is far less valuable than the execution time with the MSDN approach.
var stringTable = wbPart.
GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
this.sharedStringArray = stringTable.SharedStringTable.ToArray();
Now we may reference directly to the array instead of iterating all the objects in the enumeration.
value = this.sharedStringArray[int.Parse(value)].InnerText;
From this approach, I am assuming that the shared string table will not be modified. I am also assuming that you may be getting the value of many million cells. This is a simple modification to help you achieve the execution time desired by forfeiting a small amount of memory.
I hope you enjoy!