I am using the following code to take some data (in XML like format - Not well formed) from a .txt
file and then write it to an .xlsx
using EPPlus after doing some processing. StreamElements
is basically a modified XmlReader
. My question is about performance, I have made a couple of changes but don't see what else I can do. I'm going to use this for large datasets so I'm trying to modify to make this as efficient and fast as possible. Any help will be appreciated!
I tried using p.SaveAs()
to do the excel writing but it did not really see a performance difference. Are there better faster ways to do the writing? Any suggestions are welcome.
using (ExcelPackage p = new ExcelPackage())
{
ExcelWorksheet ws = p.Workbook.Worksheets[1];
ws.Name = "data1";
int rowIndex = 1; int colIndex = 1;
foreach (var element in StreamElements(pa, "XML"))
{
var values = element.DescendantNodes().OfType<XText>()
.Select(v => Regex.Replace(v.Value, "\\s+", " "));
string[] data = string.Join(",", values).Split(',');
data[2] = toDateTime(data[2]);
for (int i = 0; i < data.Count(); i++)
{
if (rowIndex < 1000000)
{
var cell1 = ws.Cells[rowIndex, colIndex];
cell1.Value = data[i];
colIndex++;
}
}
rowIndex++;
}
}
ws.Cells[ws.Dimension.Address].AutoFitColumns();
Byte[] bin = p.GetAsByteArray();
using (FileStream fs = File.OpenWrite("C:\\test.xlsx"))
{
fs.Write(bin, 0, bin.Length);
}
}
}
Currently, for it to do the processing and then write 1 Million lines into an Excel worksheet, it takes about ~30-35 Minutes.