0

I need to export lots of data from database table to excel (xls/xlsx) file. It could be easily 10million rows and more.

I need open source solution which does not require Office to be installed (SpreadsheetGear and interop solutions will not work for me).

I am checking two libraries: OpenXML SDK and EPPlus.

For OpenXML SDK I found this method:

  private static void Write(string fileName, int numRows, int numCols)
        {
            using (var spreadsheetDocument = SpreadsheetDocument.Open(fileName, true))
            {
                WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
                WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();

                string origninalSheetId = workbookPart.GetIdOfPart(worksheetPart);

                WorksheetPart replacementPart = workbookPart.AddNewPart<WorksheetPart>();
                string replacementPartId = workbookPart.GetIdOfPart(replacementPart);

                using (OpenXmlReader reader = OpenXmlReader.Create(worksheetPart))
                {
                    using (OpenXmlWriter writer = OpenXmlWriter.Create(replacementPart))
                    {
                        Row row = new Row();
                        Cell cell = new Cell();
                        //CellFormula cellFormula = new CellFormula();
                        //cellFormula.CalculateCell = true;
                        //cellFormula.Text = "RAND()";
                        //cell.Append(cellFormula);
                        CellValue cellValue = new CellValue("val val");
                        cell.Append(cellValue);

                        while (reader.Read())
                        {
                            if (reader.ElementType == typeof(SheetData))
                            {
                                if (reader.IsEndElement)
                                    continue;
                                writer.WriteStartElement(new SheetData());

                                for (int rowNumber = 0; rowNumber < numRows; rowNumber++)
                                {
                                    writer.WriteStartElement(row);
                                    for (int col = 0; col < numCols; col++)
                                    {
                                        writer.WriteElement(cell);
                                    }
                                    writer.WriteEndElement();
                                }

                                writer.WriteEndElement();
                            }
                            else
                            {
                                if (reader.IsStartElement)
                                {
                                    writer.WriteStartElement(reader);
                                }
                                else if (reader.IsEndElement)
                                {
                                    writer.WriteEndElement();
                                }
                            }
                        }
                    }
                }

                Sheet sheet = workbookPart.Workbook.Descendants<Sheet>().First(s => s.Id.Value.Equals(origninalSheetId));
                sheet.Id.Value = replacementPartId;
                workbookPart.DeletePart(worksheetPart);
            }
        }

But it throws Out of memory exception. I need batch oriented approach and to be able to append data to the end of excel document. Unfortunately I did not find how to append rows with OpenXML SDK.

Also, I checked EPPlus soluion with LoadFromCollection method. It does support IDataReader with LoadFromDataReader but I dont have datareader at that point in code.

The question: is there a way to append data to existing sheet xls/xlsx file with kind of writer? Like OpenXMLWriter in OpenXML SDK.

UPD. Excel clearly does not support 10 million rows. Lets stick with 1m rows and lost of columns without out of memory exception.

UPD. Added EPPlus sample. 200k rows exports in 6 minutes and takes up to 1GB of RAM.

 private const string TempFile = @"C:\Users\vnechyp\Desktop\temp.xlsx";

private static void EPPlusExport()
{
    var random = new Random();

    var dt = new System.Data.DataTable();
    for (int i = 0; i < 15; i++)
    {
        dt.Columns.Add($"column_{i}");
    }

    var values = Enumerable.Range(0, 15).Select(val => random.Next().ToString()).ToArray();

    for (int i = 0; i < 10000; i++)
    {
        dt.Rows.Add(values);
    }

    using (ExcelPackage excelPackage = new ExcelPackage())
    {
        var workSheet = excelPackage.Workbook.Worksheets.Add("sheet");
        workSheet.Cells[1, 1].LoadFromDataTable(dt, true);
        excelPackage.SaveAs(new FileInfo(TempFile));
    }


    for (int i = 1; i < 50; i++)
    {
        Console.WriteLine($"Iteration: {i}");

        var updateRow = i*10000;
        Console.WriteLine($"Rows: {updateRow}");

        FileInfo existingFile = new FileInfo(TempFile);
        using (ExcelPackage excelPackage = new ExcelPackage(existingFile))
        {
            // get the first worksheet in the workbook
            ExcelWorksheet worksheet = excelPackage.Workbook.Worksheets[1];
            worksheet.Cells[updateRow, 1].LoadFromDataTable(dt, true);
            excelPackage.SaveAs(new FileInfo(TempFile));
        }
    }
}
makambi
  • 41
  • 1
  • 1
  • 5
  • I voted to close this as "Looking for a tutorial". I've used EPPlus several times for this type of operation. Their website has plenty of examples of continuing to write to an existing worksheet. You know your problem is that you have too much data to hold in memory and dump at once. Just continue to look through the documentation. – krillgar May 30 '16 at 13:54
  • Excel is not a suitable container for a dataset of that size. – J... May 30 '16 at 14:02
  • @krillgar Thank you, checking documentation. Last time I found nothing – makambi May 30 '16 at 14:05
  • @makambi Excel is also not really a suitable container for a million rows - it's an absolute limit for a reason. If you had a small project that grew to near that size it's a strong signal that your data needs to leave the nest and find a bigger home. *Starting* a project with the intention to dump a million rows into Excel, however, is just wrong. – J... May 30 '16 at 14:32
  • @J... I appreciate your concerns, the situation is following: customer wants export to csv and excel. I need to implement this, that it :-) – makambi May 30 '16 at 14:36
  • 2
    @makambi If they want a CSV that's fine - it can import into Access or another database and that's good. If they want an .xls or .xlsx file then it's your job to diplomatically tell them that they are wrong and in for a world of pain. This is not a workable solution, full stop. Excel can link to Access to view the data in pivot tables or whatever, but it *has* to go into a database when you start having this much data. A worksheet simply isn't going to cut it. This is like trying to load a shipping container onto a pickup truck - everything is going to break. – J... May 30 '16 at 15:21
  • @J... you are right but I still need a way to rxport xlxs:-) – makambi May 31 '16 at 07:28
  • 1
    @J... is right about Excel being the wrong tool for this job. That said, have a look at my [answer here](http://stackoverflow.com/a/32787219/3791802) which shows how to use the OpenXml SDK using the SAX approach. This writes 200k rows (as per your EPPus test) in around 22 seconds on my machine. – petelids May 31 '16 at 11:30

2 Answers2

3
+-----------------+-----------+--------------+---------------------+
|                 | Max. Rows | Max. Columns | Max. Cols by letter |
+-----------------+-----------+--------------+---------------------+
| Excel 365*      | 1,048,576 | 16,384       | XFD                 |
| Excel 2013      | 1,048,576 | 16,384       | XFD                 |
| Excel 2010      | 1,048,576 | 16,384       | XFD                 |
| Excel 2007      | 1,048,576 | 16,384       | XFD                 |
| Excel 2003      | 65,536    | 256          | IV                  |
| Excel 2002 (XP) | 65,536    | 256          | IV                  |
| Excel 2000      | 65,536    | 256          | IV                  |
| Excel 97        | 65,536    | 256          | IV                  |
| Excel 95        | 16,384    | 256          | IV                  |
| Excel 5         | 16,384    | 256          | IV                  |
+-----------------+-----------+--------------+---------------------+

You can use csv file for 10million rows

Alper Şaldırak
  • 1,034
  • 8
  • 10
2

My new favorite way to export to Excel, assuming CSV isn't sufficient, is to use the Open XML SDK. The below solution links to a great article by Vincent Tom on how to implement a large data export, with just his sample slightly cleaned up for new users.

Export a large data query (60k+ rows) to Excel

When I do this myself, I basically use an Open XML Writer, and a for each loop on an IQueryable. Never call ToList(), or you'll have to populate a list containing all the data in memory, which defeats the entire point.

For example, I've exported Excel files using this technique with 190,000+ records and 87+ columns, where other Excel libraries I've tried all failed.

Community
  • 1
  • 1
Greg
  • 2,410
  • 21
  • 26