5

When I used OLEDB, it takes only 2 - 3 seconds to read 3200 rows from an Excel Sheet. Now I changed to OpenXML format and now it takes more than 1 minute to read 3200 rows from an Excel Sheet.

Below is my code:

public static DataTable ReadExcelFileDOM(string filename) 
{ 
    DataTable table; 

    using (SpreadsheetDocument myDoc = SpreadsheetDocument.Open(filename, true)) 
    { 
        WorkbookPart workbookPart = myDoc.WorkbookPart; 
        Sheet worksheet = workbookPart.Workbook.Descendants<Sheet>().First(); 
        WorksheetPart worksheetPart = 
         (WorksheetPart)(workbookPart.GetPartById(worksheet.Id)); 
        SheetData sheetData = 
            worksheetPart.Worksheet.Elements<SheetData>().First(); 
        List<List<string>> totalRows = new List<List<string>>(); 
        int maxCol = 0; 

        foreach (Row r in sheetData.Elements<Row>()) 
        { 
            // Add the empty row. 
            string value = null; 
            while (totalRows.Count < r.RowIndex - 1) 
            { 
                List<string> emptyRowValues = new List<string>(); 
                for (int i = 0; i < maxCol; i++) 
                { 
                    emptyRowValues.Add(""); 
                } 
                totalRows.Add(emptyRowValues); 
            } 


            List<string> tempRowValues = new List<string>(); 
            foreach (Cell c in r.Elements<Cell>()) 
            { 
                #region get the cell value of c. 
                if (c != null) 
                { 
                    value = c.InnerText; 

                    // If the cell represents a numeric value, you are done.  
                    // For dates, this code returns the serialized value that  
                    // represents the date. The code handles strings and Booleans 
                    // individually. For shared strings, the code looks up the  
                    // corresponding value in the shared string table. For Booleans,  
                    // the code converts the value into the words TRUE or FALSE. 
                    if (c.DataType != null) 
                    { 
                        switch (c.DataType.Value) 
                        { 
                            case CellValues.SharedString: 
                                // For shared strings, look up the value in the shared  
                                // strings table. 
                                var stringTable = workbookPart. 
                                    GetPartsOfType<SharedStringTablePart>().FirstOrDefault(); 

                                // If the shared string table is missing, something is  
                                // wrong. Return the index that you found in the cell. 
                                // Otherwise, look up the correct text in the table. 
                                if (stringTable != null) 
                                { 
                                    value = stringTable.SharedStringTable. 
                                        ElementAt(int.Parse(value)).InnerText; 
                                } 
                                break; 

                            case CellValues.Boolean: 
                                switch (value) 
                                { 
                                    case "0": 
                                        value = "FALSE"; 
                                        break; 
                                    default: 
                                        value = "TRUE"; 
                                        break; 
                                } 
                                break; 
                        } 
                    } 

                    Console.Write(value + "  "); 
                } 
                #endregion 

                // Add the cell to the row list. 
                int i = Convert.ToInt32(c.CellReference.ToString().ToCharArray().First() - 'A'); 

                // Add the blank cell in the row. 
                while (tempRowValues.Count < i) 
                { 
                    tempRowValues.Add(""); 
                } 
                tempRowValues.Add(value); 
            } 

            // add the row to the totalRows. 
            maxCol = processList(tempRowValues, totalRows, maxCol); 

            Console.WriteLine(); 
        } 

        table = ConvertListListStringToDataTable(totalRows, maxCol); 
    } 
    return table; 
} 

/// <summary> 
/// Add each row to the totalRows. 
/// </summary> 
/// <param name="tempRows"></param> 
/// <param name="totalRows"></param> 
/// <param name="MaxCol">the max column number in rows of the totalRows</param> 
/// <returns></returns> 
private static int processList(List<string> tempRows, List<List<string>> totalRows, int MaxCol) 
{ 
    if (tempRows.Count > MaxCol) 
    { 
        MaxCol = tempRows.Count; 
    } 

    totalRows.Add(tempRows); 
    return MaxCol; 
} 

private static DataTable ConvertListListStringToDataTable(List<List<string>> totalRows, int maxCol) 
{ 
    DataTable table = new DataTable(); 
    for (int i = 0; i < maxCol; i++) 
    { 
        table.Columns.Add(); 
    } 
    foreach (List<string> row in totalRows) 
    { 
        while (row.Count < maxCol) 
        { 
            row.Add(""); 
        } 
        table.Rows.Add(row.ToArray()); 
    } 
    return table; 
} 

Is there an efficient way to change this code somewhere so that the read process can be little faster. How I can change this to code to read faster. Thanks.

sehe
  • 374,641
  • 47
  • 450
  • 633
  • I took your code and executed it to an excel file containing approx. 3,500 rows. Took 4 secs - what data is within your excel sheet? – Pilgerstorfer Franz Oct 09 '12 at 11:26
  • {$REGIONAL PREFIX},CITY,DATE,FUNCTION,LOCATION,KEY MAP,ORDER NUMBER,STATE,STREET NUMBER,STREET NAME,UNIT NUMBER ,SEND TO GIS,FILENAME - These mentioned are all headers. Values of one row is as follows: 61||32601,E,CUSHING,6/14/1966,X-SVCS,339,AR,111-421,W WALNUT-N CLEVELAND,X,||111-421 W WALNUT-N CLEVELAND.pdf –  Oct 09 '12 at 11:43
  • You might want to try EPPLUS to load the xlsx file. In my opionion, it is much faster. – Christian Sauer Oct 09 '12 at 12:04
  • tried it with your given details - 10 secs approx. in case you *remove your console.writeline stmts* it reduces time **to 1.26 sec !!!** – Pilgerstorfer Franz Oct 09 '12 at 12:49
  • @ChristianSauer, Thank You very much. EPPLUS is working fine as expected. Thanks. –  Oct 09 '12 at 13:33
  • @PilgerstorferFranz, Thanks for your support. I will check removing console.writeline stmts. Thanks –  Oct 09 '12 at 13:34
  • @PilgerstorferFranz, yes as you said, I tried by removing console.writeline stmts. It reads very fast. Thank You very much. –  Oct 09 '12 at 13:52

2 Answers2

1

I tried your code and noted that in an very easy example it took me approximately 4 secs to complete.

After editing my .xls file to your given details (columns: regional prefix, city, date, function, ...) and adding about 3,600 rows your code comes up to about 10 secs.

I think you should remove any Console.WriteLine statements as these ones slow down processing your xls file. After removing all of those my StopWatch showed 1.26 secs for the same number of rows.

You can find some reasons why console.WriteLine is so slow even on SO: Console.WriteLine slow. In this question there is an answer pointing to OutputDebugString...

Community
  • 1
  • 1
Pilgerstorfer Franz
  • 8,303
  • 3
  • 41
  • 54
1

I found some disadvantages in you code.

  1. When add to DataTable large number of rows use BeginLoadData and EndLoadData
  2. You need cache SharedStringTable
  3. You should use OpenXmlReader (SAX method). Memory consumption will be reduced.

You can try my ExcelDataReader without these disadvantages. See here https://github.com/gSerP1983/OpenXml.Excel.Data

Read to DataTable example:

class Program
{
    static void Main(string[] args)
    {
        var dt = new DataTable();
        using (var reader = new ExcelDataReader(@"data.xlsx"))
        {                
            dt.Load(reader);
        }

        Console.WriteLine("done: " + dt.Rows.Count);
        Console.ReadKey();
   }
}
Sergey Petrov
  • 121
  • 1
  • 4