34

I am trying to read an Excel file into a list of Data.DataTable, although with my current method it can take a very long time. I essentually go Worksheet by Worksheet, cell by cell, and it tends to take a very long time. Is there a quicker way of doing this? Here is my code:

    List<DataTable> List = new List<DataTable>();

    // Counting sheets
    for (int count = 1; count < WB.Worksheets.Count; ++count)
    {
        // Create a new DataTable for every Worksheet
        DATA.DataTable DT = new DataTable();

        WS = (EXCEL.Worksheet)WB.Worksheets.get_Item(count);

        textBox1.Text = count.ToString();

        // Get range of the worksheet
        Range = WS.UsedRange;


        // Create new Column in DataTable
        for (cCnt = 1; cCnt <= Range.Columns.Count; cCnt++)
        {
            textBox3.Text = cCnt.ToString();


                Column = new DataColumn();
                Column.DataType = System.Type.GetType("System.String");
                Column.ColumnName = cCnt.ToString();
                DT.Columns.Add(Column);

            // Create row for Data Table
            for (rCnt = 0; rCnt <= Range.Rows.Count; rCnt++)
            {
                textBox2.Text = rCnt.ToString();

                try
                {
                    cellVal = (string)(Range.Cells[rCnt, cCnt] as EXCEL.Range).Value2;
                }
                catch (Microsoft.CSharp.RuntimeBinder.RuntimeBinderException)
                {
                    ConvertVal = (double)(Range.Cells[rCnt, cCnt] as EXCEL.Range).Value2;
                    cellVal = ConvertVal.ToString();
                }

                // Add to the DataTable
                if (cCnt == 1)
                {

                    Row = DT.NewRow();
                    Row[cCnt.ToString()] = cellVal;
                    DT.Rows.Add(Row);
                }
                else
                {

                    Row = DT.Rows[rCnt];
                    Row[cCnt.ToString()] = cellVal;

                }
            }
        }
        // Add DT to the list. Then go to the next sheet in the Excel Workbook
        List.Add(DT);
    }
John Saunders
  • 160,644
  • 26
  • 247
  • 397
user1334858
  • 1,885
  • 5
  • 30
  • 39
  • 1
    " Is there a quicker way of doing this? Unfortunately not." Not true. This code is creating (and mistakenly failing to dispose of) a COM object for every single Excel cell value it reads in. This is the slowest possible way of doing it ! It's MUCH faster to read in the entire Worksheet into an array in one go, then iterate through the items in that array. – Mike Gledhill May 05 '17 at 07:40

7 Answers7

27

Caling .Value2 is an expensive operation because it's a COM-interop call. I would instead read the entire range into an array and then loop through the array:

object[,] data = Range.Value2;

// Create new Column in DataTable
for (int cCnt = 1; cCnt <= Range.Columns.Count; cCnt++)
{
    textBox3.Text = cCnt.ToString();

    var Column = new DataColumn();
    Column.DataType = System.Type.GetType("System.String");
    Column.ColumnName = cCnt.ToString();
    DT.Columns.Add(Column);

    // Create row for Data Table
    for (int rCnt = 1; rCnt <= Range.Rows.Count; rCnt++)
    {
        textBox2.Text = rCnt.ToString();

        string CellVal = String.Empty;
        try
        {
            cellVal = (string)(data[rCnt, cCnt]);
        }
        catch (Microsoft.CSharp.RuntimeBinder.RuntimeBinderException)
        {
            ConvertVal = (double)(data[rCnt, cCnt]);
            cellVal = ConvertVal.ToString();
        }

        DataRow Row;

        // Add to the DataTable
        if (cCnt == 1)
        {

            Row = DT.NewRow();
            Row[cCnt.ToString()] = cellVal;
            DT.Rows.Add(Row);
        }
        else
        {

            Row = DT.Rows[rCnt + 1];
            Row[cCnt.ToString()] = cellVal;

        }
    }
} 
Community
  • 1
  • 1
D Stanley
  • 149,601
  • 11
  • 178
  • 240
  • 3
    This still works perfectly. I had 40k records and the processing time went down from about 2 minutes to about 2 seconds. – Henrique Feijo Jul 29 '15 at 20:56
  • 2
    I am seriously confused about the variable usages in the answer. It doesn't seem to be noob friendly. 1. I could not use `Range.Value2`in that place, it shows error as "cannot implicitly convert object [] to object [*,*]". 2. I am not sure about the `Convertval` variable. – parkourkarthik Dec 17 '15 at 15:29
  • @parkourkarthik I can't verify right now but if your range is a single row or column you may get a 1-D `object[]` back, although I thought it was always a 2-d array. Feel free to ask that as a separate question if you haven't already. – D Stanley Dec 17 '15 at 16:29
  • @parkourkarthik `ConvertVal` came from the OP's original question so I can't say what it's purpose is. It doesn't seem to be relevant to the original problem though. – D Stanley Dec 17 '15 at 16:31
  • I can't seem to find all the declarations of all the variables used. – ThEpRoGrAmMiNgNoOb Feb 05 '16 at 09:51
  • @MichayCarpa - I added the missing declarations on there. – Leo Gurdian Jun 21 '16 at 15:08
  • @MichayCarpa The edits Leo proposed will break the code - you added variable declarations in inner scopes but the variables are used in outer scopes. The variables are not declared in the question either so they should not be "added" to the answer. If you are having trouble adapting the answer yo your specific case then ask that as a separate question. – D Stanley Jun 21 '16 at 15:22
  • @DStanley - that's exactly what Michay is saying; missing declarations because they are not in the OP's code. And also as parkourkarthik suggests.."It doesn't seem to be noob friendly." ... I was trying so that you could just slap on the code with minimal re-work but you're being defensive against this. This code as-is ..is pratically useless.. for example Column = new DataColumn(); ..compile error .. "Column is not defined." ... another one.. Row = DT.NewRow();... "Row is not defined." ...do you have a better suggestion other than open a new thread? Let's be constructive here. – Leo Gurdian Jun 22 '16 at 21:24
  • @LeoGurdian Well, I added some variable declaration to make it more "cut-and-paste" against my better judgment. To me, the heart of the answer is pulling the excel data into an array versus calling `Value` for every cell. There are some oddities in the code (`System.Type.GetType("System.String")`, etc.) that I did not address, but if someone wants to copy and paste the answer and use it as-is that's their right. – D Stanley Jun 22 '16 at 22:04
  • @DStanley, I agree with you; the root of the answer IS as you suggested; not to hit the COM. Your answer is very efficient and I've got to implement it on my own project. Thank you for sharing that great piece of knowledge and also for accommodating the other little syntax details. – Leo Gurdian Jun 24 '16 at 15:08
  • @DStanley, could you please look into this? https://stackoverflow.com/questions/66778315/convert-excel-cell-range-into-datatable-c-sharp – theLuckyOne Mar 25 '21 at 02:33
  • thanks @DStanley, It took me literrally a 1 second to parse through 13,000... instead of 9 minutes before – suisgrand Jul 01 '23 at 15:58
8

In case anyone else is using EPPlus. This implementation is pretty naive, but there are comments that draw attention to such. If you were to layer one more method GetWorkbookAsDataSet() on top it would do what the OP is asking for.

    /// <summary>
    /// Assumption: Worksheet is in table format with no weird padding or blank column headers.
    /// 
    /// Assertion: Duplicate column names will be aliased by appending a sequence number (eg. Column, Column1, Column2)
    /// </summary>
    /// <param name="worksheet"></param>
    /// <returns></returns>
    public static DataTable GetWorksheetAsDataTable(ExcelWorksheet worksheet)
    {
        var dt = new DataTable(worksheet.Name);
        dt.Columns.AddRange(GetDataColumns(worksheet).ToArray());
        var headerOffset = 1; //have to skip header row
        var width = dt.Columns.Count;
        var depth = GetTableDepth(worksheet, headerOffset);
        for (var i = 1; i <= depth; i++)
        {
            var row = dt.NewRow();
            for (var j = 1; j <= width; j++)
            {
                var currentValue = worksheet.Cells[i + headerOffset, j].Value;

                //have to decrement b/c excel is 1 based and datatable is 0 based.
                row[j - 1] = currentValue == null ? null : currentValue.ToString();
            }

            dt.Rows.Add(row);
        }

        return dt;
    }

    /// <summary>
    /// Assumption: There are no null or empty cells in the first column
    /// </summary>
    /// <param name="worksheet"></param>
    /// <returns></returns>
    private static int GetTableDepth(ExcelWorksheet worksheet, int headerOffset)
    {
        var i = 1;
        var j = 1;
        var cellValue = worksheet.Cells[i + headerOffset, j].Value;
        while (cellValue != null)
        {
            i++;
            cellValue = worksheet.Cells[i + headerOffset, j].Value;
        }

        return i - 1; //subtract one because we're going from rownumber (1 based) to depth (0 based)
    }

    private static IEnumerable<DataColumn> GetDataColumns(ExcelWorksheet worksheet)
    {
        return GatherColumnNames(worksheet).Select(x => new DataColumn(x));
    }

    private static IEnumerable<string> GatherColumnNames(ExcelWorksheet worksheet)
    {
        var columns = new List<string>();

        var i = 1;
        var j = 1;
        var columnName = worksheet.Cells[i, j].Value;
        while (columnName != null)
        {
            columns.Add(GetUniqueColumnName(columns, columnName.ToString()));
            j++;
            columnName = worksheet.Cells[i, j].Value;
        }

        return columns;
    }

    private static string GetUniqueColumnName(IEnumerable<string> columnNames, string columnName)
    {
        var colName = columnName;
        var i = 1;
        while (columnNames.Contains(colName))
        {
            colName = columnName + i.ToString();
            i++;
        }

        return colName;
    }
Derpy
  • 1,458
  • 11
  • 15
4

Please check out the below links

http://www.codeproject.com/Questions/376355/import-MS-Excel-to-datatable (6 solutions posted)

Best /Fastest way to read an Excel Sheet into a DataTable?

Community
  • 1
  • 1
Vimal CK
  • 3,543
  • 1
  • 26
  • 47
3

MS Office Interop is slow and even Microsoft does not recommend Interop usage on server side and cannot be use to import large Excel files. For more details see why not to use OLE Automation from Microsoft point of view.

Instead, you can use any Excel library, like EasyXLS for example. This is a code sample that shows how to read the Excel file:

ExcelDocument workbook = new ExcelDocument();
DataSet ds = workbook.easy_ReadXLSActiveSheet_AsDataSet("excel.xls");
DataTable dataTable = ds.Tables[0];

If your Excel file has multiple sheets or for importing only ranges of cells (for better performances) take a look to more code samples on how to import Excel to DataTable in C# using EasyXLS.

alex.pulver
  • 2,107
  • 2
  • 31
  • 31
2
Dim sSheetName As String
Dim sConnection As String
Dim dtTablesList As DataTable
Dim oleExcelCommand As OleDbCommand
Dim oleExcelReader As OleDbDataReader
Dim oleExcelConnection As OleDbConnection

sConnection = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Test.xls;Extended Properties=""Excel 12.0;HDR=No;IMEX=1"""

oleExcelConnection = New OleDbConnection(sConnection)
oleExcelConnection.Open()

dtTablesList = oleExcelConnection.GetSchema("Tables")

If dtTablesList.Rows.Count > 0 Then
    sSheetName = dtTablesList.Rows(0)("TABLE_NAME").ToString
End If

dtTablesList.Clear()
dtTablesList.Dispose()

If sSheetName <> "" Then

    oleExcelCommand = oleExcelConnection.CreateCommand()
    oleExcelCommand.CommandText = "Select * From [" & sSheetName & "]"
    oleExcelCommand.CommandType = CommandType.Text

    oleExcelReader = oleExcelCommand.ExecuteReader

    nOutputRow = 0

    While oleExcelReader.Read

    End While

    oleExcelReader.Close()

End If

oleExcelConnection.Close()
Adil
  • 89
  • 4
1
 class DataReader
    {
        Excel.Application xlApp;
        Excel.Workbook xlBook;
        Excel.Range xlRange;
        Excel.Worksheet xlSheet;
        public DataTable GetSheetDataAsDataTable(String filePath, String sheetName)
        {
            DataTable dt = new DataTable();
            try
            {
                xlApp = new Excel.Application();
                xlBook = xlApp.Workbooks.Open(filePath);
                xlSheet = xlBook.Worksheets[sheetName];
                xlRange = xlSheet.UsedRange;
                DataRow row=null;
                for (int i = 1; i <= xlRange.Rows.Count; i++)
                {
                    if (i != 1)
                        row = dt.NewRow();
                    for (int j = 1; j <= xlRange.Columns.Count; j++)
                    {
                        if (i == 1)
                            dt.Columns.Add(xlRange.Cells[1, j].value);
                        else
                            row[j-1] = xlRange.Cells[i, j].value;
                    }
                    if(row !=null)
                        dt.Rows.Add(row);
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
            finally
            {
                xlBook.Close();
                xlApp.Quit();
            }
            return dt;
        }
    }
0

I know it's an old topic, but there's an easy and fast way to do it by MiniExcel library like following code

    var filePath = @".....demo.xlsx";
    
    List<DataTable> list = new List<DataTable>();
    var sheets = MiniExcel.GetSheetNames(filePath);
    foreach (var sheetName in MiniExcel.GetSheetNames(filePath))
        list.Add(MiniExcel.QueryAsDataTable(filePath,true,sheetName:sheetName));

Best regards

enter image description here

Wei Lin
  • 3,591
  • 2
  • 20
  • 52