0

I have problem with loaded large Excel file into datatable. I tried many third-party software like NPOI and Epplus, but cant solve my problem. After I did many search , I found a suggestion that is useing Stream to load part of file at once. But this article just talk about concept , and I have no idea how to read part of file at once and combine parts of file togather and parse it.

I know there are plenty of third-party software to do this task. But I really want to know the most basic way. Can someone please give me some examples ?

劉鎮瑲
  • 517
  • 9
  • 20
  • OleDB could be useful. check this out : https://stackoverflow.com/questions/14261655/best-fastest-way-to-read-an-excel-sheet-into-a-datatable – Anoop J May 21 '18 at 06:51
  • Thanks. But according to the website you gived , seems like i need to install OleDB provider in local machine. I want a solution that wont have to install any software. Is it possible? – 劉鎮瑲 May 21 '18 at 07:05

3 Answers3

1

If you have large excel document then it would be better for you to use the Open XML SDK and for the smaller size files, I think the EPPLUS is better suited.

For EPPLUS

you can make use of the following code:

public DataTable GetDataTableFromExcel(string path)
{    
     var tbl = new DataTable();
     using (var pck = new OfficeOpenXml.ExcelPackage())
    {
   //reading the excel file using the stream
    using (var stream = File.OpenRead(path))
    {
      pck.Load(stream);
    }

    //Reading the data from the 1st sheet, you can add the code to read other sheets
    var ws = pck.Workbook.Worksheets.First();        
    //now adding the columns to the table and assuming the first row of the sheet is contaning columns if not change the we.Cells property
    foreach (var firstRowCell in ws.Cells[1, 1, 1, ws.Dimension.End.Column])
    {
       tbl.Columns.Add(firstRowCell.Text);
    }
                   //adding data to datatable
    for (int rowNum = 1; rowNum < ws.Dimension.End.Row; rowNum++)
    {
       var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column];
       DataRow row = tbl.Rows.Add();
       foreach (var cell in wsRow)
       {
          cell.Calculate();
          row[cell.Start.Column - 1] = cell.Value;
        }
    }
        return tbl;
}

For OPEN XML SDK

you can make use of the following code:

public DataTable GetDataTableFromExcel(string path)
    {
        var dataTable = new DataTable();
        using (SpreadsheetDocument doc = SpreadsheetDocument.Open(path, false))
        {
            //to read data from the 1st sheet
            Worksheet worksheet = SpreedsheetHelper.GetWorksheetPart(doc.WorkbookPart, "myFirstSheetname").Worksheet;
            SheetData sheetData = worksheet.GetFirstChild<SheetData>();
            IEnumerable<Row> rows = sheetData.Descendants<Row>();
            var cells = SpreedsheetHelper.GetRowCells(rows.ElementAt(0));
            //creating the columns
            foreach (Cell cell in cells)
            {
                var colname = SpreedsheetHelper.TryGetCellValue(doc, cell);
                colname = colname == null ? "" : colname;
                dataTable.Columns.Add(colname, SpreedsheetHelper.GetCellDatatype(cell));
            }               

           //adding data to datatable         
            foreach (Row row in rows)
            {
                DataRow dataRow = dataTable.NewRow();
                var rowcells = SpreedsheetHelper.GetRowCells(row);
                var cellindex = 0;
                foreach (Cell cell in rowcells)
                {
                    var value = SpreedsheetHelper.TryGetCellValue(doc, cell);
                    value = value == null ? "" : value;
                    dataRow[cellindex] = value;
                    cellindex++;
                }                    
                dataTable.Rows.Add(dataRow);
            }

        }
        //to handle the blank row added at the top of datatable
        dataTable.Rows.RemoveAt(0);

        return dataTable;
    }

Hope this would help.

vikscool
  • 1,293
  • 1
  • 10
  • 24
  • Thank you very much.I have tried both in my computer already. But the local machine in my company is not permitted to install extra software , so I have to use Visual Studio build-in function or class and so on to solve the problem. – 劉鎮瑲 May 21 '18 at 07:40
  • @劉鎮瑲 the `EEPLUS` and the `OPEN XML SDK` both are libraries and are free to download hence can be downloaded through the `NuGet` which is in build in your visual studio. If you still don't want to include any library then your best guess is with `interop excel`. – vikscool May 21 '18 at 07:57
  • Thanks. Openxml is a good tool. It solving my problem. But my file is nearly 25MB, I can't use the way you give me. I will get out of memory exception at `IEnumerable rows = sheetData.Descendants(); var cells =SpreedsheetHelper.GetRowCells(rows.ElementAt(0));`So,I vote up your answer for let me know this tool. I will post a way that deal with large file I find in this [doc](https://learn.microsoft.com/en-us/office/open-xml/how-to-parse-and-read-a-large-spreadsheet). it's really solved my problem. – 劉鎮瑲 Dec 20 '18 at 07:10
0
  1. Install Latest version of NPOI from NUGET package manager.
  2. Use below method to read the excel stream into table. its well tested and working code. Pass the excel stream , table name and number of sheets to read
  public void ReadFromExcel(Stream excelFileStream, DataTable dt, int sheetsToRead, out string processingError)                    
        {

               XSSFWorkbook workbook = new XSSFWorkbook(excelFileStream); // write data in workbook from xls document.               
                    XSSFSheet sheet = (XSSFSheet)workbook.GetSheetAt(0); // read the current table data                
                    XSSFRow headerRow = (XSSFRow)sheet.GetRow(0); // read the current row data
                    // LastCellNum is the number of cells of current rows
                    int cellCount = headerRow.LastCellNum;
                    bool isBlanKRow = false;
                    processingError = "";
                    try
                    {
                        if (dt.Rows.Count == 0)
                        {
                           //Reading First Row as Header for Excel Sheet;
                            try
                            {
                                for (int j = headerRow.FirstCellNum; j < cellCount; j++)
                                {
                                    // get data as the column header of DataTable
                                    DataColumn column = new DataColumn(headerRow.GetCell(j).StringCellValue);
                                    dt.Columns.Add(column);
                                }
                            }
                            catch (Exception Ex)
                            {
                                logger.Error("Error", Ex);
                                processingError = Ex.Message;
                                throw;
                            }
                        }

                       // Number of Sheets to Read  
                        for (int sheetindex = 0; sheetindex < sheetsToRead; sheetindex++)
                        {
                            sheet = (XSSFSheet)workbook.GetSheetAt(sheetindex);
                            if (null != sheet)
                            {

                                // LastRowNum to get the number of rows of current table
                                int rowCount = sheet.LastRowNum + 1;
                                // Reading Rows and Copying it to Data Table;
                                try
                                {
                                    for (int i = (sheet.FirstRowNum + 1); i < rowCount; i++)
                                    {
                                        XSSFRow row = (XSSFRow)sheet.GetRow(i);
                                        DataRow dataRow = dt.NewRow();
                                        isBlanKRow = true;
                                        try
                                        {
                                            for (int j = row.FirstCellNum; j < cellCount; j++)
                                            {
                                                if (null != row.GetCell(j) && !string.IsNullOrEmpty(row.GetCell(j).ToString()) && !string.IsNullOrWhiteSpace(row.GetCell(j).ToString()))
                                                {
                                                    dataRow[j] = row.GetCell(j).ToString();
                                                    isBlanKRow = false;
                                                }
                                            }
                                        }
                                        catch (Exception Ex)
                                        {                                    
                                            processingError = Ex.Message;
                                            throw;
                                        }
                                        if (!isBlanKRow)
                                        {
                                            dt.Rows.Add(dataRow);
                                        }
                                    }
                                }
                                catch (Exception Ex)
                                {                           
                                    processingError = Ex.Message;
                                    throw;
                                }
                            }
                        }
                    }
                    catch (Exception Ex)
                    {               
                        processingError = Ex.Message;
                        throw;
                    }
                    finally
                    {
                        workbook.UnlockStructure();
                        workbook.UnlockRevision();
                        workbook.UnlockWindows();
                        workbook = null;
                        sheet = null;
                    }
                }
kumar chandraketu
  • 2,232
  • 2
  • 20
  • 25
0

I got a way to solve out of memory problem. My code can solve 100K-200K rows or 25MB excel file.

      SpreadsheetDocument doc = SpreadsheetDocument.Open(fullFilePath, false) //read file
        WorkbookPart x = doc.WorkbookPart;
        OpenXmlReader reader = OpenXmlReader.Create(workbookPart.WorksheetParts.First());
        int totalRow = 0;
        while (reader.Read())//read excel file every tag
        {
            if (reader.ElementType == typeof(Row))//start to deal with data when meet row tag
            {
                if (totalRow == 0)//i want to skip header row
                {
                    totalRow++;
                    reader.ReadFirstChild();//start reading the child element of row tag
                    do
                    {
                        if (reader.ElementType == typeof(Cell))//start to deal with the data in cell
                        {


                         Cell cell = (Cell)reader.LoadCurrentElement();//load into the element
                         //you can get data if you need header info
                            }
                        } while (reader.ReadNextSibling());//read another sibling cell tag. it will stop until the last sibling cell.
                    }
                    else
                {

                    reader.ReadFirstChild();
                    do
                    {
                        if (reader.ElementType == typeof(Cell))
                        {
                           Cell cell = (Cell)reader.LoadCurrentElement();
                           var container = GetValue(x, cell);// because not every data will directly store in cell tag. I have to deal with some situation in the GetValue function.
                        }
                    }while (reader.ReadNextSibling());
                }
            }
        }
        private string GetValue(WorkbookPart workbookPart, Cell cell)
        {
            var cellValue = cell.CellValue;

            string value = (cellValue == null) ? cell.InnerText : cellValue.InnerText;//get info in cell tag
            if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)//when info store in sharedstringtable you have to get info in there
            {
                return workbookPart.SharedStringTablePart.SharedStringTable.ChildElements.GetItem(int.Parse(value)).InnerText;
            }
            return value;
        }

The xml form of excel will like below. You can view the xml form of excel then you will know the way of deal with large excel more.

xml form

Reference:Sample(reading a very large Excel file) SAX approach

劉鎮瑲
  • 517
  • 9
  • 20