25

Task

Import data from excel to DataTable

Problem

The cell that doesnot contain any data are getting skipped and the very next cell that has data in the row is used as the value of the empty colum. E.g

A1 is empty A2 has a value Tom then while importing the data A1 get the value of A2 and A2 remains empty

To make it very clear I am providing some screen shots below

This is the excel data

enter image description here

This is the DataTable after importing the data from excel enter image description here

Code

public class ImportExcelOpenXml
{
    public static DataTable Fill_dataTable(string fileName)
    {
        DataTable dt = new DataTable();

        using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(fileName, false))
        {

            WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
            IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
            string relationshipId = sheets.First().Id.Value;
            WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
            Worksheet workSheet = worksheetPart.Worksheet;
            SheetData sheetData = workSheet.GetFirstChild<SheetData>();
            IEnumerable<Row> rows = sheetData.Descendants<Row>();

            foreach (Cell cell in rows.ElementAt(0))
            {
                dt.Columns.Add(GetCellValue(spreadSheetDocument, cell));
            }

            foreach (Row row in rows) //this will also include your header row...
            {
                DataRow tempRow = dt.NewRow();

                for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
                {
                    tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
                }

                dt.Rows.Add(tempRow);
            }

        }

        dt.Rows.RemoveAt(0); //...so i'm taking it out here.

        return dt;
    }


    public static string GetCellValue(SpreadsheetDocument document, Cell cell)
    {
        SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
        string value = cell.CellValue.InnerXml;

        if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
        {
            return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
        }
        else
        {
            return value;
        }
    }
}

My Thoughts

I think there is some problem with

public IEnumerable<T> Descendants<T>() where T : OpenXmlElement;

In case I want the count of columns using Descendants

IEnumerable<Row> rows = sheetData.Descendants<<Row>();
int colCnt = rows.ElementAt(0).Count();

OR

If I am getting the count of rows using Descendants

IEnumerable<Row> rows = sheetData.Descendants<<Row>();
int rowCnt = rows.Count();`

In both cases Descendants is skipping the empty cells

Is there any alternative of Descendants.

Your suggestions are highly appreciated

P.S: I have also thought of getting the cells values by using column names like A1, A2 but in order to do that I will have to get the exact count of columns and rows which is not possible by using Descendants function.

Taterhead
  • 5,763
  • 4
  • 31
  • 40
Vikas Bansal
  • 10,662
  • 14
  • 58
  • 100

4 Answers4

31

Had there been some data in all the cells of a row then everything works fine. But if you happen to have even single empty cell in a row then things go haywire.

Why it is happening in first place?

The reason lies in below line of code:

row.Descendants<Cell>().Count()

Count() function gives you the number of non-empty cells in the row i.e. it will ignore all the empty cells while returning the count. So, when you pass row.Descendants<Cell>().ElementAt(i) as argument to GetCellValue method like this:

GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));

Then, it will find the content of the next non-empty cell, not necessarily the content of the cell at column index i e.g. if the first column is empty and we call ElementAt(1), it returns the value in the second column instead and our program logic gets messed up.

Solution: We need to deal with the occurrence of empty cells in the row i.e. we need to figure out the actual/effective column index of the target cell in case there were some empty cells before it in the given row. So, you need to substitute your for loop code below:

for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
      tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
}

with

for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
    Cell cell = row.Descendants<Cell>().ElementAt(i);
    int actualCellIndex = CellReferenceToIndex(cell);
    tempRow[actualCellIndex] = GetCellValue(spreadSheetDocument, cell);
}

Also, add below method in your code which is used in the above modified code snippet to obtain the actual/effective column index of any cell:

private static int CellReferenceToIndex(Cell cell)
{
    int index = 0;
    string reference = cell.CellReference.ToString().ToUpper();
    foreach (char ch in reference)
    {
        if (Char.IsLetter(ch))
        {
            int value = (int)ch - (int)'A';
            index = (index == 0) ? value : ((index + 1) * 26) + value;
        }
        else
        {
            return index;
        }
    }
    return index;
}

Note: Index in an Excel row start with 1 unlike various programming languages where it starts at 0.

RBT
  • 24,161
  • 21
  • 159
  • 240
  • 5
    i think the method CellReferenceToIndex not okay for excel that exceeds a-z to aa,ab,... when z col exceed again it returns index from 0.... so if you have excel with many number of cols that wont work – Rouzbeh Zarandi Jan 27 '19 at 08:47
  • This is a very helpful answer! It's a such a shame the the OpenXML docs are such a clusterf*ck. Thank you very much. – rp. Apr 07 '22 at 15:17
6
public void Read2007Xlsx()
{
    try
    {
        DataTable dt = new DataTable();
        using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(@"D:\File.xlsx", false))
        {
            WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
            IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
            string relationshipId = sheets.First().Id.Value;
            WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
            Worksheet workSheet = worksheetPart.Worksheet;
            SheetData sheetData = workSheet.GetFirstChild<SheetData>();
            IEnumerable<Row> rows = sheetData.Descendants<Row>();
            foreach (Cell cell in rows.ElementAt(0))
            {
                dt.Columns.Add(GetCellValue(spreadSheetDocument, cell));
            }
            foreach (Row row in rows) //this will also include your header row...
            {
                DataRow tempRow = dt.NewRow();
                int columnIndex = 0;
                foreach (Cell cell in row.Descendants<Cell>())
                {
                    // Gets the column index of the cell with data
                    int cellColumnIndex = (int)GetColumnIndexFromName(GetColumnName(cell.CellReference));
                    cellColumnIndex--; //zero based index
                    if (columnIndex < cellColumnIndex)
                    {
                        do
                        {
                            tempRow[columnIndex] = ""; //Insert blank data here;
                            columnIndex++;
                         }
                         while (columnIndex < cellColumnIndex);
                     }//end if block
                     tempRow[columnIndex] = GetCellValue(spreadSheetDocument, cell);                            
                     columnIndex++;
                 }//end inner foreach loop
                 dt.Rows.Add(tempRow);
             }//end outer foreach loop
        }//end using block
        dt.Rows.RemoveAt(0); //...so i'm taking it out here.
    }//end try
    catch (Exception ex)
    {
    }
}//end Read2007Xlsx method

/// <summary>
/// Given a cell name, parses the specified cell to get the column name.
/// </summary>
/// <param name="cellReference">Address of the cell (ie. B2)</param>
/// <returns>Column Name (ie. B)</returns>
public static string GetColumnName(string cellReference)
{
    // Create a regular expression to match the column name portion of the cell name.
    Regex regex = new Regex("[A-Za-z]+");
    Match match = regex.Match(cellReference);
    return match.Value;
} //end GetColumnName method

/// <summary>
/// Given just the column name (no row index), it will return the zero based column index.
/// Note: This method will only handle columns with a length of up to two (ie. A to Z and AA to ZZ). 
/// A length of three can be implemented when needed.
/// </summary>
/// <param name="columnName">Column Name (ie. A or AB)</param>
/// <returns>Zero based index if the conversion was successful; otherwise null</returns>
public static int? GetColumnIndexFromName(string columnName)
{                       
    //return columnIndex;
    string name = columnName;
    int number = 0;
    int pow = 1;
    for (int i = name.Length - 1; i >= 0; i--)
    {
        number += (name[i] - 'A' + 1) * pow;
        pow *= 26;
    }
    return number;
 } //end GetColumnIndexFromName method

public static string GetCellValue(SpreadsheetDocument document, Cell cell)
{
   SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
   if (cell.CellValue ==null)
   {
       return "";
   }
   string value = cell.CellValue.InnerXml;
   if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
   {
        return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
    }
    else
    {
         return value;
    }
 }//end GetCellValue method
RBT
  • 24,161
  • 21
  • 159
  • 240
0

Try this code. I have done little modifications and it worked for me:

public static DataTable Fill_dataTable(string filePath)
{
    DataTable dt = new DataTable();

    using (SpreadsheetDocument doc = SpreadsheetDocument.Open(filePath, false))
    {
        Sheet sheet = doc.WorkbookPart.Workbook.Sheets.GetFirstChild<Sheet>();
        Worksheet worksheet = doc.WorkbookPart.GetPartById(sheet.Id.Value) as WorksheetPart.Worksheet;
        IEnumerable<Row> rows = worksheet.GetFirstChild<SheetData>().Descendants<Row>();
        DataTable dt = new DataTable();
        List<string> columnRef = new List<string>();
        foreach (Row row in rows)
        {
            if (row.RowIndex != null)
            {
                if (row.RowIndex.Value == 1)
                {
                    foreach (Cell cell in row.Descendants<Cell>())
                    {
                        dt.Columns.Add(GetValue(doc, cell));
                            columnRef.Add(cell.CellReference.ToString().Substring(0, cell.CellReference.ToString().Length - 1));
                     }
                }
                else
                {
                    dt.Rows.Add();
                    int i = 0;
                    foreach (Cell cell in row.Descendants<Cell>())
                    {
                        while (columnRef(i) + dt.Rows.Count + 1 != cell.CellReference)
                        {
                            dt.Rows(dt.Rows.Count - 1)(i) = "";
                            i += 1;
                         }

                         dt.Rows(dt.Rows.Count - 1)(i) = GetValue(doc, cell);
                         i += 1;
                    }
                }
            }
        }
    }

    return dt;
}
RBT
  • 24,161
  • 21
  • 159
  • 240
0
foreach (Cell cell in row.Descendants<Cell>())
{
    while (columnRef[i] + (dt.Rows.Count + 1) != cell.CellReference)
    {
        dt.Rows[dt.Rows.Count - 1][i] = "";
        i += 1;
    }

    dt.Rows[dt.Rows.Count - 1][i] = GetValue(doc, cell);
    i++;
}
Presi
  • 806
  • 10
  • 26
  • Please add a description to your code. Please don't post raw code answers. – Presi Dec 23 '22 at 12:08
  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Dec 23 '22 at 12:08