3

I'm using OpenXML to open a spreadsheet and loop through the rows of a spreadsheet. I have a linq query that returns all cells within a row. The linq query was ripped straight from a demo on the MSDN.

IEnumerable<String> textValues =
    from cell in row.Descendants<Cell>()
    where cell.CellValue != null
    select (cell.DataType != null
            && cell.DataType.HasValue
            && cell.DataType == CellValues.SharedString
            ? sharedString.ChildElements[int.Parse(cell.CellValue.InnerText)].InnerText
            : cell.CellValue.InnerText);

The linq query is great at returning all cells that have a value, but it doesn't return cells that don't have a value. This in turn makes it impossible to tell which cell is which. Let me explain a little more. Say for instance we have three columns in our spreadsheet: Name, SSN, and Address. The way this linq query works is it only returns those cells that have a value for a given row. So if there is a row of data that has "John", "", "173 Sycamore" then the linq query only returns "John" and "173 Sycamore" in the enumeration, which in turn makes it impossible for me to know if "173 Sycamore" is the SSN or the Address field.

Let me reiterate here: what I need is for all cells to be returned, and not just cells that contain a value.

I've tried to monkey the linq query in every way that I could think of, but I had no luck whatsoever (ie - removing the where clause isn't the trick). Any help would be appreciated. Thanks!

Jagd
  • 7,169
  • 22
  • 74
  • 107

3 Answers3

4

The OpenXML standard does not define placeholders for cells that don't have data. In other words, it's underlying storage in XML is sparse. You could work round this on one of two ways:

  1. Create a list of all "available" or "possible" cells (probably by using a CROSS JOIN type of operation) then "left" joining to the row.Descendants<Cell>() collection to see if the cell reference has a value
  2. Utilize a 3rd party tool such as ClosedXML or EPPlus as a wrapper around the Excel data and query their interfaces, which are much more developer-friendly.
Community
  • 1
  • 1
jklemmack
  • 3,518
  • 3
  • 30
  • 56
  • I've used ClosedXML for exporting Excel spreadsheets, but I didn't think it did importing at all. Does it? I even looked real quick today at their website earlier today and didn't see that it did (although it was a really quick look). I'll take a look at EPPlus though. Really though, every time I try to do anything with OpenXML I'm amazed at what seemingly should be simple ends up being anything but! – Jagd Jan 19 '12 at 00:22
  • If by importing you mean the ability to read and write data, then OpenXML and EPPlus can both do it. I agree that trying to do anything directly with OpenXML is a royal pain, but with tools like those above and the Document Explorer as part of the SDK, its soooooo much better than the old COM Interop we used to have to deal with! – jklemmack Jan 19 '12 at 02:31
  • 1
    ClosedXML worked great. I wish I'd just used it from the first instead of screwing around with OpenXML. Thanks again! – Jagd Jan 23 '12 at 18:19
3

With ClosedXML:

var wb = new XLWorkbook("YourWorkbook.xlsx");
var ws = wb.Worksheet("YourWorksheetName");
var range = ws.RangeUsed();
foreach(var row in range.Rows())
{
   // Do something with the row...
   // ...

   foreach(var cell in row.Cells())
   {
      // Now do something with every cell in the row
      // ...
   }
}
Manuel
  • 10,869
  • 14
  • 55
  • 86
0

The one way I recommend is to fill in all the null cells with blank data so they will be returned by your linq statement. See this answer for how to do that.

Community
  • 1
  • 1
amurra
  • 15,221
  • 4
  • 70
  • 87