0

I am working with a client to import a rather larger Excel file (over 37K rows) into a custom system and utilizing the excellent LinqToExcel library to do so. While reading all of the data in, I noticed it was breaking on records about 80% in and dug a little further. The reason it fails is the majority of records (with associated dates ranging 2011 - 2015) are normal, e.g. 1/3/2015, however starting in 2016, the structure changes to look like this: '1/4/2016 (note the "tick" at the beginning of the date) and LinqToExcel starts returning a DBNull for that column.

Any ideas on why it would do that and ways around it? Note that this isn't a casting issue - I can use the Immediate Window to see all the values of the LinqToExcel.Row value and where that column index is, it's empty.

Edit

Here is the code I am using to read in the file:

var excel = new LinqToExcel.ExcelQueryFactory(Path.Combine(this.FilePath, this.CurrentFilename));
foreach (var row in excel.Worksheet(file.WorksheetName))
{
    data.Add(this.FillEntity(row));
}

The problem I'm referring to is inside the row variable, which is a LinqToExcel.Row instance and contains the raw data from Excel. The values inside row all line up, with the exception of the column for the date which is empty.

** Edit 2 **

I downloaded the LinqToExcel code from GitHub and connected it to my project and it looks like the issue is even deeper than this library. It uses an IDataReader to read in all of the values and the cells in question that aren't being read are empty from that level. Here is the block of code from the LinqToExcel.ExcelQueryExecutorclass that is failing:

private IEnumerable<object> GetRowResults(IDataReader data, IEnumerable<string> columns)
    {
        var results = new List<object>();
        var columnIndexMapping = new Dictionary<string, int>();
        for (var i = 0; i < columns.Count(); i++)
            columnIndexMapping[columns.ElementAt(i)] = i;

        while (data.Read())
        {
            IList<Cell> cells = new List<Cell>();
            for (var i = 0; i < columns.Count(); i++)
            {
                var value = data[i];

                //I added this in, since the worksheet has over 37K rows and 
                //I needed to snag right before it hit the values I was looking for
                //to see what the IDataReader was exposing. The row inside the
                //IDataReader relevant to the column I'm referencing is null,
                //even though the data definitely exists in the Excel file
                if (value.GetType() == typeof(DateTime) && value.Cast<DateTime>() == new DateTime(2015, 12, 31))
                {
                }



                value = TrimStringValue(value);
                cells.Add(new Cell(value));
            }
            results.CallMethod("Add", new Row(cells, columnIndexMapping));
        }
        return results.AsEnumerable();
    }

Since their class uses an OleDbDataReader to retrieve the results, I think that is what can't find the value of the cell in question. I don't even know where to go from there.

Scott Salyer
  • 2,165
  • 7
  • 45
  • 82
  • 1
    Is it an option to clean the Excel spreadsheet so that the dates are valid? – mjwills Jul 25 '17 at 23:30
  • I'm trying to get a proper one from the client now, otherwise manually editing will be the only option and it's north of 5K records to fix. I'm more surprised this value can't be read in general than anything else. – Scott Salyer Jul 25 '17 at 23:35
  • It's not my code - it's part of the LinqToExcel library itself. That's what isn't reading the value properly. – Scott Salyer Jul 26 '17 at 02:19
  • 1
    Are you calling the LinqToExcel library? Could you show us _that_ code? It is very hard to give advice without seeing your code. Show us as much as possible (the code that calls LinqToExcel, your mappings, your class where the data is mapped into etc etc). – mjwills Jul 26 '17 at 03:44
  • Done! I added example code, though with this being that library I'm not sure whether that will help or not. – Scott Salyer Jul 26 '17 at 17:59
  • Again, it has nothing to do with the mapping - the `row` variable is a `LinqToExcel.Row` class - that is the one that doesn't have the value in it. If that was returning some data, I could take it from there easily, but it's empty. The code that calls LinqToExcel is already above - it opens the file (via `Excel.Worksheet`, which tells it what worksheet name to use) and loops through all the columns. If you're familiar with LinqToExcel, I know it supports binding via generics to a class, but I'm not using that. I read it directly from the file. – Scott Salyer Jul 27 '17 at 14:40
  • @mjwills - new code added. Looks like the OleDbDataReader is what's failing to read the value, so that makes it even more difficult to figure out. At least I know it's not the LinqToExcel library now though. – Scott Salyer Jul 27 '17 at 15:35

1 Answers1

0

Found it! Once I traced down that it was the OleDbDataReader that was failing and not the LinqToExcel library itself, it sent me down a different path to look around. Apparently, when an Excel file is read by an OleDbDataReader (as virtually all utilities do under the covers), the first few records are scanned to determine the type of content associated with the column. In my scenario, over 20K records had "normal" dates, so it assumed everything was a date. Once it got to the "bad" records, the ' in front of the date meant it couldn't be parsed into a date, so the value was null.

To circumvent this, I load the file and tell it to ignore column headers. Since the header for this column is a string and most of the values are dates, it makes everything a string because of the mismatched types and the values I need are loaded properly. From there, I can parse accordingly and get it to work.

Source: What is IMEX in the OLEDB connection string?

Scott Salyer
  • 2,165
  • 7
  • 45
  • 82
  • I thought LinqToExcel ignored the first row (the header row) by default. Am I wrong about that? – Rod Nov 28 '17 at 22:23
  • 1
    That's an optional setting, yep. The data on this side was so bad that I don't think any of this was a fault of LinqToExcel - it's how bad what I was supplied is. – Scott Salyer Nov 29 '17 at 16:28