4

I have an Excel 2007 workbook that contains tables of data that I'm importing into DataTable objects using ADO.NET.

Through some experimentation, I've managed to find two different ways to indicate that a cell should be treated as "null" by ADO.NET:

  1. The cell is completely blank.
  2. The cell contains #N/A.

Unfortunately, both of these are problematic:

  1. Most of my columns of data in Excel are generated via formulas, but it's not possible in Excel to generate a formula that results in a completely blank cell. And only a completely blank cell will be considered null (an empty string will not work).

  2. Any formula that evaluates to #N/A (either due to an actual lookup error or because the NA() function was used) will be considered null. This seemed like the ideal solution until I discovered that the Excel workbook must be open for this to work. As soon as you close the workbook, OLEDB suddenly starts seeing all those #N/As as strings. This causes exceptions like the following to be thrown when filling the DataTable:

    Input string was not in a correct format. Couldn't store <#N/A> in Value Column. Expected type is Int32.

Question: How can I indicate a null value via an Excel formula without having to have the workbook open when I fill the DataTable? Or what can be done to make #N/A values be considered null even when the workbook is closed?

In case it's important, my connection string is built using the following method:

var builder = new OleDbConnectionStringBuilder
{
    Provider = "Microsoft.ACE.OLEDB.12.0",
    DataSource = _workbookPath
};
builder.Add("Extended Properties", "Excel 12.0 Xml;HDR=Yes;IMEX=0");
return builder.ConnectionString;

(_workbookPath is the full path to the workbook).

I've tried both IMEX=0 and IMEX=1 but it makes no difference.

devuxer
  • 41,681
  • 47
  • 180
  • 292
  • I can't really see the question? Would it be possible for you to modify the formulars? – Cilvic Apr 21 '11 at 20:09
  • @Cilvic, Sorry, you're right. I'll add one. – devuxer Apr 21 '11 at 20:21
  • @Cilvic, And yes, I can definitely edit the formulas. What I need to know is what formula result will in null. Once I know that, I can edit the formulas to produce that value any time I need null. – devuxer Apr 21 '11 at 20:31
  • @Civic what is evaluating for null values later that makes them necessary? And is there a sample of the existing formula generating the #N/A values? – datatoo Apr 21 '11 at 23:03
  • @datatoo, an example formula: `=IF(A1="",NA(),A1)`. – devuxer Apr 21 '11 at 23:36
  • @datatoo, I'm not sure I understand your other question. Are you asking why I need null values in my database? – devuxer Apr 21 '11 at 23:44
  • @DanM - Does the field type of #N/A only get represented when the formatting of field is specific to a certain type? – Robbie Tapping Apr 22 '11 at 02:51
  • @Robbie, All my data in Excel is formatted as "General" at the moment. – devuxer Apr 22 '11 at 03:07
  • @DanM - Have you tried to Format Text fields as Text and Number Fields as Number, i know it might be a very simple and basic question, but i have a process that Converts Excel to CSV for me, and i was getting the #N/A coming through when i had the formatting incorrectly stated. To be honest i dont think #N/A is anything like NULL, and dont believe you can do anything apart from In your DataTable updating #N/A with NULL – Robbie Tapping Apr 22 '11 at 03:12
  • @Robbie, You're probably right...it's just strange to me that #N/A works perfectly when the workbook is open. Somehow, the way the data is transmitted changes when the workbook is closed. – devuxer Apr 22 '11 at 06:21
  • Could You check in the Script for the String #N/A ard replace it wich null? Maybe even treat the Wohle column as string and convert to it later. – Cilvic Apr 22 '11 at 07:55
  • 1
    Also you could try to keep the empty strings =IF(A1="","",A1) – Cilvic Apr 22 '11 at 07:57
  • @DanM I think Civic is right, leave the empty strings empty rather than making them #N/A, also do you have excel setup in options>calculation>save external link values ? If this is a large range calculation might be an issue. – datatoo Apr 22 '11 at 14:28
  • Does it help to treat the column as string (so that it can have #N/A and numbers converted to string)? Also, when importing it will require code to translate. i.e. if it is #N/A, assume it to be null, else cast it to an int32. – shahkalpesh Apr 22 '11 at 14:38
  • @Cilvic, @datatoo, I already know empty strings won't work. That was my whole point that a cell has to be "completely blank" (as in you have to clear the contents of the cell) for it to be treated as null. – devuxer Apr 22 '11 at 18:21
  • @datatoo, I do have "Save external link values" checked in my Excel options, but I don't have any external data in my workbook. – devuxer Apr 22 '11 at 18:23
  • And to those who are suggesting writing a script to check for #N/A and convert it to null, yes, I could do that, but scanning through every value of every table will definitely slow things down. I was hoping for a more elegant solution. – devuxer Apr 22 '11 at 18:25
  • There's a reason people compare strings for empty string OR null you know. You're going to have to set a dummy value or something. – jcolebrand Apr 23 '11 at 00:20

1 Answers1

7

You're hitting the brickwall that many very frustrated users of Excel are experiencing. Unfortunately Excel as a company tool is widespread and seems quite robust, unfortunately because each cell/column/row has a variant data type it makes it a nightmare to handle with other tools such as MySQL, SQL Server, R, RapidMiner, SPSS and the list goes on. It seems that Excel 2007/2010 is not very well supported and even more so when taking 32/64 bit versions into account, which is scandalous in this day and age.

The main problem is that when ACE/Jet access each field in Excel they use a registry setting 'TypeGuessRows' to determine how many rows to use to assess the datatype. The default for "Rows to Scan" is 8 rows. The registry setting 'TypeGuessRows' can specify an integer value from one (1) to sixteen (16) rows, or you can specify zero (0) to scan all existing rows. If you can't change the registry setting (such as in 90% of office environments) it makes life difficult as the rows to guess are limited to the first 8.

For example, without the registry change If the first occurrence of #N/A is within the first 8 rows then IMEX = 1 will return the error as a string "#N/A". If IMEX = 0 then an #N/A will return 'Null'.

If the first occurrence of #N/A is beyond the first 8 rows then both IMEX = 0 & IMEX = 1 both return 'Null' (assuming required data type is numeric).

With the registry change (TypeGuessRows = 0) then all should be fine.

Perhaps there are 4 options:

  1. Change the registry setting TypeGuessRows = 0

  2. List all possible type variations in the first 8 rows as 'dummy data' (eg memo fields/nchar(max)/ errors #N/A etc)

  3. Correct ALL data type anomalies in Excel

  4. Don't use Excel - Seriously worth considering!

Edit: Just to put the boot in :) another 2 things that really annoy me are; if the first field on a sheet is blank over the first 8 rows and you can't edit the registry setting then the whole sheet is returned as blank (Many fun conversations telling managers they're fools for merging cells!). Also, if in Excel 2007/2010 you have a department return a sheet with >255 columns/fields then you have huge problems if you need non-contiguous import (eg key in col 1 and data in cols 255+)

  • +1 and thanks. I think I've been able to avoid the whole "guesses data types using the first 8 rows issue" by importing the schema of my database into the `DataTable` prior to filling it with the Excel data. This doesn't solve the null issue, though. I've come up with a solution using com interop that seems to provide me with both the speed and flexibility I need. So far, it's actually faster than my OLEDB solution. Finally, I'm sticking with Excel for now because it's just such a good fit for the kind of work I do (and all my coworkers use it), but there's definitely room for improvement. – devuxer Apr 23 '11 at 01:57
  • Be careful.. using a schema is fine and the correct thing to do but the data/recordset returned from ACE/Jet does not necessarily match the schema you've defined. –  Apr 23 '11 at 02:16
  • True...I guess what I was thinking was that by using a schema, at least an exception would be thrown to indicate that the Excel-determined data type isn't correct. That said, using interop instead of OLEDB avoids the issue altogether. – devuxer Apr 23 '11 at 06:54
  • If you have a standard template being used a dirty workaround to help in the future would be to create a VBA Workbook_BeforeClose event and paste the data to a named/hidden import sheet as values and replace all cells containing #. –  Apr 23 '11 at 10:37