1

I have the following rows in my Excel input file:

Column1       Column2
0-5           3.040 
6             2.957 
7             2.876

and the following code which uses ADO.NET to read it:

string fileName = "input.xls";
var connectionString = string.Format("Provider=Microsoft.Jet.OLEDB.4.0; data source={0}; Extended Properties=Excel 8.0;", fileName);
var dbConnection = new OleDbConnection(connectionString);
dbConnection.Open();
try
{
    var dbCommand = new OleDbCommand("SELECT * FROM [Sheet1$]", dbConnection);
    var dbReader = dbCommand.ExecuteReader ();

    while (dbReader.Read())
    {
        string col1 = dbReader.GetValue(0).ToString();
        string col2 = dbReader.GetValue(1).ToString();                 
    }                
}
finally
{
    dbConnection.Close();
}

The results are very disturbing. Here's why:

The values of each column in the first time through the loop:

col1 is empty (blank)

col2 is 3.04016411633586

Second time:

col1 is 6

col2 is 2.95722928448829

Third time:

col1 is 7

col2 is 2.8763272933077

The first problem happens with col1 in the first iteration. I expect 0-5. The second problem happens at every iteration with col2 where ADO.NET obviously alters the values as it reads them. How to stop this mal-behavior?

Kevin Le - Khnle
  • 10,579
  • 11
  • 54
  • 80
  • 2
    Are you sure that those values that ADO.Net is reading from col2 are not the actual values stored in the .xls file and that the values you expect are not just what Excel shows after rounding? Check the format of the cell and see if it is set to a numeric format with Decimal places set to 3. – Dr. Wily's Apprentice Jun 03 '10 at 19:14
  • Not sure if I followed your question fully, but everything I posted is the way it happened. ADO.NET alters the double values as it reads them. I shouldn't have to bother with format of the cell in Excel as I have no control over it. Users in this case are not programmers, so they won't know what a "data type" is. Pulling out *0-5* out of a column that otherwise contains mostly numbers is a problem too. – Kevin Le - Khnle Jun 03 '10 at 19:29
  • 2
    I guess what I meant was, are you sure that the value in row1 col2 is not actually 3.04016411633586 in the .xls file instead of 3.040? If that's the case, then ADO.Net is not actually changing the values, but rather it is giving you the exact value. – Dr. Wily's Apprentice Jun 03 '10 at 19:42

3 Answers3

4

By default, ADO.NET infers the column type based on the majority type in the first 8 rows, assumes the majority type wins if there are multiple types (and numeric in case of a tie) and returns NULL for any row that does not match its inferred data type for that column. See here for some notes on this and how to fix it. In a nutshell:

  • Add an "IMEX=1" attribute to your connection string.

  • In the registry, set the following keys:

Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/ImportMixedTypes = "Text" Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/TypeGuessRows = 0

This tells ADO to assume a text data type if it can't decide, and to scan the first 16384 rows (in previous versions of Excel, this was all rows...) to infer the type.

GalacticCowboy
  • 11,663
  • 2
  • 41
  • 66
  • Cowboy - Thanks I saw that note from lab49 too. Since this is for a web app, I don't have the luxury of messing with the registry on the production server. As lab49's author said, this is quite ugly. +1 thanks for trying to help. – Kevin Le - Khnle Jun 03 '10 at 20:18
  • No problem, I understand completely. I've run into some variation of this exact problem on several different projects, and it always comes down to the choice between implementing what is essentially a hack vs. going another direction entirely. Implementing the workaround is fairly simple (though you're modifying the underlying system so not to be taken lightly) but in one case they would have had to fix it on a couple hundred workstations and decided it simply wasn't worth the hassle. – GalacticCowboy Jun 03 '10 at 20:55
  • can you please tell if i want to scan all rows not 16384 rows @GalacticCowboy – Usman Asif Dec 11 '18 at 14:07
  • @UsmanAsif This was a limit of Excel. In previous versions you could not *have* more than 16384 rows. After they removed that limit, this feature still retained it. (Note that this answer is 8 years old, so there may be some changes since then, that I'm not familiar with.) – GalacticCowboy Dec 11 '18 at 20:05
  • @GalacticCowboy after eight years we have facing the same issue i am going to change my library to IExcel, – Usman Asif Dec 13 '18 at 05:32
1

The double type is an approximation of most numbers, and you're seeing more decimals of the approximation in your col2. Simply read it in as a double and convert it to a string representation with 3 digits after the decimal and they'll match.

string col1 = dbReader.GetString(0);
var col2Value = dbReader.GetDouble(1);
string col2 = col2Value.ToString("F3");
jball
  • 24,791
  • 9
  • 70
  • 92
  • I don't disagree with your approach. But ADO.NET thinks that the data type of column1 is DBTYPE_R8 (which is a double, which is also wrong). So the call *dbReader.GetString(0)* will throw an invalid cast exception. – Kevin Le - Khnle Jun 03 '10 at 18:32
  • I am not sure if that's even possible. Your suggestion is very much appreciated, but I think ADO.NET imposes the limitation here. – Kevin Le - Khnle Jun 03 '10 at 19:22
  • If you call `dbReader.IsDBNull(0)`, does it evaluate to true on the '0-5' row value? If so, I think you're right, the `OleDbReader` won't provide you with a value that doesn't match the column type. – jball Jun 03 '10 at 19:33
1

Try doing:

string col1 = dbReader.GetString(0); 
string col2 = dbReader.GetDecimal(1).ToString();  

Optionally you can format the ToString to fit display how you would like:

string col2 = dbReader.GetDecimal(1).ToString("0.000");  
Kelsey
  • 47,246
  • 16
  • 124
  • 162