2

I'm trying to read some data from excel file with mixed format in same column. When I get the value as text format with connection string bellow

using (OleDbConnection cn = new OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" + physicalPath + "';Extended Properties='Excel 12.0 Xml;HDR=NO;IMEX=1;TypeGuessRows=0;';"))

it shows "3.2240411747e+011" as text instead of "322404117474"

How can I get the proper value with proper data type?

Debashis
  • 81
  • 2
  • 9
  • 1
    My personal suggestion would be to use a "real" Excel library instead of OleDb's unreliable Excel support. I personally like [SpreadsheetLight](http://spreadsheetlight.com/), but there are lots of [other free libraries](https://stackoverflow.com/q/151005/87698) available. Since this is probably not the answer you are looking for, I'll leave this as a comment instead of an answer. – Heinzi Jan 24 '18 at 14:13
  • I would also suggest to use a "real library", there's a very nice "built-in" library: https://msdn.microsoft.com/library/microsoft.office.interop.excel(v=vs.140).aspx and https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/interop/how-to-access-office-onterop-objects – Essigwurst Jan 24 '18 at 14:18
  • See this answer - as far as I can see TypeGuessRows is a registry setting only and has no effect in the connection string : https://stackoverflow.com/questions/9936646/reading-excel-intermixed-datatype-without-modifying-registry-key also here https://www.connectionstrings.com/ace-oledb-12-0/treating-data-as-text/ says the IMEX=1 treats all mixed data as text. – PaulF Jan 24 '18 at 15:00
  • What is your goal? You start off implying that you want to retrieve all values as a string representation of the value, but are dissatisfied with the default formatting of the representation. Then you state: `How can I get the proper value with proper data type?` which is a totally different objective. Either objective can be achieved. – TnTinMn Jan 24 '18 at 15:58
  • Hi Heinzi, Thanks for your reply. I think Excel library instead of OleDb's consumes much memory in case of 100000 or 200000 row with 20 to 50 columns. Am I right? – Debashis Jan 25 '18 at 07:28
  • Hi TnTinMn, Thanks for your reply. I'm trying to read all the data as string and then I convert it to proper datatype. – Debashis Jan 25 '18 at 07:37

1 Answers1

0

I'm trying to read all the data as string and then I convert it to proper datatype.

it shows "3.2240411747e+011" as text instead of "322404117474"

To a large extent, this issue can by mitigated by constructing the SQL select statement to format the string created by the ACE provider. There appears to be an error in way the provider determines significant figures when it converts a Double to a String.

A Double can hold up to 15 significant figures. The value "322404117474" is 12 digits long, but its string representation "3.2240411747e+011" only has 11 digits. What I believe is happening is that four character positions taken up by the signed exponent are falsely being considered significant. This appears to only be an issue for cells having the default "General" format with an integer part more than 11 digits long. If the cells have a number format applied to them, the accuracy displayed by that format will be used. Therefore, you should expect loss of accuracy when importing the values as strings.

However, before I go into the SQL statement, I want to address the inclusion of TypeGuessRows=0 in the Extended Properties section of the connection string.

Simply put, this wrong. The TypeGuessRows value is always read from the registry. For a dated but good write-up on the parameter, see: External Data – Mixed Data Types.

You do not show the SQL statement you are using, but in its simplest form it would look like:

Select * From [sheetName$]

This wildcard "Select" is problematic when trying to read the field values as strings. If the field is a mixed numeric/string column that is reinterpreted as string, the retrieved strings will formatted in accordance with the cell formatting.

Assume the Worksheet being retrieved looks like this:

enter image description here

Rows 1, 2, and 4 contain numeric values. Row 5 is a number entered as text and Row 3 is obviously text. When this is retrieved using the above SQL (and HDR=NO) then and displayed in DataGridView, the result is:

enter image description here

The retrieval of values format this way can be an obvious problem if you need to parse them back to a numeric type.

When using the Jet or ACE OleDB provider to read Excel files, you can use several VBA functions that are available in MS Access in the SQL query. If the VBA Format Function is used in the Select statement like this:

Select F1, Format(F2) As F2 From [sheetName$]

The the retrieved values will look like this:

enter image description here

The field names "F1" and "F2" are automatically assigned by the provider when there is no header row specified to provide the names. The numbering is sequential from 1 to the number of columns found in the table.

Community
  • 1
  • 1
TnTinMn
  • 11,522
  • 3
  • 18
  • 39