COM-based data access to Excel specifications are likely buried in nearly inaccessible Microsoft archive documentation. (Usually served as one enormous PDF)
The connectionstring has some parts:
Provider: It is the main oledb provider that is used to open the Excel
sheet. This will be Microsoft.Jet.OLEDB.4.0 for Excel 97 onwards Excel
file format and Microsoft.ACE.OLEDB.12.0 for Excel 2007 or higher
Excel file format (One with xlsx extension)
Data Source: It is the entire path of the Excel workbook. You need to mention a dospath that corresponds to an Excel file. Thus, it will
look like: Data Source=C:\testApp.xls".
Extended Properties (Optional): Extended properties can be applied to Excel workbooks which may change the overall activity of the Excel
workbook from your program. The most common ones are the following:
HDR: It represents Header of the fields in the Excel table. Default is YES. If you don't have fieldnames in the header of your
worksheet, you can specify HDR=NO which will take the columns of the
tables that it finds as f1,f2 etc.
ReadOnly: You can also open Excel workbook in readonly mode by specifying ReadOnly=true; By default, Readonly attribute is false, so
you can modify data within your workbook.
FirstRowHasNames: It is the same as HDR, it is always set to 1 ( which means true) you can specify it as false if you don't have your
header row. If HDR is YES, provider disregards this property. You can
change the default behaviour of your environment by changing the
Registry Value
[HKLM\Software\Microsoft\Jet\4.0\Engines\Excel\FirstRowHasNames] to 00
(which is false)
MaxScanRows: Excel does not provide the detailed schema defination of the tables it finds. It need to scan the rows before
deciding the data types of the fields. MaxScanRows specifies the
number of cells to be scanned before deciding the data type of the
column. By default, the value of this is 8. You can specify any value
from 1 - 16 for 1 to 16 rows. You can also make the value to 0 so that
it searches all existing rows before deciding the data type. You can
change the default behaviour of this property by changing the value of
[HKLM\Software\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows] which is
8 by default. Currently, MaxScanRows is ignored, so you need only to
depend on TypeGuessRows Registry value. Hope Microsoft fixes this
issue to its later versions.
IMEX: (A Caution) As mentioned above, Excel will have to guess a number or rows to select the most appropriate data type of the
column, a serious problem may occur if you have mixed data in one
column. Say you have data of both integer and text on a single column,
in that case, Excel will choose its data type based on majority of the
data. Thus it selects the data for the majority data type that is
selected, and returns NULL for the minority data type. If the two
types are equally mixed in the column, the provider chooses numeric
over text.
For example, in your eight (8) scanned rows, if the column contains five (5) numeric values and three (3) text values, the
provider returns five (5) numbers and three (3) null values.
To work around this problem for data, set "IMEX=1" in the Extended Properties section of the connection string. This enforces
the ImportMixedTypes=Text registry setting. You can change the
enforcement of type by changing
[HKLM\Software\Microsoft\Jet\4.0\Engines\Excel\ImportMixedTypes] to
numeric as well.
Thus if you look into the simple connectionstring with all of them, it will look like:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\\testexcel.xls;
Extended Properties=\"Excel 8.0;HDR=YES;IMEX=1;MAXSCANROWS=15;READONLY=FALSE\""
or:
Copy Code
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\\testexcel.xlsx;
Extended Properties=\"Excel 12.0;HDR=YES;IMEX=1;MAXSCANROWS=15;READONLY=FALSE\""
We need to place extended properties into Quotes(") as there are multiple number of values.