I have around 300 hundred Excel files with sales data, but different schema (one has a column named "Product Name", another has only "Product") but contains the same information about sales from different shops. The files are generated manually by some people so typos are also possible. Is there any nice way to import this data or I have to create 300 ETL packages in SSIS?
-
so, is the number of columns and the order of columns will be the same ? – Jayasurya Satheesh Nov 30 '17 at 08:40
-
strategically I would 1) scan all files and extract all column names into a table 2) build a dictionary for source col name <=> target (=schema) col name 3) import all files using col name translation table – MikeD Nov 30 '17 at 08:53
-
@JayasuryaSatheesh Nope. the problem is files can have totally different schemas, one shop can send sales and purchases in one file, different sheets and another one can send in 2 files. – Mikołaj Klimas Nov 30 '17 at 09:08
-
1Mikolaj Take a look at my answer on this topic https://stackoverflow.com/questions/47437513/issue-while-importing-the-excel-file-through-ssis-package/47491497#47491497 – Hadi Nov 30 '17 at 18:58
-
Personally, I would have a junior person put the columns in order for each of the 300 spreadsheets and create one load package. – KeithL Dec 01 '17 at 14:30
-
Bottom line SSIS isn't really the tool for an operation like this. You can create custom scripts and components to accomplish it but that is basically the same as creating a custom application to deal with it. SSIS is a hard typed solution meaning it wants to know how to map things and you have to tell it how to do so! there are ways of tricking it but by the time you get done a console program would have been faster and easier. I don't know BIML well but it proposes to be a solution to script packages and might be a middle ground – Matt Dec 01 '17 at 23:56
-
1@KeithL I'm that junior person :D – Mikołaj Klimas Dec 02 '17 at 13:24
1 Answers
You can do this in these steps
- Get all excel files
- For each file
Script task to get column names and sheet names
Store column names and sheet names in ssis variables
EDIT: You can't easily do anything about the typos, the easiest thing I can do out of the box is to have a dictionary table ready to go of expected values and a fuzzy match transformation to check against the dictionary table
For each sheet
SQL task to create stage table if not exists
Script task to read from excel sheet and insert into table
And it is easy to find how to read the column names of an excel file dynamically with c#. I've done something similar with vb but below is an example of how to do with c#.
Sheet names
Column names
Also this guy is loading all the files in dynamically with out of the box ssis excel dataflow
To create the table you will need to create the SQL statement to create table and then the SQL statement to insert into the table you created.

- 36,233
- 13
- 65
- 124

- 44
- 4