1

I have around 300 hundred Excel files with sales data, but different schema (one has a column named "Product Name", another has only "Product") but contains the same information about sales from different shops. The files are generated manually by some people so typos are also possible. Is there any nice way to import this data or I have to create 300 ETL packages in SSIS?

Hadi
  • 36,233
  • 13
  • 65
  • 124
  • so, is the number of columns and the order of columns will be the same ? – Jayasurya Satheesh Nov 30 '17 at 08:40
  • strategically I would 1) scan all files and extract all column names into a table 2) build a dictionary for source col name <=> target (=schema) col name 3) import all files using col name translation table – MikeD Nov 30 '17 at 08:53
  • @JayasuryaSatheesh Nope. the problem is files can have totally different schemas, one shop can send sales and purchases in one file, different sheets and another one can send in 2 files. – Mikołaj Klimas Nov 30 '17 at 09:08
  • 1
    Mikolaj Take a look at my answer on this topic https://stackoverflow.com/questions/47437513/issue-while-importing-the-excel-file-through-ssis-package/47491497#47491497 – Hadi Nov 30 '17 at 18:58
  • Personally, I would have a junior person put the columns in order for each of the 300 spreadsheets and create one load package. – KeithL Dec 01 '17 at 14:30
  • Bottom line SSIS isn't really the tool for an operation like this. You can create custom scripts and components to accomplish it but that is basically the same as creating a custom application to deal with it. SSIS is a hard typed solution meaning it wants to know how to map things and you have to tell it how to do so! there are ways of tricking it but by the time you get done a console program would have been faster and easier. I don't know BIML well but it proposes to be a solution to script packages and might be a middle ground – Matt Dec 01 '17 at 23:56
  • 1
    @KeithL I'm that junior person :D – Mikołaj Klimas Dec 02 '17 at 13:24

1 Answers1

0

You can do this in these steps

  • Get all excel files
  • For each file
  • Script task to get column names and sheet names

  • Store column names and sheet names in ssis variables

EDIT: You can't easily do anything about the typos, the easiest thing I can do out of the box is to have a dictionary table ready to go of expected values and a fuzzy match transformation to check against the dictionary table

  • For each sheet

  • SQL task to create stage table if not exists

  • Script task to read from excel sheet and insert into table

And it is easy to find how to read the column names of an excel file dynamically with c#. I've done something similar with vb but below is an example of how to do with c#.

Sheet names

Column names

Also this guy is loading all the files in dynamically with out of the box ssis excel dataflow

To create the table you will need to create the SQL statement to create table and then the SQL statement to insert into the table you created.

Hadi
  • 36,233
  • 13
  • 65
  • 124