3

I have 2 large files, an Excel spreadsheet and a csv file, which are messed up, but still need to be uploaded into a table. I'm in progress learning how to use SSIS. Assume the columns and rows look something like this..

1st Excel spreadsheet (file extension .xlxs)...

  ID    Name    GroupName City        Time        Price Date
  A1    South   Group1    London      10/06/2018  $4.50 13.30
  A2    North   Group2    New York                $60   10/07/2018  09:00  AM  

Fig 1

2nd Excel spreadsheet (file extension .csv)...

ID  Name    GroupName   City         Date         Time   Price  
A3  East    Group3      Paris        09/09/2017   $5.00  03:00  AM   
A4  West    Group4      Berlin                    01/05/2018    $12.50  18:00

Fig 2

If you look at ID A2 in Fig 1, you will see Date as 9.00 then AM in different column. How do you solve a problem like that? This is an example, so Time data is randomly different in each column. Also note in Fig 2 for A4

I am familiar to a degree with the Script Task and Foreach Loop Container.

I search on the net and found this website....

It's is sort of what I am looking for.

For now a table has been created with these column names

ID, Name, GroupName, City, Date, Time and Price.

So ideally when data is loaded into the table it should look like this...

  ID    Name    GroupName   City         Date         Time   Price  
  A1    South   Group1      London       10/06/2018   13.30   $4.50           
  A2    North   Group2      New York     10/07/2018   09:00AM $60 
  A3    East    Group3      Paris        09/09/2017   03:00AM $5.00  
  A4    West    Group4      Berlin       01/05/2018   18:00   $12.50

I am not sure how to approach this.

Please note: I just want to know what SSIS Toolbox Components I need to use. Once I know, I will attempt to solve this problem. That's the reason for no code example.

Thanks in advance.

Update

Thanks Hadi. If nobody mind I will keep this thread open and update when SSIS is fully available in VS 2019 and have the chance to find a solution.

rs001
  • 57
  • 7
  • `If nobody mind I will keep this thread open and update when SSIS is fully available in VS 2019`. I think a new fresh question will get attention more than an old one. I suggest closing by accepting/upvoting answer below and write a new question when `SSIS is fully available in VS 2019`. Good luck – Yahfoufi Jun 21 '19 at 12:35

1 Answers1

0

I don't think there is an easy solution for that. But i will try to give some suggestions:

  1. Convert the Excel file into csv file
  2. In the Flat file connection manager only define on column of type DT_STR and length = 4000
  3. In the Data Flow Task add a Script Component to split each line and validate each column value and assign it to the relevant output column

You can refer to the following answers to learn more since it contains helpful information on how to read data from flat file when data is not structured very well (Even if it is not the same case)

Hadi
  • 36,233
  • 13
  • 65
  • 124