0

We would like to explore again to import using SQL SSIS 2005 where the CSV file have a different number of column separated for each rows. ie.

Let say first 3 rows:

"1", "FirstName1", "LastName1"
"2", "FirstName2", "LastName2"
"3", "FirstName3", "LastName3"

4-5 rows:

"210", "Crows st", "Melbourne", "VIC"
"212", "Cr st", "Melbourne", "VIC"

others:

"100", "Activities 1"
"101", "Activities 2"

The only approaches that we did is that if the data has less than 1 million line, we can import into Excel and then export back as CSV or tab delimited and import to SQL Server. Alternatively, if it's more than 1 million records we can import into Ms Access 2007/2010 and import back into SQL (Assume that the data is no longer than 2GB). These type of solutions work nicely!!!

Is there any solution directly to SQL? We are still using Ms SQL Server 2005 BTW. The reason that we asked due to there is a limitation on the size of Ms Access which is 2GB and we have data is more than 2GB !!! Surely, we can split this file based on number of lines but we found very messy.

Looking forward for the input.

Thanks

dcpartners
  • 5,176
  • 13
  • 50
  • 73
  • Is there a relationship that must be maintained between rows 1-3, 4-5 and 6 on? – billinkc Feb 14 '13 at 18:40
  • The data is related. Thinks like this the data that we have from vendor are like that - they basically dump few tables into 1 file. We don't have access to actual source - so we have to work that combined data. – dcpartners Feb 14 '13 at 18:43
  • It's related but do we need to tie row 1 to the "Crows St" line and if so, is that leading number what we can use or is this going to be an ordinal position thing wherein Row 1 of section A ties to Row 1 of section B etc? – billinkc Feb 14 '13 at 18:54
  • Why don't you write a simple C# command-line application that will parse the file and build the appropriate statements based on how many columns are there? I assume this data is going into different tables? – Aaron Bertrand Feb 14 '13 at 18:57
  • Actually, the purpose of this exercise is to dump this file into 1 table only. Then we will have analyse the data manually - so the relation of data is not concern. The analyst might query the data based on line x to y means this etc etc - they have to flag themselves. – dcpartners Feb 14 '13 at 19:00
  • @AaronBertrand : We prefer to use SSIS cause can handle millions of records. – dcpartners Feb 14 '13 at 19:08
  • @Siva: Looks good. Is this SQL Server 2005 SSIS by any chance? Cause as I said the purpose is to dump into 1 table file - that's our purpose initially. – dcpartners Feb 14 '13 at 19:14
  • @Siva: This flat file that you show on your solution are containing different number of delimited. Correct? Ie. first line is 9, then 3 then 3 and so on – dcpartners Feb 14 '13 at 19:20
  • So your trying to import everything in 1 table? Not sure if this is helping you database integrity.. but all three files have different columns right? How are they related to each other? Is row 1, data 1 related to row 1, data 2 ? –  Apr 24 '13 at 17:10

2 Answers2

0

Since the three sets of data in the file seem to be three different entities (customer, address, activity), I assume you'll want to put them into three different tables at the end of your process. That means that splitting the data will happen anyway, it's only a question of where you do it in the loading process.

My approach would be a small script (Python, Perl, PowerShell, whatever) that splits the file into three separate files. You can call the script from the SSIS package to keep all your workflow together in one place. After that you just add three connectors to your package and load the data into whatever tables you want.

Pondlife
  • 15,992
  • 6
  • 37
  • 51
  • Our task is just to dump is into 1 table in SQL. Then the analyst will do the funky stuff with the data - which is not our concern. But we need to know alternative way cause I said SQL SSIS 2005 can't import this directly as far as we know. – dcpartners Feb 14 '13 at 19:04
  • Even in that case I would still split the file, it's easier to load three consistent files than one 'mixed' one (at least using SSIS). But there's more than one way to do things, so someone else might have an alternative that you like better. – Pondlife Feb 14 '13 at 19:19
0

You can do it, but it's complicated and the best I'm going to be able to do is point you in the right direction. It involves using a script component of type source and a conditional split that points to each of the different destinations. The script component will read the data and create the source rows. While it is doing that have it create an extra column called "RowType" that will be used by the conditional split object.

Hope that helps and good luck!

Kenneth Fisher
  • 3,692
  • 19
  • 21