I am facing a problem importing a flat file into SSIS.
The file is seperated by "|" and has deliminater as ";;". However the deliminator is inconsistent. Sometimes, at the and of the rows, there is only ";" or nothing "". When importing to SSIS I get the result
Column 1 Column 2 Column 3 Column 4 Column 5
a b c d e;|a1|b1|c1|d1|e1
This should instead look like
Column 1 Column 2 Column 3 Column 4 Column 5
a b c d e
a1 b1 c1 d1 e1
And the problem arrises because in the first row there is only one or none ";". Note this is an example, many of the rows are correct and have ";;" as deliminator. I am only pointing out the problem.
The .csv file would look like
Column 1|Column 2|Column 3|Column 4|Column 5;;
a|b|c|d|e;
a1|b1|c1|d1|e1;;
and should instead look like
Column 1|Column 2|Column 3|Column 4|Column 5;;
a|b|c|d|e;;
a1|b1|c1|d1|e1;;
The data set is very big with almost 600.000 rows and 50 columns.
The first problem I face is when I import the file, since SSIS standard DataType reading is string [DT_STR]
. with a length of 50
. Since sometimes there are multiple rows with wrong deliminators, I get a very long strings in the last column cell. I Use Visual Studio, and in the Advanced Editor I changed the length to something very big.
Advanced editor in Visual studio were I have changed the length
So the question is, how do I in SSIS and Visual Studio Community separate the values in some cells in one column and split op these into a entire new row (with the already defined column variables).
I have tried manually to find all the cases where there is a error and changed this in the .csv file. After this SSIS works. However this is not a durable solution because I am getting a new file every month.
I have tried reading suggestions as:
Split a single column of data with comma delimiters into multiple columns in SSIS
but their problem is not he same, since they have a column value the replicate, and I want a entire new row.
Thanks for any help, ss
!! EDIT trying using the answers from J Weezy and R M: !!
I try to create a script task and follow that solution.
In Visual Studio, I add a script task using a Script Component and I choose "Transformation". Under Input Columns I choose all.
After this i direct the flat file source to the script component and run the code. Running the script like this (where the script component doesn't do anything) works.
Then I enter "Edit Script" in the script component, and under public override void Input0_ProcessInputRow(Input0Buffer Row)
I enter (using the help from R M):
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
public static string[] SplitLine(string input)
{
Regex lineSplit = new Regex("[0-9]\;$", RegexOptions.Compiled);
List<string> list = new List<string>();
string curr = null;
foreach (Match match in lineSplit.Matches(input))
{
curr = match.Value;
if (0 == curr.Length)
{
list.Add("");
}
list.Add(curr.TrimStart(';'));
}
return list.ToArray();
}
}
However this doesn't work (I am not even allowed to execute the task).
I have never worked with c# before so everything is new to me. As i understand the code, it search each line to find the pattern where there is numbers in front of only one ";" at the end, hence it will not find those lines which ends with numbers following by ";;" (two ;). When there is a match, one ";" is added.
Please let me know, what I am not understanding and doing wrong.
Maybe it is also wrong to put the script component after the flat file source, because adding ";" will not result in a new line, which is what I want.