
I get a CSV that I need to read into a SQL table. Right now it's uploaded manually through a web application, but I want to move the process into SQL Server. Rather than port my import script straight across into an SSIS script task, I wanted to check whether there is a better way to do it.

The issue with this particular CSV is that the first few columns are known, and have appropriate headers. However, after that group, the rest of the columns are sparsely populated and might not even have headers.

Example:

Col1,Col2,Col3,,,,,,
value1,value2,value3,,value4
value1,value2,value3,value4,value5
value1,value2,value3,,value4,value5
value1,value2,value3,,,value4

What makes this tolerable is that everything after Col3 can be concatenated together. The script checks each row for these trailing columns and concatenates them into a single "misc" column. It has to do this somewhat blindly, because there is no way of knowing ahead of time how many of these columns there will be.
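For reference, the heart of that import logic can be sketched in a few lines of C# (the class and method names here are hypothetical, not the actual script):

```csharp
using System;
using System.Linq;

class CsvFlattener
{
    // Keep the first three known columns as-is and concatenate
    // whatever trails them into one "misc" value. No assumption
    // is made about how many trailing columns a row has.
    public static string[] FlattenRow(string line)
    {
        string[] parts = line.Split(',');
        string misc = string.Join(",", parts.Skip(3));
        return new[] { parts[0], parts[1], parts[2], misc };
    }

    static void Main()
    {
        string[] row = FlattenRow("value1,value2,value3,,value4,value5");
        Console.WriteLine(string.Join(" | ", row));
        // value1 | value2 | value3 | ,value4,value5
    }
}
```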

Is there a way to do this with SSIS tools, or should I just port my existing import script to an SSIS script task?

Devin Goble
    If I were you, I'd stick to the import script that you have. SSIS gets really limited when it has to deal with something like this. But I will let other experts weigh in as well. – rvphx Feb 01 '13 at 23:13
  • Agreed. I need to wrap the overall process in an SSIS package for other reasons. However, I can use my original script to actually make sense of the data, and pass it on to the rest of the process. That seems to be the most flexible solution. – Devin Goble Feb 04 '13 at 17:20

2 Answers


Another option outside of SSIS is BULK INSERT with a format file.

Format files describe the layout of the incoming data. Because the fourth field below uses the row terminator ("\r\n") as its delimiter, everything after Col3, commas included, lands in the single Misc column.

For example:

9.0
4
1       SQLCHAR       0       100     ","      1     Header1      SQL_Latin1_General_CP1_CI_AS
2       SQLCHAR       0       100     ","      2     Header2      SQL_Latin1_General_CP1_CI_AS
3       SQLCHAR       0       100     ","      3     Header3      SQL_Latin1_General_CP1_CI_AS
4       SQLCHAR       0       100     "\r\n"   4     Misc         SQL_Latin1_General_CP1_CI_AS

Bulk Insert>> http://msdn.microsoft.com/en-us/library/ms188365.aspx

Format Files >> http://msdn.microsoft.com/en-us/library/ms178129.aspx
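A minimal sketch of wiring this up from code, assuming a hypothetical dbo.ImportTarget table and placeholder file paths (FIRSTROW = 2 skips the header row; run the generated statement with SqlCommand against your server):

```csharp
using System;

class BulkLoader
{
    // Builds the BULK INSERT statement. The table name and file
    // paths are placeholders, not from the original question.
    public static string BuildBulkInsert(string table, string dataFile, string formatFile)
    {
        return "BULK INSERT " + table +
               " FROM '" + dataFile + "'" +
               " WITH (FORMATFILE = '" + formatFile + "', FIRSTROW = 2);";
    }

    static void Main()
    {
        string sql = BuildBulkInsert("dbo.ImportTarget",
                                     @"C:\data\input.csv",
                                     @"C:\data\input.fmt");
        Console.WriteLine(sql);
        // Execute with: new SqlCommand(sql, connection).ExecuteNonQuery();
    }
}
```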

deeg

Step 0. My test file, with an additional line that ends in a comma:

Col1,Col2,Col3,,,,,,
value1,value2,value3,,value4
value1,value2,value3,value4,value5
value1,value2,value3,,value4,value5
value1,value2,value3,,,value4
ends,with,comma,,,value4,
  1. Drag a Data Flow Task (DFT) onto the Control Flow surface.
  2. Inside the DFT, on the Data Flow surface, drag a Flat File Source.
  3. Let it map by itself to start with. Check "Column names in the first data row".
  4. You will see Col1, Col2, and Col3, which are your known fields.
  5. You will also see Column 3 through Column 8. These are the columns that need to be lumped into one Misc column.

  6. Go to the Advanced section of the Flat File Connection Manager Editor.

  7. Rename Column 3 to Misc. Set the field size to 4000.

     Note: For anything longer than that, you would need to use the Text data type. That will pose some challenge, so be ready for fun ;-)

  8. Delete Columns 4 through 8. Now add a Script Component (transformation). Under Input Columns, select only the Misc field, with Usage Type ReadWrite. Code:

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    string sMisc = Row.Misc;

    string sManipulated = string.Empty;
    string temp = string.Empty;

    // Split the lumped-together trailing columns and replace
    // each empty slot with "NA".
    string[] values = sMisc.Split(',');

    foreach (string value in values)
    {
        temp = value;

        if (temp.Trim().Equals(string.Empty))
        {
            temp = "NA";
        }

        sManipulated = string.Format("{0},{1}", sManipulated, temp);
    }

    // Drop the leading comma added by the first iteration.
    Row.Misc = sManipulated.Substring(1);
}

  9. Add a destination. Nothing different from usual here.
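Outside of SSIS, the transformation in the script above boils down to the following standalone sketch (Input0Buffer is generated by SSIS, so this illustrative version takes a plain string instead):

```csharp
using System;
using System.Linq;

class MiscPadder
{
    // Same logic as the script component: split the lumped trailing
    // columns and replace each empty slot with "NA".
    public static string Pad(string misc)
    {
        return string.Join(",",
            misc.Split(',')
                .Select(v => v.Trim() == string.Empty ? "NA" : v));
    }

    static void Main()
    {
        Console.WriteLine(Pad(",value4,value5"));
        // NA,value4,value5
    }
}
```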

Hope I have understood your problem and the solution works for you.

Anoop Verma
  • Letting any SSIS tools look at the schema first isn't going to work. As you mention in your answer, I'd have to manually delete spurious columns. No, this needs to happen on the fly. – Devin Goble Feb 04 '13 at 17:16
  • Col1,Col2,Col3,,,,,,,, value1,value2,value3,,value4,new value1,value2,value3,value4,value5 value1,value2,value3,,value4,value5,new value1,value2,value3,,,value4,new ends,with,comma,,,value4, – Anoop Verma Feb 07 '13 at 01:37
  • I see where you're going with this, and it would work if the (a) the file was always the same, or (b) I only got it once. However, neither of those is true. – Devin Goble Feb 07 '13 at 17:20