Prevent Duplicate headers in flat file destination - SSIS

Question

I need some help.

I am importing some data in .csv file from an oledb source. I don't want the headers to appear twice in the destination. If i Uncheck the "Column names in first data row" property , the headers don't get populated in the first execution as well.

Output as of now.

Col1,Col2
A,B
Col1,Col2
C,D

How can I make the package run in such a way that if the file is empty , the headers get inserted. Then if the execution happens again, headers are not included,just the data.

there was a similar thread, but wasn't able to apply the solution as how to use expressions to get the number of rows of destination itself. It was long back , so I created a new.

Your help is deeply appreciated.

-Akshay

You have an existing table. You want to load all the data in your source file, columns included on the first load. On subsequent loads, do not add the column names back into the table? I'm confused on the value of storing the column names in with the actual data. — billinkc, May 05 '13 at 13:57
I am loading data from my source DB. Want to insert it into .CSV file. Assume that the package is going to run for the first time ever.If the data is there in SQL table,Create a file and insert into the data (This is done). If I run the package again, I do not want the HEADERS (COL1,COL2,COL3) be appended as data in the file, but the data. I hope this makes more sense now. — Akshay, May 05 '13 at 15:28

score 7 · Accepted Answer · answered May 07 '13 at 04:41

Perhaps I'm missing something but this works for me. I am not having the read only trouble with ColumnNamesInFirstDataRow

I created a package level variable named AddHeader, type Boolean and set it to True. I added a Flat File Connection Manager, named FFCM and configured it to use a CSV output of 2 columns HeadCount (int), AddHeader (boolean). In the properties for the Connection Manager, I added an Expression for the property 'ColumnNamesInFirstDataRow' and assigned it a value of @[User::AddHeader]

enter image description here

I added a script task to test the size of the file. It has read/write access to the Variable AddHeader. I then used this script to determine whether the file was empty. If your definition of "empty" is that it has a header row, then I'd adjust the logic in the if check to match that length.

    public void Main()
    {

        string path = Dts.Connections["FFCM"].ConnectionString;

        System.IO.FileInfo stats = null;
        try
        {
            stats = new System.IO.FileInfo(path);
            // checking length isn't bulletproof based on how the disk is configured
            // but should be good enough
            // http://stackoverflow.com/questions/3750590/get-size-of-file-on-disk
            if (stats != null && stats.Length != 0)
            {
                this.Dts.Variables["AddHeader"].Value = false;
            }
        }
        catch
        {
            // no harm, no foul
        }
        Dts.TaskResult = (int)ScriptResults.Success;
    }

I looped through twice to ensure I'd generate the append scenario

enter image description here

I deleted my file and ran the package and only had a header once.

enter image description here

Thank you for the solution. I did implement it. – Akshay May 07 '13 at 15:00 — Akshay, May 07 '13 at 15:00

score 2 · Answer 2 · answered May 05 '13 at 16:59

The property that controls whether the column names will be included in the output file or not is ColumnNamesInFirstDataRow. This is a readonly property.

One way to achieve what you are trying to do it would be to have two data flow tasks on the control flow surface preceded by a script task. these two data flow tasks will be identical except that they will be referring to two different flat file connection managers. Again, the only difference between these two would be the different values for the ColumnsInTheFirstDataRow; one true, another false.

Use this Script task to decide whether this is the first run or subsequent runs. Persist this information and check it within the script. Either you can have a separate table for this information, or use some log table to infer it.

score 2 · Answer 3 · answered May 07 '13 at 09:31

Following solution is worked for me.You can also try the following.

Create three variables.

IsHeaderRequired RowCount TargetFilePath

Get the source row counts using Execute SQL task and save it in RowCount variable.
Have script task. Add readonly variables TargetFilePath and RowCount. Add read and write variable IsHeaderRequired.

Edit the script and add the following line of code.

    string targetFilePath = Dts.Variables["TargetFilePath"].Value.ToString();
    int rowCount = (int)Dts.Variables["RowCount"].Value;

    System.IO.FileInfo targetFileInfo = new System.IO.FileInfo(targetFilePath);

    if (rowCount > 0)
    {
        if (targetFileInfo.Length == 0)
        {
            Dts.Variables["IsHeaderRequired"].Value = true;
        }
        else
        {
            Dts.Variables["IsHeaderRequired"].Value = false;
        }
    }

    Dts.TaskResult = (int)ScriptResults.Success;

Connect your script component to your database
Click connection manager of flat file[i.e your target file] and go to properties. In the expression, mention the following as shown in the screenshot.
```
Map the connectionString to variable "TargetFilePath".
Map the ColumnNamesInFirstDataRow to "IsHeaderRequired".
```

Expression for Flat file connection Manager. enter image description here

Final package[screenshot]:

enter image description here

Hope this helps

The solution works good, but i was hoping to know something which doesn't require extra task of checking the file details (script task n all). Your solution is appreciated though. — Akshay, May 07 '13 at 15:03

score 2 · Answer 4 · answered May 08 '13 at 13:44

A solution ....

First, add an SSIS integer variable in the scope of the Foreach Loop or higher - I'll call this RowCount - and make its default value negative (this is important!). Next, add a Row Count to your Data Flow, and assign the result to the RowCount SSIS variable we just made. Third, select your Connection Manager (don't double-click) and open the Properties window (F4). Find the Expressions property, select it, and hit the ellipsis (...) button. Select the ColumnNamesInFirstDataRow property, and use an expression like this:

[@User::RowCount] < 0

Now, when your package starts, RowCount has the static value of -1 or another negative number. When the data flow starts for the first time in your loop, the ColumnNamesInFirstDataRow property will have a value of TRUE. When the first data flow completes, the row count (even if it's zero) is written to the RowCount variable. On the second interation of the loop, the Connection Manager is then reconfigured to NOT write column names...

I can confirm this method works as I just used it. This method is actually better than the accepted answer as it doesn't need a script task, it can be done entirely within SSIS. — Ubercoder, Nov 22 '18 at 15:05

Prevent Duplicate headers in flat file destination - SSIS

4 Answers4