0

I have got a situations where I receive csv file from third party and one of the column contains the line feeds in it. Thus causing issues with the uploading of csv file to database.

I am using SSIS package to parse csv file. Now I want to correct the csv file before it get's used by SSIS.

I'll prefer a powershell script for that.

this question handling a CSV with line feed characters in a column in powershell is very related to what I am looking with one difference..

My data is in this format (Notice [LF] in the end instead of [CR][LF]

Column1,Column2,Column3,Column4
Text1,"Text2[LF]","text3[LF]",text4[LF]

Edit

Data Example

Column1, Column2, Column 3[LF]
1, "text text", text[LF]
2, "text[LF]
Some more text [LF]
Some more text", text[LF]
3, "text again", text[LF]

Here's the actual file https://www.dropbox.com/s/wsxfyehlnls7m53/test.csv

Can this actually be corrected, or it's just impossible?

Community
  • 1
  • 1
Ankit
  • 1,867
  • 2
  • 21
  • 40
  • I deleted my answer once I realized what the data actually looks like. You need to get your third party to send un-corrupted data, because what you are receiving is unacceptable. – Eric Hauenstein May 22 '14 at 13:15
  • I can't get to dropbox from this network (corp firewall). Do any of the quoted fields contain embedded commas? – mjolinor May 22 '14 at 13:35
  • Yes @mjolinor. The very same column which contains line feeds contains comma also.. – Ankit May 22 '14 at 14:16

2 Answers2

2

Try this:

(Get-content $file -Raw) -replace '\n(?=")','<br/>' |
 set-content $file

That should replace any newline that is immediately followed by a double quote.

Or, you can do it this way:

(Get-content $file -Raw) -replace '\n"','<br/>"' |
 set-content $file
mjolinor
  • 66,130
  • 7
  • 114
  • 135
  • Thanks @mjolinor but problem is there is no carriage return anywhere in the file, it's Line Feed everywhere. In column contents as well as at the end of the line.. However there is only one column which contains the line feeds and that is within "" so I was trying to have condition based on that but failing till now. – Ankit May 22 '14 at 12:44
  • Okay, then let's re-factor and try replacing any newline that is immediately followed by a double quote. – mjolinor May 22 '14 at 12:49
  • I feel I haven't cleared enough my problem.. I apologize for that. Will this answer work with the data example I put across in the question.. – Ankit May 22 '14 at 13:03
  • I believe it should. I don't have your data to test with, and would have to try to recreate it. Do you have a file in this format you can make a copy of and test it? – mjolinor May 22 '14 at 13:07
  • I just saw the update to the post. That's very different that what was originally presented. I can't say this solution will work with that. – mjolinor May 22 '14 at 13:09
0

This is a little ugly, but it works for me and does what you need.

First, get the file contents and the column headers.

$text = Import-CSV $file
$columns = Get-Content $file -TotalCount 1
$columns = $columns.Split(",").Trim()

Next loop through each field and replace the line feed with the
.

For ($r=0; $r -lt $text.Count; $r++) {
    For ($c=0; $c -lt $columns.Count; $c++) {
        $text[$r].($columns[$c]) = $text[$r].($columns[$c]).Replace("`n","<br/>")
    }
}

Then Export the CSV

$text | Export-Csv $file -NoTypeInformation
Tim Ferrill
  • 1,648
  • 1
  • 12
  • 15