1

Hi, I'm editing my question here because the requirement has changed slightly: the CSV file has only LF line endings to begin with. However, the CSV file can also have LFs inside elements that are enclosed in double quotes. We want to retain the LFs within double quotes and replace the LF at the end of each line with CRLF. So if my source file looks like this:


Date,Open,High,Low,Close,comments,Remark
5-Dec-16,8088.75,8141.9,8056.85,8128.75,"TEST1 <LF>
TEST2 <LF>
with NL",remark<LF>
6-Dec-16,8153.15,8178.7,8130.85,8143.15,AAAA,remark<LF>
7-Dec-16,8168.4,8190.45,8077.5,8102.05,BBBB,remark<LF>
8-Dec-16,8152.1,8256.25,8151.75,8246.85,"TEST1<LF>
TEST2 with NL",remark<LF>
9-Dec-16,8271.7,8274.95,8241.95,8261.75,CCCC,remark<LF>

then the expected output is:

Date,Open,High,Low,Close,comments,Remark
5-Dec-16,8088.75,8141.9,8056.85,8128.75,"TEST1 <LF>
TEST2 <LF>
with NL",remark<CRLF>
6-Dec-16,8153.15,8178.7,8130.85,8143.15,AAAA,remark<CRLF>
7-Dec-16,8168.4,8190.45,8077.5,8102.05,BBBB,remark<CRLF>
8-Dec-16,8152.1,8256.25,8151.75,8246.85,"TEST1<LF>
TEST2 with NL",remark<CRLF>
9-Dec-16,8271.7,8274.95,8241.95,8261.75,CCCC,remark<CRLF>


Appreciate your help.

Thanks, Chandan

chandan T
  • Please use code tags for samples in your post. – RavinderSingh13 Apr 03 '18 at 15:17
  • Are you trying to remove a Carriage Return character (`\r`) or a Line Feed character (`\n`) or a newline string (`\n` on UNIX or `\r\n` on Windows)? If it's just one of the characters, do you want it removed across the whole line or only within specific field(s)? Do your lines end in `\r\n` but fields can contain `\n` or `\r`? Please show where each of those appears in your input, where you want it in your output, and what you've tried so far, and use the editor's `{}` button to format the input, output, and code. Also, see https://stackoverflow.com/q/45420535/1745001. – Ed Morton Apr 03 '18 at 15:18
  • It's complicated: the source has \n within each element and a /n at the end of the line. We want to retain the /n within the element and replace the /n at the end of the line with /r/n. The CSV file sometimes has one \n and sometimes multiple \n within " ", and these /n need to be retained. The first line below has 2 \n. – chandan T Apr 05 '18 at 12:16
  • Document ID,Created Date,Requester,PO Created Date,Last Updated Date,Response Codes,Response Messages,Resolved 527612,03/15/18,Jin LI,03-15-2018,03/15/18,"Success Info Info","IDOC 0000000049823820 IPaaS: JobID: eab75159c2f5",No 527615,03/15/18,Cuong Bui Manh,03-15-2018,03/15/18,"Success Info","IDOC System IPaaS: 369acd6",No – chandan T Apr 05 '18 at 12:16
  • `/n` is a string of 2 characters, a forward slash character (`/`) followed by the letter `n`. Your lines do not end with that or it'd be visible in your posted example. I suspect your lines actually end with either `\n` or `\r\n` (hopefully the latter, as that makes removing standalone `\n`s mid-field easier) like every other CSV I've ever heard of. Check using `od -c file` and make sure your question includes the output of that, since that WILL provide the correct information. You can't show formatted text in a comment so don't try; put all relevant information in your question. – Ed Morton Apr 05 '18 at 13:45
  • Updated the question accurately. I still have a problem pasting LF and CRLF, so I have put them within < > – chandan T Apr 06 '18 at 00:47

3 Answers

1

Best to use a proper CSV parser that can handle newlines in quoted fields. Perl has one:

perl -MText::CSV -e '
    $csv = Text::CSV->new({ binary => 1 }); 
    while ($row = $csv->getline(STDIN)) {
        $row = [map {s/\n+/ /g; $_} @$row]; 
        $csv->say(STDOUT, $row)
    }
' < file.csv

or Ruby:

ruby -rcsv -e '
  CSV.parse( readlines.join "" ).each {|row|
    puts CSV.generate_line( row.collect {|elem| elem.gsub /\n+/, " "} )
  }
' file
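
For the edited requirement (LF-only input, record ends converted to CRLF, LFs inside quoted fields kept), the same "use a real CSV parser" idea can be sketched with Python's standard `csv` module. This is only an illustration, not part of the answer above: the function name `lf_records_to_crlf` and the sample data are made up for the example, and because the data is parsed and re-written, quoting may be normalized (quotes are kept only where a field needs them).

```python
import csv
import io


def lf_records_to_crlf(text: str) -> str:
    """Re-emit CSV text so that records end in CRLF.

    LFs inside quoted fields are preserved because the parser treats
    them as field content, not as record separators.
    """
    # newline="" keeps Python from translating line endings on either side.
    source = io.StringIO(text, newline="")
    sink = io.StringIO(newline="")
    writer = csv.writer(sink, lineterminator="\r\n")
    for row in csv.reader(source):
        writer.writerow(row)
    return sink.getvalue()


# Hypothetical sample modelled on the question's data.
sample = 'Date,comments,Remark\n5-Dec-16,"TEST1\nTEST2",remark\n6-Dec-16,AAAA,remark\n'
converted = lf_records_to_crlf(sample)
```

When working with real files rather than strings, both files would need to be opened with `newline=""` for the same reason.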
glenn jackman
0

Chances are you're looking for:

awk -v RS='\r\n' '{gsub(/[\r\n]+/," ")}1' file

but without details on where the \rs and \ns appear in your input that's just a guess. The above uses GNU awk for multi-char RS and in addition to replacing chains of carriage returns and/or linefeeds from inside every field with blanks will convert your newlines from \r\n (Windows style) to just \n (UNIX style) to make it easier to do anything else with them from that point onwards.

See also "What's the most robust way to efficiently parse CSV using awk?" for how to handle CSVs in general using awk.
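
To make the behaviour described above concrete, here is a rough Python equivalent of that awk command. It rests on the same guess as the answer (records end in `\r\n`, stray `\r`/`\n` appear only inside fields), so it does not apply to the LF-only input of the edited question. The function name `flatten_embedded_newlines` and the sample string are hypothetical.

```python
import re


def flatten_embedded_newlines(text: str) -> str:
    """Mimic: awk -v RS='\\r\\n' '{gsub(/[\\r\\n]+/," ")}1'"""
    # Split into records on the Windows-style terminator.
    records = text.split("\r\n")
    # A trailing terminator yields one empty piece; awk does not
    # treat that as a record, so drop it.
    if records and records[-1] == "":
        records.pop()
    # Collapse each run of CR/LF inside a record to one blank and
    # terminate every record with a UNIX-style newline.
    return "".join(re.sub(r"[\r\n]+", " ", rec) + "\n" for rec in records)


sample = 'a,"x\ny",b\r\nc,"p\n\nq",d\r\n'
flattened = flatten_embedded_newlines(sample)
```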

Ed Morton
0

A little state machine in awk: uses a double quote as the field separator, and acts upon the number of fields:

awk -F '"' '
    partial {$0 = partial OFS $0; partial = ""} 
    NF % 2 == 0 {partial = $0; next} 
    {print}
' file
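
The awk script above joins the pieces of a split record with a space. The same quote-counting idea can be adapted to the edited question, where LFs inside quotes must be kept and only record-ending LFs become CRLF. The following Python sketch is one way that might look; it assumes the input uses LF only and that literal quotes inside fields are doubled (`""`), and the name `lf_to_crlf_outside_quotes` is invented for this example.

```python
def lf_to_crlf_outside_quotes(text: str) -> str:
    """Replace LF with CRLF only when it is not inside a quoted field."""
    out = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            # Every double quote flips the state; a doubled quote ("")
            # flips it twice, so the state is unchanged afterwards.
            in_quotes = not in_quotes
            out.append(ch)
        elif ch == "\n" and not in_quotes:
            # LF outside quotes ends a record: emit CRLF instead.
            out.append("\r\n")
        else:
            # Everything else, including LF inside quotes, is copied as-is.
            out.append(ch)
    return "".join(out)


# Hypothetical sample modelled on the question's data.
sample = '5-Dec-16,"TEST1 \nTEST2 \nwith NL",remark\n6-Dec-16,AAAA,remark\n'
converted = lf_to_crlf_outside_quotes(sample)
```

Unlike a parser-based rewrite, this leaves every other byte of the input untouched, including the original quoting.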
glenn jackman