
I have a CSV file on Unix that I am trying to load through DataStage. However, the load fails because some rows contain embedded newline characters, which break those records across multiple lines.

For instance, the following records (in the file) are causing issues:

1,PRV1,id1,"This is 

a test to
check newlines"
2,PRV2,id2,"This line is OK"
3,PRV3,id3,"This is 
another example"

Here, in the 1st record, the string "This is a test to check newlines" is broken into 4 lines because of embedded newlines. It needs to be a single line. Please note that the double quotes need to be retained, and that the remaining lines, which do NOT have embedded newlines, should be left as-is.

Hence, the desired output should be:

1,PRV1,id1,"This is a test to check newlines"
2,PRV2,id2,"This line is OK"
3,PRV3,id3,"This is another example"
Jonathan Leffler
Matthew
  • See https://stackoverflow.com/questions/45420535/whats-the-most-robust-way-to-efficiently-parse-csv-using-awk and https://stackoverflow.com/questions/50749026/how-can-i-replace-line-feed-in-csv-quoted-fields-with-a-blank and https://stackoverflow.com/questions/29150640/how-to-remove-new-lines-within-double-quotes and so on – Sundeep Oct 21 '20 at 06:31
  • I'd use something with an actual CSV parser instead of awk. (Self-promotion: I've written an awk-like tool, scripted in Tcl, that has just that, if you're interested.) – Shawn Oct 21 '20 at 06:54
  • @Sundeep is there any solution for the newline issue that I mentioned in my question? – Matthew Oct 21 '20 at 07:36
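
For what it's worth, the transformation described in the question can be sketched in a few lines of Python. This is only a rough sketch, not a tested DataStage fix, and it rests on assumptions the question does not confirm: double quotes appear only as field delimiters (or doubled, `""`, inside a quoted field), the whole file fits in memory, and any whitespace around an embedded newline should collapse to a single space. The function name `join_quoted_newlines` is made up for illustration. It does not use a full CSV parser; it only finds quoted fields and rewrites the newlines inside them, so the quotes and all other lines are left exactly as they were.

```python
import re

# A quoted field: an opening quote, then any mix of non-quote characters
# or doubled quotes (""), then the closing quote.
QUOTED_FIELD = re.compile(r'"(?:[^"]|"")*"')

# A newline inside a field, together with any whitespace around it.
EMBEDDED_NEWLINE = re.compile(r'\s*\n\s*')


def join_quoted_newlines(text: str) -> str:
    """Replace newlines inside double-quoted fields with a single space.

    Newlines outside quotes (the real record separators) are untouched.
    """
    def fix_field(match: re.Match) -> str:
        return EMBEDDED_NEWLINE.sub(" ", match.group(0))

    return QUOTED_FIELD.sub(fix_field, text)


if __name__ == "__main__":
    sample = (
        '1,PRV1,id1,"This is \n'
        "\n"
        "a test to\n"
        'check newlines"\n'
        '2,PRV2,id2,"This line is OK"\n'
        '3,PRV3,id3,"This is \n'
        'another example"\n'
    )
    print(join_quoted_newlines(sample), end="")
```

To apply it to a real file, one would read the file into a string, pass it through `join_quoted_newlines`, and write the result to a new file before the DataStage load. If the data can contain stray, unbalanced quotes, this approach will mis-pair them, and a proper CSV parser (as suggested in the comments) would be the safer route.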

0 Answers