I have a text file of comma-separated values in which some column values contain newline characters. These split a record across several lines, which causes data issues.

Sample data

"604","56-1203802","xx","VEN","null","50","1","20","N�
jTï"
"5526","841328305","yyINC","VEN","null","50","1","20","~R¿½K�ï
¿½ï¿½}("
"604","561203802","C","VEN",,"null","50","1","20","2ï½a��"

Expected Output

"604","56-1203802","xx","VEN","null","50","1","20","N�jTï"
"5526","841328305","yyINC","VEN","null","50","1","20","~R¿½K���}("
"604","561203802","C","VEN",,"null","50","1","20","2ï½a��"

I need to remove the newlines inside double-quoted strings.

I tried the awk command below to remove them, but it does not work as expected.

gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' infile.txt > outfile.txt

The required result would be to remove the LF and CR characters from the data.

I tried the solutions posted for similar questions, but they did not work for me.

The newline characters in the file are not visible unless the data is copied into Notepad++, which shows them as CR LF.

Borodin
lfc_07

1 Answer

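You can try this sed command, which keeps appending the next line (and deleting the newline between them) until the current line ends with a double quote: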

sed ':loop; /" *$/!{N;s/\n//g; b loop}' file
sat
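
Because the question says Notepad++ shows the line breaks as CR LF, a pattern anchored on a quote at the end of the line will not match a line whose closing quote is followed by a CR. As a possible alternative, here is a minimal awk sketch that strips every CR and joins physical lines until the double quotes balance. It assumes that every double quote in the file belongs to a balanced pair (so the garbled values contain no stray quotes) and that no CR needs to be preserved. The function name join_quoted_lines is hypothetical, chosen only for illustration.

```shell
# join_quoted_lines (hypothetical name): reads CSV on stdin and writes it to
# stdout with CR and LF removed from inside double-quoted fields.
# Assumption: quotes are balanced, so a record is complete once the number of
# double quotes collected so far is even.
join_quoted_lines() {
  # LC_ALL=C: treat input as raw bytes (assumption: data may not be valid UTF-8)
  LC_ALL=C awk '
    {
      gsub(/\r/, "")            # drop every CR, including the one before each LF
      buf = buf $0              # append this physical line to the pending record
      n = gsub(/"/, "&", buf)   # count the quotes (replaces each with itself)
      if (n % 2 == 0) {         # even count: no quoted field is still open
        print buf
        buf = ""
      }
    }
    END { if (buf != "") print buf }   # flush an unterminated record as-is
  '
}
```

It could be used as `join_quoted_lines < infile.txt > outfile.txt`. Counting quotes rather than testing the end of the line means a doubled quote inside a field does not disturb the parity, but a single unbalanced quote anywhere in the data would merge the records that follow it, so the output is worth checking against the expected record count.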