I have a text file of comma-separated values in which some column values contain newline characters. As a result, a single record is split across several lines, which causes data issues.
Sample data
"604","56-1203802","xx","VEN","null","50","1","20","N�
jTï"
"5526","841328305","yyINC","VEN","null","50","1","20","~R¿½K�ï
¿½ï¿½}("
"604","561203802","C","VEN",,"null","50","1","20","2ï½a��"
Expected Output
"604","56-1203802","xx","VEN","null","50","1","20","N�jTï"
"5526","841328305","yyINC","VEN","null","50","1","20","~R¿½K���}("
"604","561203802","C","VEN",,"null","50","1","20","2ï½a��"
I need to remove the newlines inside double-quoted strings.
I tried the gawk command below to remove them, but it does not work as expected:
gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' infile.txt > outfile.txt
The required result is to remove the LF and CR characters that occur inside the quoted column values.
I tried the solutions posted for similar questions, but they did not work for me.
The newline characters in the file are not visible unless the text is copied into Notepad++, where they show up as CR LF.
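Since Notepad++ shows the breaks as CR LF, one possible reason the gawk attempt falls short is that gsub(/\n/, "") removes only the LF and leaves the CR in place. Below is a minimal sketch of the behaviour I am after, not a confirmed solution. It assumes that every double quote in the file is a field delimiter (the binary-looking column data never contains a stray quote) and that it is acceptable for the output to use plain LF line endings. The function name join_quoted_newlines is made up for illustration. It uses only POSIX awk features, so gawk is not required.

```shell
# Hypothetical helper: joins physical lines until the double quotes balance,
# and strips every CR along the way (output records end in LF only).
join_quoted_newlines() {
    awk '
    {
        gsub(/\r/, "")              # drop every carriage return on this line
        buf = buf $0                # append this physical line to the pending record
        quotes += gsub(/"/, "&")    # gsub returns how many quotes it matched
        if (quotes % 2 == 0) {      # even count: the quoted field is closed
            print buf
            buf = ""
            quotes = 0
        }
    }
    END { if (buf != "") print buf }  # flush a record whose quote never closed
    ' "$@"
}

# Usage (file names taken from the command in the question):
#   join_quoted_newlines infile.txt > outfile.txt
```

The idea is that a line with an odd running quote count must end inside a quoted value, so the next line is glued onto it without a separator. If the column data can contain unescaped double quotes, this parity check would misfire and a real CSV parser would be the safer choice.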