0

I'm trying to remove the pipe delimiter from double quotes,its replacing with following code but its not handling empty filed and numeric fields. Any help would be appreciated.

Code:

 sed 's/|*\([^"]\)/\1/g;s/\([^"]\),*/\1/g' test.csv

INPUT:

"Southern|Palms"|"AA|None"|"4"|"Kenya"|"MBA"|"Ken|Coast"|"Y"|1/11/2019 00:00:00|30/4/2020 00:00:00|"TH"||40.00|"HB|AI"||||||"Y"||||

OUTPUT:

"SouthernPalms"|"AANone"|"4"|"Kenya"|"MBA"|"KenCoast"|"Y"1/11/2019 00:00:0030/4/2020 00:00:00|"TH"40.00|"HBAI"|"Y"|

Expected Output:

"Southern Palms"|"AA None"|"4"|"Kenya"|"MBA"|"Ken Coast"|"Y"|1/11/2019 00:00:00|30/4/2020 00:00:00|"TH"||40.00|"HB AI"||||||"Y"||||
marjun
  • 696
  • 5
  • 17
  • 30

1 Answers1

0
sed -E 's/(^|[^"|])\|($|[^"|])/\1 \2/g' <file>

Returns :

"Southern Palms"|"AA None"|"4"|"Kenya"|"MBA"|"Ken Coast"|"Y"|1/11/2019 00:00:00 30/4/2020 00:00:00|"TH"||40.00|"HB AI"||||||"Y"||||

It won't work for lines like this and I'm sure there are many other examples that could break the regexp :

"Southern|Palms"|"AA|||None"|"4"|"Kenya"|"MBA"|"Ken|Coast"|"Y"|1/11/2019 00:00:00|30/4/2020 00:00:00|"TH"||40.00|"HB|AI"||||||"Y"||||

But awk/sed are not the proper tools to read csv with quotechar and escapechar. For complex files like this, I would consider using python or any CSV reader that take these options into account.

Corentin Limier
  • 4,946
  • 1
  • 13
  • 24
  • how do we escape the quotes in between dates. result : 1/11/2019 00:00:00 30/4/2020 00:00:00 expected : 1/11/2019 00:00:00 | 30/4/2020 00:00:00 – marjun Jul 23 '19 at 03:51