
I'm trying to remove the | delimiter inside quoted fields using sed. The text contains dates, nulls, and strings, all separated by a pipe delimiter. The following sed command works for the quoted fields, but it also removes the delimiter between the date fields. Any help would be appreciated.

sed -E 's/(^|[^"|])\|($|[^"|])/\1 \2/g' <file>

Input:

"Southern|Palms"|"AA|None"|"4"|"Ken|Coast"|1/11/2019 00:00:00|30/4/2020 00:00:00|"TH"|

Returns:

"Southern Palms"|"AA None"|"4"|"Ken Coast"|1/11/2019 00:00:00 30/4/2020 00:00:00|"TH"|

Expected Output:
"Southern Palms"|"AA None"|"4"|"Ken Coast"|1/11/2019 00:00:00|30/4/2020 00:00:00|"TH"|

marjun

2 Answers


With GNU awk for FPAT:

$ awk -v FPAT='[^|]*|"[^"]+"' -v OFS='|' '{for (i=1;i<=NF;i++) gsub(/\|+/," ",$i)} 1' file
"Southern Palms"|"AA None"|"4"|"Ken Coast"|1/11/2019 00:00:00|30/4/2020 00:00:00|"TH"|

See "What's the most robust way to efficiently parse CSV using awk?"
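FPAT needs gawk 4.0 or later. For an older or non-GNU awk, one possible fallback is to split each line on the double quote instead, so that every even-numbered field is the content of a quoted string. This is only a sketch: it assumes quotes are balanced and never escaped inside a field, and the function name fix_quoted_pipes is made up for illustration.

```shell
# Sample line from the question, used here only for illustration.
line='"Southern|Palms"|"AA|None"|"4"|"Ken|Coast"|1/11/2019 00:00:00|30/4/2020 00:00:00|"TH"|'

# Hypothetical helper: split on double quotes, so even-numbered fields
# hold the text inside quotes; replace pipes only in those fields.
# Assumes balanced, unescaped quotes.
fix_quoted_pipes() {
  awk 'BEGIN { FS = OFS = "\"" }
       { for (i = 2; i <= NF; i += 2) gsub(/\|+/, " ", $i) }
       1'
}

printf '%s\n' "$line" | fix_quoted_pipes
```

Because the unquoted date fields land in odd-numbered fields, their delimiters are left alone.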

Ed Morton
  • I tried the awk above, but it's replacing spaces with the delimiter: "UK|&|I" "KEKE0006" "Southern Palms Beach|Resort" "AA|None" "4" "Kenya" "MBA" "Kenyan|Coast" "Y" 1/11/2019|00:00:00 30/4/2020|00:00:00 – marjun Jul 23 '19 at 04:27
  • It does exactly what you asked for. If you're seeing unexpected output then you either copy/pasted the script wrong or your real input doesn't look like your expected output or you aren't running gawk. What does running `awk --version` tell you? – Ed Morton Jul 23 '19 at 04:30
  • It reports GNU Awk 3.1.7 – marjun Jul 23 '19 at 04:32
  • That is an **extremely** archaic version of gawk (5+ years out of date) that pre-dates FPAT. We're currently on gawk 5.0.2 - can you update your version? You're missing a TON of extremely useful functionality and some bug fixes. – Ed Morton Jul 23 '19 at 04:35

How about:

sed -E 's/(\w+)\|(\w+)/\1 \2/g' testfile.txt

The pattern \w+\|\w+ matches a pipe symbol between two words, <word1>|<word2>, and the substitution replaces it with the two words separated by a space, <word1> <word2>.

If you want to match the quotes use:

sed -E 's/("\w+)\|(\w+")/\1 \2/g' testfile.txt

That matches "<word1>|<word2>" and replaces it with "<word1> <word2>"
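As a quick check, both variants can be run against the sample line from the question. This sketch assumes GNU sed, since \w is a GNU extension; the variable names are only for illustration. The unquoted pattern also matches 00|30 between the two date fields, because digits count as word characters, while the quoted pattern leaves that delimiter in place.

```shell
# Sample line from the question.
line='"Southern|Palms"|"AA|None"|"4"|"Ken|Coast"|1/11/2019 00:00:00|30/4/2020 00:00:00|"TH"|'

# Unquoted pattern: any pipe between two runs of word characters.
unquoted=$(printf '%s\n' "$line" | sed -E 's/(\w+)\|(\w+)/\1 \2/g')

# Quoted pattern: the two words must be enclosed in one pair of quotes.
quoted=$(printf '%s\n' "$line" | sed -E 's/("\w+)\|(\w+")/\1 \2/g')

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```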

ventsyv
  • The delimiter between the date fields is missing with the sed above – marjun Jul 23 '19 at 04:30
  • @marjun I ran it on my Linux machine with the input you provided and I got the output that you said is expected. That's with quotes in the pattern, by the way. – ventsyv Jul 23 '19 at 04:41
  • It answers my question. What if the delimiter is between multiple words, as in "Southern Palms|Beach Resort"? – marjun Jul 23 '19 at 04:41
  • This fails if there are more than two words within the quotes like this: `"Southern|Palms|spring"|"AA|None"|"4"` – Jotne Jul 23 '19 at 05:56
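For the cases raised in the comments above (several words or several pipes inside one pair of quotes), one possible sed-only approach is a loop that removes one quoted pipe per pass until none are left. This is a sketch, not something taken from the answers: it assumes quotes are balanced and never escaped, and the function name strip_quoted_pipes is made up for illustration.

```shell
# Hypothetical helper. The prefix before the pipe must consist of zero or
# more complete "..." pairs, then one opening quote, then text with no
# quote or pipe. That means the pipe sits inside a quoted field.
# 'ta' jumps back to label 'a' while a substitution still succeeds.
strip_quoted_pipes() {
  sed -E -e ':a' \
         -e 's/^(([^"]*"[^"]*")*[^"]*"[^"|]*)\|/\1 /' \
         -e 'ta'
}

printf '%s\n' '"Southern|Palms|spring"|"AA|None"|"4"' | strip_quoted_pipes
printf '%s\n' '"Southern Palms|Beach Resort"|1/11/2019 00:00:00|30/4/2020 00:00:00|' | strip_quoted_pipes
```

Pipes preceded by an even number of quotes, such as the one between the date fields, never match the prefix and are kept.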