
I know this question has already been answered, but with a comma as the separator: How to make awk ignore the field delimiter inside double quotes?

But my file is pipe-separated, and when I use the pipe in a regex it is treated as a regex operator, so I don't get the proper output. I don't use awk extensively. My requirement is to add a single backslash before a pipe character if it occurs inside a value.

As the file is almost 5GB, I thought of selecting a particular column and escaping the pipe only there.

INPUT:

"first | last | name" |" steve | white | black"| exp | 12
school |" home | school "| year | 2016
company |" private ltd "| joining | 2019

Expected Output:

"first \| last \| name" |" steve \| white \| black"| exp | 12
school |" home \| school "| year | 2016
company |" private ltd "| joining | 2019

I tried gawk with gsub but had no luck. Is there an alternative approach?

Also, if I have to check multiple columns, how can I do that?

Kalpesh
  • Can the described case occur several times per line? – Cyrus Jul 29 '22 at 21:28
  • please update the question to show your `gawk/gsub` attempt as well as the (wrong) output generated by your code – markp-fuso Jul 29 '22 at 21:58
  • @Cyrus, yes it can occur – Kalpesh Jul 29 '22 at 22:42
  • Instead of changing the `|`s inside quotes to `\|`s, can't you change them to some other character or string that can't be present in your input (e.g. some control character) so you don't still have the problem of having to handle `|`s inside quotes after this script is done? – Ed Morton Jul 30 '22 at 10:40
  • Can you have [escaped] `"`s inside your quoted fields and, if so, are they escaped as `""` or `\"`? Can you have newlines inside your quoted fields? – Ed Morton Jul 30 '22 at 10:41
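Ed Morton's suggestion of swapping the embedded pipes for a control character, rather than `\|`, can be sketched in awk as follows. This is only an illustration: it assumes the byte \034 (octal 034, the ASCII file separator) never occurs in the data, and `hide_quoted_pipes` / `restore_pipes` are hypothetical helper names.

```shell
# Hide every | that sits inside double quotes behind the control character
# \034 (assumed never to appear in the data); | outside quotes is untouched.
hide_quoted_pipes() {
    awk 'BEGIN { FS = OFS = "\"" }       # quoted text lands in the even fields
         { for (i = 2; i <= NF; i += 2)
               gsub(/\|/, "\034", $i)
           print
         }' "$@"
}

# Turn every \034 back into a literal |.
restore_pipes() {
    awk '{ gsub("\034", "|"); print }' "$@"
}
```

Between the two calls, any tool can treat every remaining `|` as a real delimiter, e.g. `hide_quoted_pipes input.dat | cut -d'|' -f2 | restore_pipes` (input.dat being a hypothetical file name).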

1 Answer


Assumptions:

  • a line can have more than one field with an embedded | character (each such field is wrapped in double quotes)
  • there may be more than one embedded | character in a single field
  • double quotes do not show up as embedded characters within other double quotes

Setup:

$ cat pipe.dat
name |" steve | white "| exp | 12
school |" home | school "| year | 2016
company |" private ltd "| joining | 2019
food |"pipe | one"|"pipe | two and | three"| 2022        # multiple double-quoted fields, multiple pipes between double quotes
cars | camaro | chevy | 2033                             # no double quotes

NOTE: the comments are added here only to highlight the new cases; they are not part of the data

One awk idea:

awk '
BEGIN { FS=OFS="\"" }              # define field delimiters as double quote
      { for (i=2;i<=NF;i+=2)       # double quoted data resides in the even numbered fields
            gsub(/\|/,"\\|",$i)    # escape all pipe characters in field #i
        print
      }
' pipe.dat

This generates:

name |" steve \| white "| exp | 12
school |" home \| school "| year | 2016
company |" private ltd "| joining | 2019
food |"pipe \| one"|"pipe \| two and \| three"| 2022
cars | camaro | chevy | 2033
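With FS set to a double quote, a line whose quoted values are all properly opened and closed always has its quoted text in fields 2, 4, 6 and so on, which is why the loop above starts at 2 and steps by 2. A quick way to check that on a sample line is a small helper (the name `show_fields` is hypothetical) that prints every field with its number:

```shell
# Print each field of a line split on double quotes, prefixed by its number,
# to show that the quoted text always ends up in the even-numbered fields.
show_fields() {
    awk -F'"' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }' "$@"
}

printf '%s\n' 'food |"pipe | one"|"pipe | two and | three"| 2022' | show_fields
```

For the `food` line from pipe.dat it prints:

1:[food |]
2:[pipe | one]
3:[|]
4:[pipe | two and | three]
5:[| 2022]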

Assuming no spaces between the | delimiter and double quotes ...

One GNU awk idea (using the FPAT feature):

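awk -v FPAT='([^|]*)|("[^"]+")' '   # FPAT is a GNU awk (gawk 4.0+) extension
BEGIN { OFS="|" }                   # rebuild records with the original delimiter
      { for (i=1;i<=NF;i++)         # a field is either free of pipes or a double-quoted string
            gsub(/\|/,"\\|",$i)     # escape all pipe characters in field #i
        print
      }
' pipe.dat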

This also generates:

name |" steve \| white "| exp | 12
school |" home \| school "| year | 2016
company |" private ltd "| joining | 2019
food |"pipe \| one"|"pipe \| two and \| three"| 2022
cars | camaro | chevy | 2033
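The FPAT version depends on the stated assumption that no spaces sit between a | delimiter and a double quote, yet the question's own first input line (`"first | last | name" |" steve ...`) has exactly that. A minimal sketch of a gawk-only variant, assuming the only padding around a quoted field is plain spaces, widens the quoted alternative of FPAT to ` *"[^"]+" *` so it swallows that padding. The function name `escape_quoted_pipes` is hypothetical:

```shell
# Variant of the FPAT script that also accepts spaces between a delimiting |
# and the double quotes, e.g.  "first | last | name" |" steve | white "| exp
# Requires GNU awk: FPAT is a gawk extension that other awks ignore.
escape_quoted_pipes() {
    gawk -v FPAT='([^|]*)|( *"[^"]+" *)' '
    BEGIN { OFS = "|" }                  # rebuild records with the original delimiter
          { for (i = 1; i <= NF; i++)    # a field has no | at all or is quoted
                gsub(/\|/, "\\|", $i)    # escape every | inside a quoted field
            print
          }
    ' "$@"
}
```

Usage, assuming gawk is on the PATH: `escape_quoted_pipes pipe.dat > pipe_escaped.dat`. The first (FS="\"") script needs no such change, since it never looks at the spaces around the quotes.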
markp-fuso
  • your awk idea works, but gawk/FPAT doesn't. Thank you so much, I was struggling with this for a long time – Kalpesh Jul 29 '22 at 23:23
  • not sure what the issue is with the 2nd `awk` script ... works with the data I've used, also works with the latest input from your question ... ??? – markp-fuso Jul 29 '22 at 23:25