0

The following outputs "b1 because awk treats the space inside the quotes as a field delimiter. How do I tell awk to ignore quoted delimiters so that this outputs b1 b2 or "b1 b2"?

echo 'a "b1 b2" c'| awk '{print $2}'

I see the following two related posts, but I'm having trouble getting the solutions to work. I was hoping to find a simple solution. Field parsing is awk's specialty, right?

  • awk ignore delimiter inside single quote within a parenthesis
  • What's the most robust way to efficiently parse CSV using awk?

Inian
clay

4 Answers

4

With gawk (GNU awk) you can use the FPAT special variable to define what a field looks like, instead of being limited to specifying a delimiter:

echo 'a "b1 b2" c'| gawk '{print $2}' FPAT='("[^"]+")|[^[:blank:]]+'

Here we say: a field is either a " followed by non-" chars and a closing " -> ("[^"]+"), or (|) a sequence of non-blank chars -> [^[:blank:]]+

The alternation is matched leftmost-longest rather than strictly in order, so at an opening " the quoted alternative wins whenever it matches more text than the sequence of non-blank chars (awk's default) would.


See GNU awk manual: Defining fields by content
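Where gawk is not available, the same field-by-content idea can be approximated in portable awk with match() in a loop. The following is a minimal sketch and not part of the answer above: qfield is a hypothetical helper name, and it assumes blank-separated input with no escaped or doubled quotes inside a quoted field.

```shell
# qfield N: print the Nth field of each input line, treating "..." as one field.
# Hypothetical helper; assumes blank-separated input and no escaped quotes.
qfield() {
  awk -v n="$1" '{
    rest = $0
    count = 0
    while (rest != "") {
      sub(/^[ \t]+/, "", rest)              # drop leading blanks
      if (rest == "") break
      # Try a quoted field first, then fall back to a run of non-blanks.
      if (!match(rest, /^"[^"]*"/)) match(rest, /^[^ \t]+/)
      field = substr(rest, 1, RLENGTH)
      rest = substr(rest, RLENGTH + 1)
      if (++count == n) { print field; break }
    }
  }'
}

echo 'a "b1 b2" c' | qfield 2    # prints "b1 b2" (quotes included)
```

Because the quoted pattern is tried explicitly before the fallback, this sketch does not depend on how the regex engine resolves alternation.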

hek2mgl
  • Unfortunately a field containing a double quote (e.g. 'a "b1""b2" c', where $2="b1""b2") will break this, but it's a nice feature of gawk, which I never used... – Luuk May 31 '19 at 17:51
  • Yeah, it's nice. As long as the data follows at least _some_ standard, you can just change the pattern. If the data can be anything, it's impossible to parse it. – hek2mgl May 31 '19 at 18:10
  • Please fix your FPAT to include the last field `g1` in the following command `echo 'a "b1 b2" "c1 c2 c3" d "e1 e2" ftext e g1' |awk '{for(i=1;i – Dudi Boy May 31 '19 at 22:16
  • It should be `for(i=1;i<=NF;i++)` in your example. Note: `<=` instead of `<` – hek2mgl May 31 '19 at 22:42
1

awk doesn't have the simple, convenient support for quoted fields that I wanted. I also looked at cut, which doesn't have it either.

Another widely available command-line tool, csvcut, which is part of the csvkit bundle, does provide easy support for quoted fields. My data is space delimited, not comma delimited, but I can easily specify a space delimiter to csvcut.

This is what I wanted:

# Gives a
echo 'a "b1 b2" c d e' | csvcut -d ' ' -c 1
# Gives b1 b2
echo 'a "b1 b2" c d e' | csvcut -d ' ' -c 2
# Gives c
echo 'a "b1 b2" c d e' | csvcut -d ' ' -c 3
clay
0

Shortest answer:

echo 'a "b1 b2" c'| awk -F\" '{print $2}'

will output: b1 b2

Cristián Ormazábal
  • Perfect! That works! I saw similar questions get blocked as duplicates, but the other SO answers I reviewed were far more complicated. This is really simple and exactly what I wanted. Thanks! – clay May 31 '19 at 17:35
  • How would I adapt this to a comma scenario like `echo 'a,"b1,b2",c'| awk -F',' '{print $2}'`? – clay May 31 '19 at 17:38
  • @clay This code assumes the middle column is always quoted and the first column is never quoted and there are no escaped quotes. – melpomene May 31 '19 at 17:38
  • @melpomene, yikes. those are unacceptable assumptions. I'm giving a simplified example of real data that won't fit that. – clay May 31 '19 at 17:39
  • @clay then you've over-simplified your example. There are dozens of ways to get the output you posted from the input you posted, and almost all of them won't work for your real input. You need to provide more truly representative sample input/output if you want a robust but concise answer. – Ed Morton Jun 01 '19 at 03:50
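To see the assumptions described in the comments, it helps to print every field that awk produces when the double quote is the delimiter. A minimal sketch, for illustration only (show_fields is a hypothetical helper name):

```shell
# show_fields: print each field awk sees when the double quote is the delimiter.
# Hypothetical helper for illustration only.
show_fields() {
  awk -F'"' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

echo 'a "b1 b2" c' | show_fields
# $1=[a ]
# $2=[b1 b2]
# $3=[ c]

echo '"a1 a2" b c' | show_fields
# $1=[]
# $2=[a1 a2]
# $3=[ b c]
```

When the first column is quoted, $2 holds the first column's content rather than the second column, and unquoted columns are never split on spaces.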
0

You can get what you're looking for this way:

awk '{split($0,a,/^"|" "| "|" |"$/);j=a[1]!=""?0:1;print a[2+j]}'

I suspect you can find input where it fails ...
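One input where it does fail, as a hedged illustration: when two unquoted fields precede the quoted one, the unquoted fields are never separated, so the quoted field is reported as the second. Here split_field2 is a hypothetical wrapper name; the awk program inside it is the command above, unchanged.

```shell
# split_field2: the split()-based command from this answer, wrapped for reuse.
# Hypothetical name; the awk program is unchanged.
split_field2() {
  awk '{split($0,a,/^"|" "| "|" |"$/);j=a[1]!=""?0:1;print a[2+j]}'
}

echo 'a "b1 b2" c' | split_field2    # prints: b1 b2
echo 'a b "c1 c2" d' | split_field2  # prints: c1 c2, although field 2 is b
```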

ctac_