Hi, I am trying to split the following string into fields, to no avail:

echo '[xxAA][xxBxx][C]' | awk -F '/\[.*\]/' '{ for (i = 1; i <= NF; i++) printf "-->%s<--\n", $i }'

I basically want each field to be the contents of one pair of brackets, such that:

field 1 = xxAA, field 2 = xxBxx, field 3 = C

but I keep getting the following result:

-->[xxAA][xxBxx][C]<--

Any pointers on where I am going wrong?

75inchpianist

3 Answers

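You can use a regex as the field separator. We enclose [ and ] each in its own bracket expression (character class) so that they are treated as literals. The two are joined by |, which is alternation (logical OR). Because every bracket is a separator, the line starts with a separator and each ][ pair leaves an empty field between its two brackets, so the contents land in the even-numbered fields. We iterate over those to get the output.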

$ echo '[xxAA][xxBxx][C]' | awk -v FS="[]]|[[]" '{ for (i=2;i<=NF;i+=2) print $i }'
xxAA
xxBxx
C
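
To see why only the even-numbered fields carry data, it can help to print every field with its index. This is a small sketch that reuses the same FS as above; the variable name `fields` and the `N=<value>` output format are just illustrative choices.

```shell
# Print every field with its index, using the same FS as the answer.
# Each bracket is a separator, so empty fields appear before the first '[',
# between every ']' and '[', and after the final ']'.
fields=$(echo '[xxAA][xxBxx][C]' |
  awk -v FS="[]]|[[]" '{ for (i = 1; i <= NF; i++) printf "%d=<%s>\n", i, $i }')
printf '%s\n' "$fields"
```

This should list seven fields, with 1, 3, 5 and 7 empty and 2, 4 and 6 holding xxAA, xxBxx and C.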
jaypal singh
  • +1, I didn't know that awk accepts a pattern as the field separator, or that square brackets don't need to be escaped in a character class. I will add awk to this post: http://stackoverflow.com/questions/17845014/what-does-the-regex-mean – Casimir et Hippolyte Sep 23 '14 at 23:18
  • I'm surprised that you can't write something like: `]|\[` or `\]|\[`. However it seems that you can write: `[][]` – Casimir et Hippolyte Sep 23 '14 at 23:29
  • Thanks @CasimiretHippolyte for both the vote and the linked post. It was very informative. Yes, ideally one should write `[][]`. I just took the opportunity to suggest the logical `OR` operator in a field separator. – jaypal singh Sep 23 '14 at 23:32
  • @CasimiretHippolyte you can't write `\[` because you're specifying a string literal, so it gets parsed twice and you need to escape it twice, `\\[`, that's all. – Ed Morton Sep 23 '14 at 23:37
  • @EdMorton: on my system (Debian-like), it seems that you need to use three or four backslashes to obtain an escaped square bracket between double quotes and only two or three between single quotes. The last backslash is obviously ignored. This is exactly the same behaviour as with PHP pattern strings. – Casimir et Hippolyte Sep 24 '14 at 00:15
  • @CasimiretHippolyte - you need 2 between single quotes (one when awk reads the script and a second when it executes it) and 3 between double because in the latter you're adding yet ANOTHER layer of interpretation for the shell. 1 additional backslash will be ignored (with a warning in gawk) but more than that can impact interpretation. – Ed Morton Sep 24 '14 at 04:03
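
The two spellings discussed in these comments can be compared side by side. This is a sketch rather than part of the answer; the variable names `with_class` and `with_escapes` are made up for illustration, and the doubled backslashes follow the "2 between single quotes" rule described above.

```shell
# Separator written as one bracket expression holding both ']' and '['.
with_class=$(echo '[xxAA][xxBxx][C]' |
  awk -F '[][]' '{ for (i = 2; i <= NF; i += 2) print $i }')

# Same separator written with escaped brackets; inside single quotes each
# bracket needs two backslashes because awk parses the string twice.
with_escapes=$(echo '[xxAA][xxBxx][C]' |
  awk -F '\\]|\\[' '{ for (i = 2; i <= NF; i += 2) print $i }')

printf '%s\n' "$with_class"
printf '%s\n' "$with_escapes"
```

Both commands should print xxAA, xxBxx and C on separate lines.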

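Two things go wrong here. The argument to -F is already treated as a regex, so the slashes in '/\[.*\]/' are taken as literal / characters; the input contains none, the separator never matches, and the whole line stays in one field. Dropping the slashes would not help either: the regex \[.*\] matches the entire input, because the greedy .* matches the ][ inside the input as well as the letters.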
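
One way to check how much of the line \[.*\] consumes is to ask awk's match() function where the match starts and how long it is. A minimal sketch follows; the variable name `span` is only illustrative.

```shell
# match() sets RSTART and RLENGTH to the position and length of the
# leftmost-longest match; compare them with the length of the whole line.
span=$(echo '[xxAA][xxBxx][C]' |
  awk '{ match($0, /\[.*\]/); print RSTART, RLENGTH, length($0) }')
printf '%s\n' "$span"
```

This should print `1 16 16`: the match starts at the first character and covers all 16 characters, so nothing is left over to become a field.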

You could split fields on the ']' character instead, then put it back again in the output:

echo '[xxAA][xxBxx][C]' | awk -F ']' '{ for (i = 1; i <= NF; i++) if ($i != "") printf "-->%s]<--\n", $i }'
Jonathan Wakely

This is a job for GNU awk's FPAT variable, which lets you specify the pattern of the fields rather than the pattern of the field separators:

$ echo '[xxAA][xxBxx][C]' | awk -v FPAT='[^][]+' '{ for (i = 1; i <= NF; i++) printf "-->%s<--\n", $i }'
-->xxAA<--
-->xxBxx<--
-->C<--

With other awks I'd use:

$ echo '[xxAA][xxBxx][C]' | awk -F'\\]\\[' '{ gsub(/^\[|\]$/,""); for (i = 1; i <= NF; i++) printf "-->%s<--\n", $i }'
-->xxAA<--
-->xxBxx<--
-->C<--
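
If FPAT is not available, the same idea of describing the fields rather than the separators can be approximated with a match() loop, which is standard awk. This is a sketch and not part of the answer above; the variable names `rest` and `out` are hypothetical.

```shell
# Repeatedly find the next run of non-bracket characters, print it,
# then continue searching in the remainder of the line.
out=$(echo '[xxAA][xxBxx][C]' | awk '{
    rest = $0
    while (match(rest, /[^][]+/)) {
        printf "-->%s<--\n", substr(rest, RSTART, RLENGTH)
        rest = substr(rest, RSTART + RLENGTH)
    }
}')
printf '%s\n' "$out"
```

The loop stops once no non-bracket characters remain, so it should print the same three lines as the FPAT version.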
Ed Morton
  • I'll be honest. Even though I have GNU `awk` 4.1 and have been using it for a while, `FPAT` and `patsplit` are the two features I underuse the most. – jaypal singh Sep 23 '14 at 23:38
  • Yeah, I haven't found a use for patsplit() myself yet, but FPAT is useful, especially for CSV parsing. – Ed Morton Sep 23 '14 at 23:40
  • True, parsing quoted CSV using `FPAT` is helpful. I just go with the `Text::ParseWords` core module and `perl`, mainly because the work machines are still on GNU `awk` 3.x on RHEL 6. – jaypal singh Sep 23 '14 at 23:44