How to use a regex for the field separator in AWK?

Question

I read another answer that shows how to set the field separator using the -F flag:

awk -F 'INFORMATION DATA ' '{print $2}' t

Now I'm curious how I can use a regex for the field separator. My attempt can be seen below:

$ echo "1 2 foo\n2 3 bar\n42 2 baz"
1 2 foo
2 3 bar
42 2 baz
$ echo "1 2 foo\n2 3 bar\n42 2 baz" | awk -F '\d+ \d+ ' '{ print $2 }'
# 3 blank lines

I was expecting to get the following output:

foo
bar
baz

I expected this because my regex \d+ \d+ matches "the first 2 numbers separated by a space, followed by a space" (as shown on Rubular), so printing the second field should give the remaining word.

How do I use a regex as the awk field separator?

awk does not support the Perl-style `\d` metacharacter. Use the POSIX character class `[[:digit:]]` instead of `\d`. https://www.gnu.org/software/gawk/manual/html_node/GNU-Regexp-Operators.html – dawg, Mar 28 '17 at 23:37

vallentin · Accepted Answer · 2017-03-28T23:41:56.647

5

First of all, echo in bash doesn't interpret backslash escapes by default, so it outputs a literal \n; add -e to enable escapes. Second, awk doesn't support \d, so you have to use [0-9] or [[:digit:]].

echo -e "1 2 foo\n2 3 bar\n42 2 baz" | awk -F '[0-9]+ [0-9]+ ' '{ print $2 }'

or

echo -e "1 2 foo\n2 3 bar\n42 2 baz" | awk -F '[[:digit:]]+ [[:digit:]]+ ' '{ print $2 }'

Both output:

foo
bar
baz

edited Mar 28 '17 at 23:41

answered Mar 28 '17 at 23:34

vallentin


4 seconds difference! – George Vasiliou Mar 28 '17 at 23:36
Damn! Even ended up looking quite the same! :) – vallentin Mar 28 '17 at 23:39
FYI some versions of echo DO print a literal newline given `\n`. That's one of the reasons it's best to avoid using echo: its behavior is non-portable. Use `printf` instead. – Ed Morton Mar 29 '17 at 13:38
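
The printf suggestion in the comment above could look roughly like this. It is a minimal sketch, assuming a POSIX shell and a POSIX awk; the variable name `out` is only there for illustration and is not part of the original answers.

```shell
# printf interprets \n in its format string in every POSIX shell,
# so no -e flag is needed and the result does not depend on the shell.
out=$(printf '1 2 foo\n2 3 bar\n42 2 baz\n' |
  awk -F '[0-9]+ [0-9]+ ' '{ print $2 }')

# Prints foo, bar and baz, one per line.
printf '%s\n' "$out"
```

The separator is the same regex used in the accepted answer; only the way the test input is generated changes.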

score 3 · Answer 2 · answered Mar 28 '17 at 23:33

Just replace \d with [0-9].

The following loop prints every field with its index, so you can see immediately how each line was split:

$ echo -e "1 2 foo\n2 3 bar\n42 2 baz" |awk -v FS="[0-9]+ [0-9]+" '{for (k=1;k<=NF;k++) print k,$k}'
1 
2  foo
1 
2  bar
1 
2  baz

So just use [0-9] in your command (the separator here has no trailing space, so each result keeps a leading space):

$ echo -e "1 2 foo\n2 3 bar\n42 2 baz" |awk -v FS="[0-9]+ [0-9]+" '{print $2}'
 foo
 bar
 baz
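
The leading space in the output above comes from the separator regex ending right after the second number. A small sketch that makes this visible, assuming a POSIX shell and awk; `show_fields`, `without_space` and `with_space` are hypothetical names introduced only for this illustration:

```shell
# show_fields is a hypothetical helper: it prints every field of every
# input line inside brackets, so stray spaces become visible.
show_fields() {
  awk -v FS="$1" '{ for (k = 1; k <= NF; k++) printf "%d [%s]\n", k, $k }'
}

# Separator without a trailing space: the space stays in field 2.
without_space=$(printf '1 2 foo\n' | show_fields '[0-9]+ [0-9]+')

# Separator with a trailing space: the space is consumed by the separator.
with_space=$(printf '1 2 foo\n' | show_fields '[0-9]+ [0-9]+ ')

printf '%s\n' "$without_space"   # 1 [] then 2 [ foo]
printf '%s\n' "$with_space"      # 1 [] then 2 [foo]
```

Field 1 is empty in both cases because the separator matches at the very start of the line.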
