This seems like it should be dirt simple, but the awk gensub/gsub/sub behavior has always been unclear to me, and now I just can't get it to do what the documentation says it should do (and what experience with a zillion other similar tools suggests should work). Specifically, I want to access "captured groups" from a regex in the replacement string. Here's what I think the awk syntax should be:

awk '{ gsub(/a(b*)c/, "Here are bees: \1"); print; }'

That should turn "abbbc" into "Here are bees: bbb". It does not, at least not for me on Ubuntu 9.04. Instead, the "\1" is rendered as a ^A; that is, the character with code 1. Not what I want, of course. How do I do this?

Thanks.

Pointy

2 Answers

With GNU awk:

echo abbc | awk '{ print gensub(/a(b*)c/, "Here are bees: \\1", "g", $1);}'

See the gawk manual for the difference between gsub and gensub:

gensub() provides an additional feature that is not available in sub() or gsub(): the ability to specify components of a regexp in the replacement text. This is done by using parentheses in the regexp to mark the components and then specifying ‘\N’ in the replacement text, where N is a digit from 1 to 9.

Sridhar Sarnobat
  • Also, not only do gsub and gensub behave differently with respect to return value, but the whole \1 through \9 feature *only* works with gensub. – Pointy Oct 12 '09 at 16:05
  • Try `echo xxxabbcxxx` - the awk "solution" breaks – Aleksandr Levchuk Jun 23 '11 at 10:45
  • @Aleksandr, feel free to propose a new one –  Jun 27 '11 at 07:29
  • @AleksandrLevchuk Your example works exactly as expected. I see nothing wrong with this solution. It makes the substitution, then returns the full variable(s). – Sparhawk Oct 28 '16 at 04:17
  • It's hilarious to me to see someone state that a language is "broken" because they didn't understand the syntax or read the manual! – Medievalist Apr 04 '17 at 22:10
Per the gawk manual

gensub provides an additional feature that is not available in sub or gsub: the ability to specify components of a regexp in the replacement text. This is done by using parentheses in the regexp to mark the components and then specifying ‘\N’ in the replacement text, where N is a digit from 1 to 9.

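You must use gensub, you must pass its third argument ("g" to replace every match, or a number to replace only that occurrence), and you must capture the return value of gensub, since it does not modify its target in place.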

awk '{ r = gensub(/a(b*)c/, "Here are bees: \\1", "g"); print r; }'
Jonathan Feinberg