
First, apologies for the potentially duplicate question. I'm new to bash scripting and I can't even figure out which keywords to search for. With that said, I tried to simplify the problem description as much as I can:

I have a text file (test.txt) that contains only this line:

REPLACE

I ran the following command, which is supposed to replace the file's text (i.e. REPLACE) with the value of the code variable, if (A & B).

code="if (A & B)" ; awk -v var="${code}" '{ gsub(/REPLACE/, var); print }' test.txt

Expected output: I expect the value of the code variable to be printed as is:

if (A & B)

Actual output: somehow the ampersand is expanded into 'REPLACE', which is the gsub regexp parameter:

if (A REPLACE B)

Perhaps I need to escape the ampersand, but unfortunately the code variable is populated outside my control, so I can't change its value manually.

FYI, the awk version is "GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5, GNU MP 6.1.2)"

Thanks!

Remon
    `&` means "substitute full match" in awk regexes, including gsub. You need to escape it: `code="if (A \\& B)"` (yes, with two backslashes, because the shell does its expansion too) – grochmal Apr 02 '17 at 18:35
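To see where the extra 'REPLACE' in the output comes from, here is a minimal demonstration (assuming any POSIX awk behaves the same here): in a gsub() replacement string, '&' stands for the text the regexp matched, so wrapping it in brackets makes the substitution visible.

```shell
# In a sub()/gsub() replacement, '&' is replaced by the matched text,
# so "[&]" wraps each match of /REPLACE/ in square brackets.
printf 'REPLACE\n' | awk '{ gsub(/REPLACE/, "[&]"); print }'
# prints: [REPLACE]
```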

2 Answers


& is a backreference metacharacter in many tools, and it means "the string that matched the regexp you searched for". If you're trying to use literal strings, then use literal strings instead of regexps and backreferences.

e.g.:

code="if (A & B)"
awk -v old="REPLACE" -v new="$code" 's=index($0,old){$0=substr($0,1,s-1) new substr($0,s+length(old))} 1' test.txt
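The one-liner above replaces only the first occurrence of old on each line. If you need the replace-every-occurrence behavior of gsub() with literal strings, a minimal sketch is to loop over index(). The function name literal_gsub is hypothetical, and passing the strings through ENVIRON instead of -v is an extra assumption of this sketch: it keeps awk from interpreting backslash escapes in them.

```shell
# Hypothetical helper: replace EVERY occurrence of a literal string.
# Usage: literal_gsub OLD NEW [FILE...]   (reads stdin if no FILE given)
literal_gsub() {
    old=$1 new=$2
    shift 2
    # Pass both strings via the environment: unlike -v values,
    # ENVIRON values are not subject to awk's escape processing.
    old="$old" new="$new" awk '
    BEGIN { old = ENVIRON["old"]; new = ENVIRON["new"] }
    {
        rest = $0; out = ""
        # index() searches for a plain string, so no character in
        # old or new has any special meaning (no regexp, no &).
        while (old != "" && (s = index(rest, old)) > 0) {
            out = out substr(rest, 1, s - 1) new
            rest = substr(rest, s + length(old))
        }
        print out rest
    }' "$@"
}
```

With the question's test.txt and code="if (A & B)", running literal_gsub REPLACE "$code" test.txt should print if (A & B).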

The alternative, trying to sanitize regexps and replacements, is complicated and error-prone and generally not for the faint of heart; see: Is it possible to escape regex metacharacters reliably with sed

Ed Morton
    Hi Ed, this code helped me a lot. I was having trouble implementing a template engine in awk using gsub, because of the ampersand. I wish there was a way to disable it. But your solution worked like a dream. Thanks! – valrog Sep 26 '17 at 16:48
    You're welcome. Yeah I've often wished for a literal string equivalent for [g]sub() BUT a major point of the awk language is to not provide briefer constructs for operations that can be done easily in other ways since that way lies the demons of code bloat and there's already a different tool with that particular problem (see http://www.zoitz.com/archives/13). In the end I'd much rather have a small, simple language than a large complicated one even if it does cost me a few extra characters here and there. – Ed Morton Sep 26 '17 at 16:52

You can just double-escape the '&' character, so your code would be:

code="if (A \\\& B)" ; awk -v var="${code}" '{ gsub(/REPLACE/, var); print }' test.txt

Output:
# code="if (A \\\& B)" ; awk -v var="${code}" '{ gsub(/REPLACE/, var); print }' test.txt
if (A & B)
#

Note that in the above example you need to escape both the '\' and the '&' characters, which is why it's '\\\&'.
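The two layers of escaping can be made visible by printing the value after each one. This sketch relies on standard behavior: inside double quotes the shell keeps a backslash that precedes '&', and awk processes escape sequences in -v assignments as it would in a string constant.

```shell
code="if (A \\\& B)"

# 1. In double quotes the shell turns \\ into \ and leaves \& alone
printf '%s\n' "$code"
# prints: if (A \\& B)

# 2. awk processes escape sequences in -v values, so \\ becomes \
awk -v var="$code" 'BEGIN { print var }'
# prints: if (A \& B)

# 3. gsub() then treats \& in the replacement as a literal &
printf 'REPLACE\n' | awk -v var="$code" '{ gsub(/REPLACE/, var); print }'
# prints: if (A & B)
```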

If you don't want to manipulate your input strings manually as in the above example, you can use an additional 'gsub' in your awk code to preprocess the input string, adding the escape characters before running your main 'gsub', as follows:

code="if (A & B)" ; awk -v var="${code}" '{ gsub("&","\\\\&", var); gsub(/REPLACE/, var); print }' test.txt

Output:
# code="if (A & B)" ; awk -v var="${code}" '{ gsub("&","\\\\&", var); gsub(/REPLACE/, var); print }' test.txt
if (A & B)
#

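Note the need for 4 '\' characters in the preprocessing gsub: the awk string "\\\\&" reaches gsub as \\&, which produces a literal '\' followed by the matched '&'.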
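A quick way to check what the preprocessing step produces is to print var right after it, before the main gsub runs. Like the original command, this only escapes '&'; any backslashes already in $code are still subject to -v escape processing and to gsub's own backslash rules.

```shell
code="if (A & B)"
# After this gsub, every & in var is preceded by a backslash
awk -v var="$code" 'BEGIN { gsub("&", "\\\\&", var); print var }'
# prints: if (A \& B)
```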