I need a function that escapes the regex metacharacters in a string, for use in an awk script.

I came across this 'ugly' solution:

function escape_string( str )
{
    gsub( /\\/, "\\\\",  str );
    gsub( /\./, "\\.", str );
    gsub( /\^/, "\\^", str );
    gsub( /\$/, "\\$", str );
    gsub( /\*/, "\\*", str );
    gsub( /\+/, "\\+", str );
    gsub( /\?/, "\\?", str );
    gsub( /\(/, "\\(", str );
    gsub( /\)/, "\\)", str );
    gsub( /\[/, "\\[", str );
    gsub( /\]/, "\\]", str );
    gsub( /\{/, "\\{", str );
    gsub( /\}/, "\\}", str );
    gsub( /\|/, "\\|", str );

    return str;
}

Any better ideas?

Lacobus
  • Yes but why? When people try to escape regexp metacharacters it's almost always because they **really** want to do something with strings instead of regexps but don't know how to do string operations so they misguidedly try to escape all the RE metacharacters so they can use them as strings in regexp operations (e.g. `match($0,regexp)` ) instead of using them as-is in string operations (e.g. `index($0,string)`). – Ed Morton May 04 '16 at 22:58
  • @EdMorton yes, **almost** always, not always. The purpose here is to process text files containing two columns like this one: http://pastebin.com/U9Sjq53W - So, I wrote the following `awk` script: http://pastebin.com/AwHmHS74 to process such files. I'm searching for the string `recording made when T.M.A-1 greeted` - http://pastebin.com/sMDQxfcE - in this case, simple string operations cannot solve the problem. – Lacobus May 09 '16 at 15:45
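
For comparison, the string-operation approach mentioned in the first comment could be sketched roughly as follows. The wrapper name `grep_fixed` and the `NEEDLE` environment variable are made up for this illustration. It only covers plain substring search with `index()`, so it does not settle whether that is enough for the two-column files described above.

```sh
# grep_fixed NEEDLE: print stdin lines that contain NEEDLE as a plain substring.
# (Hypothetical helper; NEEDLE is passed through the environment, not as a regex.)
grep_fixed() {
    # index() returns the 1-based position of the substring, or 0 if absent,
    # so a non-zero result selects the line for printing.
    NEEDLE="$1" awk 'index($0, ENVIRON["NEEDLE"])'
}

printf 'made when T.M.A-1 greeted\nmade when TxMxA-1 greeted\n' | grep_fixed 'T.M.A-1'
# prints: made when T.M.A-1 greeted
```

No escaping is needed here because the search string is never interpreted as a regular expression.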

1 Answer

You can do this with a single gsub and a bracket expression (character class):

function escape_string( str ) {
   gsub(/[\\.^$(){}\[\]|*+?]/, "\\\\&", str)
   return str
}

In the replacement, & stands for the matched text, and \\\\ puts a literal backslash in front of it.
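
As a rough sketch of how the escaped string might then be used, here is a small shell wrapper around awk that prints the lines containing a search string literally, by escaping it and using the result as a dynamic regex. The wrapper name `grep_literal` and the `NEEDLE` environment variable are made up for this illustration. Passing the string through `ENVIRON` instead of `-v` is my own choice, since `-v` would process backslash escapes in the value.

```sh
# grep_literal NEEDLE: print stdin lines that contain NEEDLE as literal text.
# (Hypothetical helper name; NEEDLE is passed via the environment.)
grep_literal() {
    NEEDLE="$1" awk '
    # Prefix every regex metacharacter with a backslash.
    function escape_string(str) {
        gsub(/[\\.^$(){}\[\]|*+?]/, "\\\\&", str)
        return str
    }
    # Escape the search string once, before any input is read.
    BEGIN { re = escape_string(ENVIRON["NEEDLE"]) }
    # Dynamic regex match: an escaped dot now matches only a dot.
    $0 ~ re
    '
}

printf 'made when T.M.A-1 greeted\nmade when TxMxA-1 greeted\n' | grep_literal 'T.M.A-1'
# prints: made when T.M.A-1 greeted
```

Without the escaping step, the unescaped dots would match any character and both lines would be printed.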

anubhava
  • I think you can avoid escaping `[` inside the character class, and if you list `]` first, it doesn't need escaping either: `gsub(/[][\\.^$(){}|*+?]/, …`. Whether that's actually clearer is a separate discussion. – Jonathan Leffler May 04 '16 at 23:32
  • Yes I am aware that by placing them at 1st and 2nd positions we can avoid escaping. I just avoided it because it is confusing to some as it appears that 2 different character classes are being used :) – anubhava May 05 '16 at 04:41