
The typical use case is a regex that needs to include user input. Characters with special meaning in a regex (i.e. "the dirty dozen" in Perl) need to be escaped. Perl provides "quotemeta" functionality for this: simply wrap the interpolated variables in \Q and \E. Tcl provides no such functionality (according to this page, not even with AREs).
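To make the problem concrete, here is a minimal sketch; the variable names and sample strings are illustrative only.

```tcl
# Hypothetical user input that is meant to be matched literally.
set userInput {1+1}
set text {1+1=2}

# Interpolated as-is, "+" acts as a quantifier, so the pattern means
# "one or more 1s followed by a 1" and the literal text does not match.
set unescaped [regexp -- $userInput $text]
puts $unescaped    ;# prints 0

# Escaping the metacharacter by hand gives the intended literal match.
set escaped [regexp -- {1\+1} $text]
puts $escaped      ;# prints 1
```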

Is there a good (rigorous) implementation of quotemeta in Tcl out there?

Borodin
Andrew Cheong

2 Answers


Perl's quotemeta function simply prefixes every non-word character (i.e., any character other than the 26 lowercase letters, the 26 uppercase letters, the 10 digits, and underscore) with a backslash. This is overkill, since not all non-word characters are regexp metacharacters, but it's simple and safe, since escaping a non-word character that doesn't need escaping is harmless.

I believe this implementation is correct:

proc quotemeta {str} {
    regsub -all -- {[^a-zA-Z0-9_]} $str {\\&} str
    return $str
}

But thanks to glenn's comment, this one is better, at least for modern versions of Tcl (\W matches any non-word character, but Tcl 8.0.5 does not support it; the RE engine changed in 8.1):

proc quotemeta {str} {
    regsub -all -- {\W} $str {\\&} str
    return $str
}

(I'm assuming that Tcl's regular expressions are similar enough to Perl's so that this will do the same job in Tcl that it does in Perl.)
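As a usage sketch, the escaped string can then be interpolated into a larger pattern. The sample input and text below are made up for illustration, and the `\W` version assumes Tcl 8.1 or later.

```tcl
# Same definition as the \W version above (needs Tcl 8.1 or later).
proc quotemeta {str} {
    regsub -all -- {\W} $str {\\&} str
    return $str
}

# Hypothetical user input containing several metacharacters.
set userInput {$6.00 (each)}
set escaped [quotemeta $userInput]
puts $escaped    ;# prints \$6\.00\ \(each\)

# The escaped input is now safe to embed in a larger pattern.
set text {Price: $6.00 (each), plus tax}
set matched [regexp -- "^Price: $escaped" $text]
puts $matched    ;# prints 1
```

Note that the space is escaped as well, which is harmless here because a backslash followed by a non-alphanumeric character matches that character literally.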

Keith Thompson
  • Like Perl, Tcl has `\W` to match a non-word char -- http://www.tcl.tk/man/tcl8.5/TclCmd/re_syntax.htm#M65 – glenn jackman Jul 12 '12 at 03:52
  • I didn't know Perl implemented it so simply! I guess I thought it possible that some escaped special characters I didn't know about might have special meaning. Thanks. – Andrew Cheong Jul 12 '12 at 14:33
  • @acheong87: Future versions of Perl could easily add more metacharacters. Keeping `quotemeta` simple means it doesn't have to change to track changes in the regexp syntax. – Keith Thompson Jul 12 '12 at 14:35
  • Agreed, thanks. And do keep the first version up, because Tcl 8.0.5 doesn't support special character classes like `\W`. Also, I noticed that whitespace characters will be escaped, but I think that is fine and safer anyway. – Andrew Cheong Jul 12 '12 at 14:40
  • @acheong87: Thanks for the info about Tcl 8.0.5; I've added it to my answer. – Keith Thompson Jul 12 '12 at 14:46
  • While there are a lot of differences in the detail, backslashing all non-alnum characters (i.e., `\W`) is entirely adequate for the same reason it's adequate in Perl. The RE engine was indeed changed in 8.1; you're not really recommended to use 8.0 any more though, as 8.4 is entirely faster (and packed with many more goodies too). – Donal Fellows Jul 12 '12 at 14:50
  • @DonalFellows: Alas, it's not always practical to use the latest-and-greatest version. I know that's true for Perl; I'm sure it's equally true for Tcl. (If you *can* count on having a reasonably modern version on *all* the systems where your code needs to run, that's great.) – Keith Thompson Jul 12 '12 at 19:06

I'll propose a solution, but I'm not confident it's correct.

#
#   notes
#
#   -  "]" has to come first in a character class to be literal, hence "[]["
#   -  "-" has to come last in a character class
#   -  "#" is not special, but anticipating the x modifier...
#   -  "-" is not special, but anticipating interpolation within "[]"...
#   -  "/" is not special in Tcl
#   -  "\" is doubled because Tcl 8.1+ treats it as an escape inside "[]"
#
proc quotemeta {str} {
    regsub -all -- {[][#$^*()+{}\\|.?-]} $str {\\\0} str
    return $str
}
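One way to gain some confidence in a metacharacter list like this is a round-trip check: escape each character of a sampler on its own and confirm that the result matches that character literally. The sketch below assumes the doubled-backslash form of the character class suggested in the comments; the sampler string is illustrative and not exhaustive.

```tcl
# Bracket-list version, with the backslash doubled inside the class.
proc quotemeta {str} {
    regsub -all -- {[][#$^*()+{}\\|.?-]} $str {\\\0} str
    return $str
}

# Punctuation sampler (illustrative, not exhaustive).
set sample {$6.00 - Magic Hat #9 \ [Draught Special] {a|b}* ^x? (y)+}

# Each character, once escaped, should match itself and nothing else.
set failures {}
foreach ch [split $sample {}] {
    if {![regexp -- "^[quotemeta $ch]\$" $ch]} {
        lappend failures $ch
    }
}
puts "failures: [llength $failures]"
```

An empty failure list only shows that the sampled characters round-trip; it does not prove the list of metacharacters is complete.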
Andrew Cheong
  • You have to swap the `[]` to `][` to make it compile (instead of throwing an exception) and the backslash needs to be doubled to make it “really” there. `{[][#$^*()+{}\\|.?-]}` works when I run it against a punctuation sampler. – Donal Fellows Jul 11 '12 at 23:43
  • Thanks, @DonalFellows! Edited. But for me I don't seem to need the double backslash. tclsh (8.0.5, I'm that guy): ``% quotemeta {$6.00 - Magic Hat #9 \ [Draught Special]}`` outputs ``\$6\.00 \- Magic Hat \#9 \\ \[Draught Special\]`` – Andrew Cheong Jul 12 '12 at 14:30