
I want to convert a glob-style pattern into a Raku regex. This is how I'm doing it now:

s :global {
    || $<question-mark> = '?'
    || $<asterisk>      = '*'
    || $<non-word>      = \W
} = $<question-mark> ?? '.' !! $<asterisk> ?? '.*' !! "\\$<non-word>";

Is it correct to prefix every non-word character with a backslash in this way? That is, will this miss escaping anything that should be, or escape anything that shouldn't be?

I'm a little baffled that Raku did away with Perl 5's quotemeta function, which would be ideal here. It wouldn't be needed nearly as often, as noted in the answers to this question, but in a situation like this I'm left to hand-roll a solution that I'm not sure is adequate.
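For comparison with Perl 5, a quotemeta-style helper can be sketched from the same "backslash every non-word character" idea used in the substitution above. The name quotemeta here is hypothetical, since Raku core has no such routine. Whether this rule covers every edge case is exactly what is being asked, so this sketch only demonstrates it on plain ASCII punctuation:

```raku
# Hypothetical quotemeta-like helper (not a Raku core routine):
# prefix every non-word character with a backslash, as in the question.
sub quotemeta(Str $s --> Str) {
    $s.subst(/\W/, { "\\$/" }, :g)
}

my $pattern = quotemeta('a.b*c');
say $pattern;                          # a\.b\*c

# <$pattern> compiles the escaped string as regex source.
say so 'a.b*c' ~~ / ^ <$pattern> $ /;  # True
say so 'axbbc' ~~ / ^ <$pattern> $ /;  # False
```

Note that the escaped string has to be spliced in with <$pattern> (regex interpolation); a bare $pattern would instead match the backslashes literally.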

Sean

2 Answers


Raku regexes can contain quoted string literals:

say "food" ~~ /. "oo" /; # 「foo」

One can take a Str and turn it into a Raku source representation by calling .raku:

say "oh\n\"".raku; # "oh\n\""

This handles escaping of the string construct as required, meaning it is then safe to emit into the regex.
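Applied to the glob problem, a minimal sketch might split the glob into wildcard and non-wildcard runs, map ? to . and * to .*, and emit every other run via .raku. The helper name glob-to-regex-source is hypothetical, and the sketch assumes the generated source is spliced into a regex with <$source>:

```raku
# Hypothetical helper: convert a glob pattern into Raku regex source.
sub glob-to-regex-source(Str $glob --> Str) {
    # Split into single wildcards and runs of non-wildcard characters.
    $glob.comb(/ <[\?\*]> | <-[\?\*]>+ /).map({
        when '?' { '.' }      # any single character
        when '*' { '.*' }     # any run of characters
        default  { .raku }    # quoted, escaped string literal
    }).join(' ')              # whitespace is insignificant in regex source
}

my $source = glob-to-regex-source('foo^*&bar');
say so 'foo^ anything &bar' ~~ / ^ <$source> $ /;  # True
say so 'foo^ anything &baz' ~~ / ^ <$source> $ /;  # False
```

Because every literal run becomes a double-quoted string, characters such as [, ., $ or { in the glob should match themselves rather than acting as regex syntax.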

As an aside, while it's still experimental, the upcoming RakuAST will allow for constructing regexes by building up an AST, which will provide another safe and more general solution.

Jonathan Worthington

With my version of Rakudo (v2022.07), the following escape-wrapping works:

  1. take the literal and wrap in q[],
  2. take the q[…] above and wrap in <{}>.

Tested as a one-liner at either the zsh or bash command line:

~$ zsh
~% raku -e 'say "food" ~~ / . <{ q[oo] }> /;'
「foo」

~% bash
~$ raku -e 'say "food" ~~ / . <{ q[oo] }> /;'
「foo」

Variations of Raku's "Q-language" can be tried; I've had success with square brackets as above. See: https://docs.raku.org/language/quoting.html . Note: make sure you include the < > angle brackets. Otherwise the literal wrapped in { } curlies is simply executed as a code block, its value is discarded, and the regex matches only the empty string:

~$ zsh
~% raku -e 'say "food" ~~ / { q[food] } /;'
「」
~% raku -e 'say "nothing" ~~ / { q[nothing] } /;'
「」

~% bash
~$ raku -e 'say "food" ~~ / { q[food] } /;'
「」
~$ raku -e 'say "nothing" ~~ / { q[nothing] } /;'
「」
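One caveat on the angle-bracket form: as I read Raku's documented regex interpolation rules, <{ ... }> (like <$var>) compiles the returned string as regex source, so metacharacters inside q[...] are interpreted rather than escaped, whereas a plain $var interpolates the string literally. A small sketch, where the value of $text is just an illustrative assumption:

```raku
# Illustrative input containing the regex metacharacter '.'.
my $text = 'f.od';

# <$var> and <{ ... }> treat the string AS A REGEX: '.' matches any character.
say so 'fxod' ~~ / <$text> /;          # True
say so 'fxod' ~~ / <{ q[f.od] }> /;    # True

# $var interpolates the string LITERALLY: '.' must be an actual dot.
say so 'fxod' ~~ / $text /;            # False
say so 'f.od' ~~ / $text /;            # True
```

For arbitrary text read from a file, the literal form (a scalar variable) is the one that avoids escaping by hand.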

The q[…] wrapping above might be most useful for cross-platform regexes, rather than swapping Linux/Unix "external-single-and-internal-double-quotes" for Windows "external-double-and-internal-single-quotes" and vice versa. You can even try qb[…] to get backslash-escape recognition (e.g. useful for the problematic \n newline):

~$ zsh
~% raku -e 'say "food\ntruck" ~~ / . <{qb[ ood \\n tru ]}> .. /;'
「food
truck」

~% bash
~$ raku -e 'say "food\ntruck" ~~ / . <{qb[ ood \\n tru ]}> .. /;'
「food
truck」

Credit to @fecundf for starting many of us on the topic of understanding/codifying interpolation within a regex matcher (feel free to peruse the thread below).

https://www.nntp.perl.org/group/perl.perl6.users/2019/09/msg6960.html

jubilatious1
  • I want to escape arbitrary text that is not known in advance, so it would seem that text containing a closing bracket could not be wrapped in `q[...]` since the bracket would prematurely close the quoting construct. If I'm misunderstanding, please provide a variation on my substitution that handles arbitrary text. – Sean Jun 14 '23 at 20:22
  • @Sean, In a broad sense, single- and double-quoting are all part of the same Q-language in Raku. In a previous comment I directed you to a similar question which had an answer [here](https://stackoverflow.com/a/72453516/7270649). That answer didn't seem acceptable, so I posted this one. In point of fact, if you're essentially `TR/?/./` translating the `?` characters in your code, you _should be able to_ `q?…?` everything afterwards. I could work on that and maybe incorporate it above. Regardless, I hope you find my answer (and this comment) useful. Best Regards. – jubilatious1 Jun 15 '23 at 19:54
  • I appreciate the effort you've put in, but I think you've misunderstood the point of my question. You started off with "take the literal," but I don't want to escape a literal; I want to take an arbitrary string I read from a file, then turn `?` into `.`, `*` into `.*`, and escape everything else so I can use the transformed string as a regex. Jonathan offered a solution that, e.g., turns the string `foo^*&bar`, not provided as a literal, into the string `"foo^".*"&bar"`, which does the trick. I can't see how to leverage your response towards the same goal. – Sean Jun 15 '23 at 20:58
  • @Sean again, it seems to me that you accepted an answer here that is [exactly](https://stackoverflow.com/a/72453516/7270649) the same as an answer I posted over a year ago. Regards. – jubilatious1 Jun 15 '23 at 21:45