
Some regex engines support backtracking-related verbs: (*PRUNE), (*SKIP), (?{doSomeCode();})*, etc. I already know what these verbs do from the question "Reference - What does this regex mean?".

I'm inclined to think that these verbs are somewhat esoteric, or at least an unnecessary step toward lower-level programming. Instead of requiring a (*PRUNE), wouldn't it be better, and less complex for both the person writing the regex and the person reading it, to have more optimizations happen behind the scenes (in the compiler or the engine)?

So, in what situation is it useful to include a backtracking-related verb in a regex, in practice? Is there ever a benefit to doing so?


* While not technically a backtracking control verb, many examples that execute arbitrary code in regexes do it in a way that affects or is affected by backtracking.


Background

These features were originally experimental, though they are no longer labeled as such in the Perl regex tutorial. It's no wonder that I am unable to find much about these constructs on the Internet (especially when searches are clogged with irrelevant results that use "skip" or "prune" as ordinary words rather than as regex syntax). I'll bet there are many people advanced enough in regex to use these verbs who simply don't know they exist.

So there are a number of practical barriers preventing widespread use:

  1. Features are (or were until recently) experimental
  2. Features are obscure
  3. Features are advanced

I'm looking for an answer that looks past these barriers and finds a good use case, or that explains the reasoning of the developers who created these features.

I'm also aware that a similar question exists and was closed as too opinion-based, but it didn't answer my question: the only answer to say "yes" gave two links. One was an esoteric use (which, additionally, I cannot understand). The other, while it described a situation in which to use (*FAIL), did not address any of the other constructs I mentioned, nor did it use (*FAIL) as a backtracking mechanism. From what I understand, (*FAIL) can be emulated by any subpattern that always fails.
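That last point is easy to check. perlre describes (*FAIL) as equivalent to (?!), just easier to read. Python's standard `re` module has no (*FAIL), but it does accept the (?!) form, so this minimal sketch uses it to show both halves of the claim: the subpattern never matches, and inside a branch it forces the engine to backtrack. `ALWAYS_FAIL` is just an illustrative name.

```python
import re

# (?!) is an empty negative lookahead. The empty pattern matches at every
# position, so the lookahead can never succeed: the pattern always fails.
ALWAYS_FAIL = r"(?!)"

# On its own it never matches anything.
print(re.search(ALWAYS_FAIL, "any text at all"))  # None

# Placed inside a branch, it forces the engine to backtrack: the first
# branch consumes "a", reaches (?!) and fails, so the engine retries the
# same start position with the second branch.
m = re.match("a" + ALWAYS_FAIL + "|ab", "ab")
print(m.group())  # ab
```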

To restate what I am looking for in an answer:

  • Relates specifically to backtracking
  • Non-esoteric
  • Practical
  • More than an example of usage
  • Has an explanation for any examples given
  • May include background on the reasons the features were added
  • May include updates, with sources, relevant to the future of the features (in Perl or other regex flavors)
Laurel
  • Possible duplicate of [Have you used the Perl 5.10 backtracking control verbs in your regexes yet?](http://stackoverflow.com/questions/253760/have-you-used-the-perl-5-10-backtracking-control-verbs-in-your-regexes-yet) – miken32 Mar 23 '16 at 20:59
  • I fail to see how this question ("So, in what situation is it useful to include a backtracking-related verb in a regex, in practice?") is different from the linked, and rightfully closed one ("Have you used the Perl 5.10 backtracking control verbs in your regexes yet? And what problems did they help you [solve]?") – miken32 Mar 23 '16 at 21:01
  • @miken32 I updated my question with a more detailed explanation of why it doesn't answer the question. Is it clear now how what I'm asking is different? I'm not asking _just_ for examples. – Laurel Mar 23 '16 at 21:57
  • Rexegg has a [tutorial worth reading](http://www.rexegg.com/backtracking-control-verbs.html). The most used practical verbs are probably [`(*SKIP)(*FAIL)` or `(*SKIP)(*F)` together with this trick](http://www.rexegg.com/regex-best-trick.html#pcrevariation). As an example, let's say you want to *match `is` if it is not inside parentheses*. What's inside can be [skipped by use of these verbs (demo)](https://regex101.com/r/dL1dJ9/1). – bobble bubble Mar 24 '16 at 12:18
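Python's standard `re` module does not support (*SKIP) or (*F), so this sketch uses the capture-group form of the same trick that the rexegg article builds on: the parenthesized text is matched by a first branch and thrown away, and only matches where the capture group in the second branch took part are kept. The verb pattern quoted in the comment is an assumption about roughly what the linked demo does, not a copy of it, and `find_is_outside_parens` is a hypothetical helper name.

```python
import re

# With verbs (PCRE/Perl), the trick would look roughly like:
#   \([^)]*\)(*SKIP)(*F)|\bis\b
# Python's re has no (*SKIP)/(*F), so instead the first branch matches
# the parenthesized text and leaves group 1 unset; only matches where
# group 1 participated are kept.
PATTERN = re.compile(r"\([^)]*\)|\b(is)\b")


def find_is_outside_parens(text):
    """Return the start offset of each whole-word 'is' not inside (...)."""
    return [m.start(1) for m in PATTERN.finditer(text) if m.group(1)]


print(find_is_outside_parens("This is it (this is skipped) and is kept"))  # [5, 33]
```

In both forms, the engine resumes scanning after the skipped parenthesized text. The verb form has the advantage that the overall match is the wanted text itself, with no group bookkeeping afterwards; the group form works in engines without verbs.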

2 Answers


One good piece of documentation you can look at is the section about directives in Parse::RecDescent. The <commit> directive, in particular, seems related to (*PRUNE) (although there is also a (*COMMIT) verb), and its documentation contains an instructive example.

My personal impression is that most of the time they give you tools to make your regexes better (e.g. faster, or clearer) but not necessarily more capable. For example, you can probably live without (*PRUNE), but you would suffer from heavier backtracking, and how much that affects you depends on what you're trying to match. As for (*FAIL), it could probably be emulated with a non-matching sub-regex, but (*FAIL) makes the intent so much clearer that it at least improves readability.
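To make the backtracking cost concrete: perlre illustrates (*PRUNE) with an example along the lines of `'aaab' =~ /a+b?(?{print "$&\n"; $count++})(*FAIL)/`, where the embedded code prints every candidate the engine tries before (*FAIL) forces a backtrack (9 of them), versus the same pattern with (*PRUNE) placed before the code block, which prints only the longest candidate at each start position (3). Python's `re` has no (*PRUNE), (*FAIL) or embedded code, so the sketch below is a hand-written model of that search order for this one pattern, not a regex engine; `enumerate_matches` is a hypothetical name.

```python
def enumerate_matches(subject, prune=False):
    """Hand-written model (not a regex engine) of how Perl explores
    /a+b?(?{ ... })(*FAIL)/, optionally with (*PRUNE) after b?.

    Each string appended to `seen` is one point where the (?{ ... })
    callback would run before (*FAIL) forces a backtrack.
    """
    seen = []
    for start in range(len(subject)):
        # a+ is greedy: find the longest run of 'a' and try it first.
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        pruned = False
        for a_len in range(run, 0, -1):  # give back one 'a' per backtrack
            end = start + a_len
            # b? is greedy too: try consuming a 'b', then the empty option.
            stops = [end + 1] if subject[end:end + 1] == "b" else []
            stops.append(end)
            for stop in stops:
                seen.append(subject[start:stop])  # the callback runs here
                # (*FAIL) now fails. Without (*PRUNE) we backtrack into b?
                # and a+; with (*PRUNE) this start position is abandoned.
                if prune:
                    pruned = True
                    break
            if pruned:
                break
    return seen


print(enumerate_matches("aaab"))
# ['aaab', 'aaa', 'aa', 'a', 'aab', 'aa', 'a', 'ab', 'a']
print(enumerate_matches("aaab", prune=True))
# ['aaab', 'aab', 'ab']
```

Without (*PRUNE) the work grows with every way a+ and b? can give characters back; with it, each start position costs a single attempt once the verb is reached.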

polettix
  • Does the linked material use Perl 6 regexes? I'm not very familiar with Perl 6 regexes, although they seem more similar to BNF. (I've learned a lot since I originally asked this question, but I still have no clue what `bless`, for example, does in Perl.) – Laurel Apr 16 '16 at 02:39
  • No, the link refers to a Perl 5 module that works back up to 5.6.2 http://matrix.cpantesters.org/?dist=Parse-RecDescent+1.967013 - the link was intended just to give you one example of where pruning of the search space might make sense anyway. – polettix Apr 16 '16 at 06:15

In short: You'll know if you need to use them, and if you're not sure, don't.

As others have alluded to, these tend to be performance-oriented optimization tools suited to grammar-style parsing. Premature optimization is the root of all evil, and on top of that, these features were marked as experimental until only recently. One might therefore reasonably deduce that they are not necessary for most use cases, and that it's best not to go borrowing trouble (or complexity) unless necessary.

belg4mit