
I am trying to capture groups in a text that should match only when the match is not followed by a specific character, in this case the opening parenthesis "(", which indicates the start of a function/method call rather than a property.

This seems pretty straightforward so I tried:

TEXT

$this->willMatch but $this->willNot()

RESULT

RegExp pattern: \$this->[a-zA-Z0-9\_]+(?<!\()
Expected: $this->willMatch
Actual: $this->willMatch, $this->willNot

RegExp pattern: \$this->[a-zA-Z0-9\_]+[^\(]
Expected: $this->willMatch
Actual: $this->willMatch, $this->willNot

RegExp pattern: \$this->[a-zA-Z0-9]+(?!\()
Expected: $this->willMatch
Actual: $this->willMatch, $this->willNo

My intuition says I need to add ^ and $, but that won't work for multiple occurrences in a text.

Curious to meet the RegExp wizard that can solve this!

Makyen
Gijs
  • This question is [currently under discussion on Meta](https://meta.stackoverflow.com/q/405720/3773011). – Makyen Mar 03 '21 at 00:05

2 Answers


The answer from The fourth bird works and is well explained.

As an alternative to the word boundary, you can use a possessive quantifier, i.e. ++, which stops the engine from backtracking into the matched name and so also avoids some wasted work.

\$this->\w++(?!\()


Please note the use of \w instead of the equivalent [a-zA-Z0-9_] here.

Like a greedy quantifier, a possessive quantifier repeats the token as many times as possible. Unlike a greedy quantifier, it does not give up matches as the engine backtracks.
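
As a rough illustration of the no-backtracking idea in Python: the stock `re` module only accepts possessive quantifiers such as `\w++` from Python 3.11 onward, so this sketch emulates one with a capturing lookahead followed by a backreference, which should also run on older versions. The names `EMULATED_POSSESSIVE` and `find_properties` are made up for this example and are not part of the answer.

```python
import re

TEXT = "$this->willMatch but $this->willNot()"

# (?=(\w+))\1 stands in for the possessive \w++ :
# the lookahead captures the longest run of word characters, and the
# backreference then consumes exactly that run. Because a lookahead is
# never re-entered once it has matched, the run cannot be shortened.
EMULATED_POSSESSIVE = re.compile(r"\$this->(?=(\w+))\1(?!\()")


def find_properties(text):
    # finditer + group(0) returns the whole match rather than the
    # helper capture group that findall would return.
    return [m.group(0) for m in EMULATED_POSSESSIVE.finditer(text)]


print(find_properties(TEXT))  # ['$this->willMatch']
```

On an engine with native support (PCRE, Java, the third-party `regex` package, or `re` on Python 3.11+), the shorter `\$this->\w++(?!\()` from the answer should behave the same way.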

anubhava
  • It looks like only PCRE has this feature. It does not work on Python. – Alex Reinking Mar 05 '21 at 22:21
  • Yes that's right. You will need to use `\b` option for python – anubhava Mar 05 '21 at 22:30
  • @AlexReinking The stock `re` regular expression engine for Python does not support possessive quantifiers. However, Python has other regular expression implementations with expanded syntax. For example, on SmokeDetector we use the [`regex` package](https://pypi.org/project/regex/), which does support possessive quantifiers. – Makyen Mar 05 '21 at 22:35

The negative lookbehind (?<!\() will always be true here: it looks at the character just matched by the character class, and the character class cannot match a (.
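
A small Python check of that claim, using the sample text from the question (the variable names are only for illustration): the pattern returns the same matches with and without the lookbehind.

```python
import re

TEXT = "$this->willMatch but $this->willNot()"

# The lookbehind inspects the last character consumed by the class,
# which is always a letter, digit or underscore, so it never rejects.
with_lookbehind = re.findall(r"\$this->[a-zA-Z0-9_]+(?<!\()", TEXT)
without_lookbehind = re.findall(r"\$this->[a-zA-Z0-9_]+", TEXT)

print(with_lookbehind)                        # ['$this->willMatch', '$this->willNot']
print(with_lookbehind == without_lookbehind)  # True
```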

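Note that you don't have to escape the underscore: _ works the same as \_ inside the character class.

You can add a word boundary after the character class so that backtracking cannot end the match in the middle of a name, and turn the negative lookbehind into a negative lookahead (?!\() to assert that there is no ( directly to the right.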

\$this->[a-zA-Z0-9_]+\b(?!\()
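
A minimal Python sketch of the difference the word boundary makes, run against the sample text from the question (the constant names are made up for this example). Without `\b`, the engine backtracks one character and accepts `willNo`, because the next character `t` is not a `(`; with `\b`, no position inside `willNot` qualifies as an end point.

```python
import re

TEXT = "$this->willMatch but $this->willNot()"

WITHOUT_BOUNDARY = re.compile(r"\$this->[a-zA-Z0-9_]+(?!\()")
WITH_BOUNDARY = re.compile(r"\$this->[a-zA-Z0-9_]+\b(?!\()")

# Backtracking shortens "willNot" to "willNo" to satisfy the lookahead.
print(WITHOUT_BOUNDARY.findall(TEXT))  # ['$this->willMatch', '$this->willNo']

# \b forces the name to end at a real word boundary, so the method call
# is rejected as a whole.
print(WITH_BOUNDARY.findall(TEXT))     # ['$this->willMatch']
```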


The fourth bird