
Through some convoluted testing, I ended up potentially discovering a shortcut.

A lookbehind in PowerShell is supposed to use the <= syntax, which is referenced in various other places when googling for lookbehinds in PowerShell, e.g., this Microsoft blog.

Take this simple example Regex:

(?>^[^x]*)$

  • (?> begins the lookbehind
  • ^[^x]* tests that the character x is not present since the beginning of the string
  • ) closes the lookbehind
  • $ anchors the end of the line

When I test it:

'sample = x' -match '(?>^[^x]*)$'
False
'sample = ' -match '(?>^[^x]*)$'
True

The first block returns false: the lookbehind does not match a string that contains x.

The second block returns true: the lookbehind matches a string without x.

It seems to work!

Now if I try to use the <= syntax:

'sample = x' -match '(?<=^[^x]*)$'
False
'sample = ' -match '(?<=^[^x]*)$'
True

It has the same behavior.

Is this a "shortcut" for RegEx lookbehinds in PowerShell, or why is > working at all?

435|PS(7.2.1) C:\Users\User\Documents [230211-15:03:27]> $PSVersionTable

Name                           Value
----                           -----
PSVersion                      7.2.1
PSEdition                      Core
GitCommitId                    7.2.1
OS                             Microsoft Windows 10.0.22621
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0
mklement0
Blaisem
  • 2
    This `(?>` is not a lookbehind, it denotes an [atomic group](https://learn.microsoft.com/en-us/dotnet/standard/base-types/grouping-constructs-in-regular-expressions#atomic_groups). – The fourth bird Feb 11 '23 at 14:13
  • 2
    Does this answer your question? [Confusion with Atomic Grouping - how it differs from the Grouping in regular expression of Ruby?](https://stackoverflow.com/questions/14411818/confusion-with-atomic-grouping-how-it-differs-from-the-grouping-in-regular-exp) – Wiktor Stribiżew Feb 11 '23 at 14:40
  • The proposed duplicate, while certainly related, isn't a duplicate per se, because it asks about atomic groups vs. regular capture groups. This question is about atomic groups vs. lookbehind assertions; also, the regex flavor is different (Ruby vs. .NET). – mklement0 Feb 11 '23 at 19:29

1 Answer

tl;dr

  • Regex grouping constructs (?<=…) and (?>…) serve different purposes and only happen to work the same in your particular scenario; neither is called for in your scenario.

  • Use '...' -notmatch 'x' to test if a given string contains any instances of 'x' (returns $true if not).


Background information:

The two grouping constructs you reference serve different purposes (the enclosed subexpression is represented by the placeholder … below):

  • (?<=…) is a (zero-width, positive) lookbehind assertion:

    • It is a non-capturing grouping construct that must match the enclosed subexpression immediately before (to the left, i.e. "looking behind") where the remaining expression matches, without capturing what the subexpression matched.

    • In essence, this means: When what follows this construct matches, also make sure (assert) that what comes before it matches the subexpression inside (?<=…); if it doesn't, there's no overall match; if it does, don't capture (include in the results) its match.

    • Therefore, this construct only makes sense if placed before a capturing construct; e.g.:

      # Returns only 'unheard of', because only in that string is the match
      # for 'hear.' preceded by 'un'.
      # The regex match itself is just 'heard'; the 'un' is not part of it.
      'heard from', 'unheard of' -match '(?<=un)hear.'
      
  • (?>…) is an atomic group, aka non-backtracking subexpression:

    • It is a non-capturing grouping construct - similar to a non-capturing group, (?:…) - except that it never backtracks into the subexpression once it has matched.

    • In essence, this means: once the subexpression has found a match, it won't allow backtracking based on the remainder of the expression; this construct is mostly used as a performance optimization when it is known that backtracking wouldn't succeed.

      # Atomic group:
      # -> $false, because the atomic group fully consumes the string,
      #    so there's nothing for '.' to match *after* the group.
      'abc!' -match '(?>.+).'
      
      # Regular capture group:
      # -> $true, with backtracking; the capture group captures 'abc'
      'abc!' -match '(.+).'
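
The difference between the constructs becomes visible with a single-string input, because -match then populates the automatic $Matches variable. The following is a minimal sketch; the variable names are illustrative and not part of the question:

```powershell
# Lookbehind: 'un' must precede the match, but is not part of the match.
$null = 'unheard of' -match '(?<=un)hear.'
$lookbehindMatch = $Matches[0]            # 'heard'

# Atomic group: '.+' consumes the whole string and gives nothing back,
# so the trailing '.' has nothing left to match.
$atomicResult = 'abc!' -match '(?>.+).'   # $false

# Regular capture group: '.+' backtracks by one character,
# which lets the trailing '.' match '!'.
$null = 'abc!' -match '(.+).'
$captured = $Matches[1]                   # 'abc'
```

Note that $Matches is only updated by a successful match, which is why the sketch reads it right after each call that succeeds.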
      

What you tried:

(?<=^[^x]*)$ - your regex with a lookbehind assertion

As noted above, there's no good reason to use a lookbehind assertion without following it with a capturing expression. Your regex will by definition not capture anything ($ is itself an assertion).

Since you're matching the whole string, the immediate simplification would be not to use a grouping construct at all (but see the bottom section):

^[^x]*$

As an optimization, if you explicitly want to prevent the capturing that happens by default, use a noncapturing group, (?:…):

(?:^[^x]*$)


(?>^[^x]*)$ - your regex with an atomic group

Since you're matching the whole string, there is no reason to use an atomic group, given that there's no backtracking that needs preventing, so this regex matches the same strings as (^[^x]*)$, i.e. the same pattern with a regular capture group (followed by $).

As noted, there's no reason to capture anything here, so (?:^[^x]*$) would prevent that.
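
One way to check that the grouping constructs make no difference to the outcome here is to run every variant from this section against the two sample strings from the question. A minimal sketch, with illustrative variable and property names:

```powershell
# The four variants discussed above.
$patterns = '(?<=^[^x]*)$', '(?>^[^x]*)$', '^[^x]*$', '(?:^[^x]*$)'
# The two inputs used in the question.
$samples  = 'sample = x', 'sample = '

$results = foreach ($p in $patterns) {
    foreach ($s in $samples) {
        [pscustomobject]@{ Pattern = $p; Text = $s; IsMatch = $s -match $p }
    }
}

# Every pattern should report $false for 'sample = x' and $true for 'sample = '.
$results | Format-Table
```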

In short:

  • Both your regexes match the input string in full, and therefore require no grouping construct (except, optionally, to explicitly prevent capturing).

  • Read on for a much simpler solution.


Taking a step back:

The conceptually simplest and most efficient solution is:

'...' -notmatch 'x'

That is, you can let -notmatch, the negated form of PowerShell's -match operator, look for (at most one) x and negate the Boolean result, so that not finding any x returns $true.

In other words: the test succeeds if no x is present in the input string.
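
Applied to the two sample strings from the question, this looks as follows. The last two lines go beyond the question: they illustrate that -notmatch is case-insensitive by default, with -cnotmatch as the case-sensitive variant. The variable names are illustrative only:

```powershell
$withX    = 'sample = x' -notmatch 'x'     # $false: an x is present
$withoutX = 'sample = '  -notmatch 'x'     # $true:  no x is present

# Case sensitivity: an uppercase X counts as a match unless -cnotmatch is used.
$upperDefault = 'sample = X' -notmatch  'x'   # $false
$upperCase    = 'sample = X' -cnotmatch 'x'   # $true
```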

mklement0