regex lookbehind is `>` a shortcut for `<=`?

Question

Through some convoluted testing, I ended up potentially discovering a shortcut.

A lookbehind in PowerShell is supposed to use the <= syntax, which is referenced in various other places when googling for lookbehinds in PowerShell, e.g., this Microsoft blog.

Take this simple example Regex:

(?>^[^x]*)$

(?> begins the lookbehind
^[^x]* tests that the character x is not present since the beginning of the string
) closes the lookbehind
$ anchors the end of the line

When I test it:

'sample = x' -match '(?>^[^x]*)$'
False

'sample = ' -match '(?>^[^x]*)$'
True

The first test returns False: the string contains x, so the lookbehind does not match.

The second test returns True: the string contains no x, so the lookbehind matches.

It seems to work!

Now if I try to use the <= syntax:

'sample = x' -match '(?<=^[^x]*)$'
False

'sample = ' -match '(?<=^[^x]*)$'
True

It has the same behavior.

Is this a "shortcut" for RegEx lookbehinds in PowerShell, or why is > working at all?

435|PS(7.2.1) C:\Users\User\Documents [230211-15:03:27]> $PSVersionTable

Name                           Value
----                           -----
PSVersion                      7.2.1
PSEdition                      Core
GitCommitId                    7.2.1
OS                             Microsoft Windows 10.0.22621
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

This `(?>` is not a lookbehind, it denotes an [atomic group](https://learn.microsoft.com/en-us/dotnet/standard/base-types/grouping-constructs-in-regular-expressions#atomic_groups). — The fourth bird, Feb 11 '23 at 14:13
Does this answer your question? [Confusion with Atomic Grouping - how it differs from the Grouping in regular expression of Ruby?](https://stackoverflow.com/questions/14411818/confusion-with-atomic-grouping-how-it-differs-from-the-grouping-in-regular-exp) — Wiktor Stribiżew, Feb 11 '23 at 14:40
The proposed duplicate, while certainly related, isn't a duplicate per se, because it asks about atomic groups vs. regular capture groups. This question is about atomic groups vs. lookbehind assertions; also, the regex flavor is different (Ruby vs. .NET). — mklement0, Feb 11 '23 at 19:29

mklement0 · Accepted Answer · 2023-02-11T18:36:44.583

tl;dr

Regex grouping constructs (?<=…) and (?>…) serve different purposes and only happen to work the same in your particular scenario; neither is called for in your scenario.
Use '...' -notmatch 'x' to test if a given string contains any instances of 'x' (returns $true if not).

Background information:

The two grouping constructs you reference serve different purposes (enclosed subexpressions are represented with placeholder … below):

(?<=…) is a (zero-width, positive) lookbehind assertion:
- It is a non-capturing grouping construct that must match the enclosed subexpression immediately before (to the left, i.e. "looking behind") where the remaining expression matches, without capturing what the subexpression matched.
- In essence, this means: When what follows this construct matches, also make sure (assert) that what comes before it matches the subexpression inside (?<=…); if it doesn't, there's no overall match; if it does, don't capture (include in the results) its match.
- Therefore, this construct typically only makes sense if placed before an expression that consumes characters, i.e. one that contributes text to the match; e.g.:
```
# Matches only 'unheard of', because only in it is the match
# for 'hear.' preceded by 'un'
# Captures only 'heard' from the matching string, not the 'un'
'heard from', 'unheard of' -match '(?<=un)hear.'
```
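To see what the lookbehind keeps out of the match, it helps to use a scalar input, because only then does -match populate the automatic $Matches variable (with an array on the left-hand side, -match acts as a filter and does not set $Matches). A minimal sketch, with illustrative variable names:

```powershell
# Scalar input: -match returns a Boolean and populates $Matches.
$found = 'unheard of' -match '(?<=un)hear.'

# The overall match is stored under key 0; the asserted 'un' is not part of it.
$matched = $Matches[0]

$found    # True
$matched  # heard
```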
(?>…) is an atomic group, aka non-backtracking subexpression:
- It is a noncapturing grouping construct that groups a subexpression much like (?:…) does, except that it never backtracks.
- In essence, this means: once the subexpression has found a match, the regex engine won't backtrack into it in order to make the remainder of the expression match; this construct is mostly used as a performance optimization when it is known that backtracking wouldn't succeed.
```
# Atomic group:
# -> $false, because the atomic group fully consumes the string,
#    so there's nothing for '.' to match *after* the group.
'abc!' -match '(?>.+).'

# Regular capture group:
# -> $true, with backtracking; the capture group captures 'abc'
'abc!' -match '(.+).'
```
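The difference that matters for the question shows up as soon as the group is followed by something that consumes characters: a lookbehind only asserts, whereas the text matched by an atomic group becomes part of the overall match. A small sketch with made-up inputs and variable names:

```powershell
# Lookbehind: 'a' must precede 'b', but is not part of the match.
$null = 'ab' -match '(?<=a)b'
$lookbehindMatch = $Matches[0]

# Atomic group: 'a' is consumed and is therefore part of the match.
$null = 'ab' -match '(?>a)b'
$atomicMatch = $Matches[0]

$lookbehindMatch  # b
$atomicMatch      # ab
```

In the question's regexes nothing but the zero-width $ follows the group, which is why this difference never becomes visible there.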

What you tried:

(?<=^[^x]*)$ - your regex with a lookbehind assertion

As noted above, there's no good reason to use a lookbehind assertion without following it with an expression that consumes characters. Your regex will by definition match only the empty string at the end of the input ($ is itself an assertion).

Since you're matching the whole string, the immediate simplification would be not to use a grouping construct at all (but see the bottom section):

^[^x]*$

If you do need a group, e.g. to apply a quantifier or an alternation to it, but want to avoid the capturing that (…) performs by default, use a noncapturing group, (?:…):

(?:^[^x]*$)

(?>^[^x]*)$ - your regex with an atomic group

Since you're matching the whole string, there is no reason to use an atomic group, given that there's no backtracking that needs preventing, so this regex is in effect the same as (?:^[^x]*)$, i.e. a noncapturing group (followed by $).

As noted, there's no reason to capture anything here; since an atomic group does not capture anything either, the group can simply be dropped, leaving ^[^x]*$.
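One way to check what each grouping construct stores is to inspect the keys of $Matches after a successful match. The sketch below (variable names are illustrative) compares the atomic group with a regular capture group on one of the question's inputs:

```powershell
# Atomic group: only the overall match (key 0) is recorded.
$null = 'sample = ' -match '(?>^[^x]*)$'
$atomicKeys = @($Matches.Keys)

# Regular capture group: key 1 holds the group's capture in addition to key 0.
$null = 'sample = ' -match '(^[^x]*)$'
$captureKeys = @($Matches.Keys | Sort-Object)

$atomicKeys.Count   # 1
$captureKeys.Count  # 2
```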

In short:

Both your regexes match the input string in full, and therefore require no grouping construct (except, optionally, to explicitly prevent capturing).
Read on for a much simpler solution.
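The equivalence can be checked against the two sample inputs from the question. This is only a quick sketch; the variable and property names are made up for illustration:

```powershell
$patterns = '(?<=^[^x]*)$', '(?>^[^x]*)$', '^[^x]*$'
$samples  = 'sample = x', 'sample = '

# Test every pattern against every sample and collect the results.
$results = foreach ($p in $patterns) {
    foreach ($s in $samples) {
        [pscustomobject]@{ Pattern = $p; Sample = $s; IsMatch = ($s -match $p) }
    }
}

$results | Format-Table
```

For these inputs, all three patterns report False for 'sample = x' and True for 'sample = '.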

Taking a step back:

The conceptually simplest and most efficient solution is:

'...' -notmatch 'x'

That is, you can let -notmatch, the negated form of PowerShell's -match operator, look for (at most one) x and negate the Boolean result, so that not finding any x returns $true.

In other words: the test succeeds if no x is present in the input string.
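Applied to the inputs from the question, this might look as follows (variable names are illustrative). Note that -match and -notmatch are case-insensitive by default, so an uppercase X also counts as a hit; -cnotmatch is the case-sensitive variant:

```powershell
$withX    = 'sample = x' -notmatch 'x'   # False: an x is present
$withoutX = 'sample = '  -notmatch 'x'   # True: no x anywhere

# Case sensitivity: -notmatch ignores case, -cnotmatch does not.
$upperInsensitive = 'sample = X' -notmatch 'x'    # False
$upperSensitive   = 'sample = X' -cnotmatch 'x'   # True
```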
