I'm building some regex searches and ran into one I can't seem to solve. I'm searching for all incorrectly capitalized or punctuated variants of Split “S” (the aircraft maneuver). The expression I'm using is:

[Ss]plit[ -]“?[Ss]”?(?<!Split “S”)

The goal is to find all combinations of initial caps and punctuation (space, hyphen, smart double quotes) and to use a lookahead negation (I think that's what it's called) to exclude the correct term Split “S”. It works great at finding all the variations, but it also finds Split “S”, ignoring the right double quote. I'm having no luck, so I thought I'd turn to the experts.

This type of negation works great for other terms, such as “V” Diagram or System M. It appears the smart double quotes are the problem. Using the expression above, I was expecting it to find: split “S” Split-S Split S Split S, but not Split “S”. Instead it finds all the terms, including Split “S (everything except the right double quote).
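The behavior can be reproduced outside FrameMaker. The sketch below uses Python's re module purely as a stand-in (an assumption; it is not the FrameMaker engine) and runs the expression above against a few variants. The sample strings are illustrative.

```python
import re

# The expression from the question: optional smart quotes around S,
# followed by a negative lookbehind for the correct term.
pattern = re.compile(r"[Ss]plit[ -]“?[Ss]”?(?<!Split “S”)")

samples = ["split “S”", "Split-S", "Split S", "Split “S”"]

results = {}
for text in samples:
    m = pattern.search(text)
    results[text] = m.group(0) if m else None
    print(repr(text), "->", repr(results[text]))

# The last sample is the correct term, yet it still produces a match:
# everything except the closing quote.
```

The correct term is not rejected; the match simply stops one character short, before the right double quote.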

I'm using FrameMaker ExtendScript with Perl regex.

Gator
  • You can also [skip](https://stackoverflow.com/questions/24534782/how-do-skip-or-f-work-on-regex) the correct part: [`Split “S”(*SKIP)(*F)|[Ss]plit[ -]“?[Ss]”?`](https://regex101.com/r/4QF201/1) – bobble bubble Dec 18 '22 at 22:57
  • Thanks for taking the time to answer. Unfortunately, (*SKIP)(*F) does not work in the regex implementation in Adobe FrameMaker. It supports a couple of implementations, but I've done all my search strings in Perl. I'm pretty much a novice at this. – Gator Dec 21 '22 at 18:42

1 Answer

You need to change the last optional pattern quantifier to a possessive one:

[Ss]plit[ -]“?[Ss]”?+(?<!Split “S”)
                    ^

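See the regex demo. The point here is to make sure no backtracking occurs once the lookbehind fails. Since ”? is optional and allows backtracking, the engine gives the closing quote back and tests the lookbehind again right after the S. At that position the text behind is no longer Split “S”, so the lookbehind passes and the engine returns the "incomplete match" Split “S. With the possessive ”?+ the quote cannot be given back, so the whole match fails.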
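A minimal sketch of the possessive fix, using Python's re module as a stand-in for the Perl-flavor engine (an assumption). Python accepts possessive quantifiers only from version 3.11, so the sketch falls back to a lookahead plus backreference, a common emulation that is not part of the answer itself. The name fixed and the sample strings are illustrative.

```python
import re

LOOKBEHIND = r"(?<!Split “S”)"

try:
    # Possessive quantifier as in the answer (Python 3.11 or later).
    fixed = re.compile(r"[Ss]plit[ -]“?[Ss]”?+" + LOOKBEHIND)
except re.error:
    # Older Pythons reject "?+". Emulation: the lookahead captures the
    # optional quote, the backreference consumes it, and the engine does
    # not backtrack into the lookahead afterwards.
    fixed = re.compile(r"[Ss]plit[ -]“?[Ss](?=(”?))\1" + LOOKBEHIND)

for text in ["Split “S”", "split “S”", "Split-S", "Split “S"]:
    m = fixed.search(text)
    print(repr(text), "->", repr(m.group(0)) if m else None)
```

Only the first sample, the correct term, yields no match; the other three are reported in full.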

Another way is to convert the lookbehind to a lookahead and place it at the beginning:

(?!Split “S”)[Ss]plit[ -]“?[Ss]”?

See this regex demo. Here, you ask the regex engine to fail right at the place where Split “S” occurs immediately to the right of the current location.
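The lookahead variant can be checked the same way. This is a small sketch in Python (a stand-in for the Perl-flavor engine, which is an assumption); the sentence in text is made up for illustration.

```python
import re

# Negative lookahead first: refuse to start a match where the correct
# term Split “S” begins, then match any variant.
pattern = re.compile(r"(?!Split “S”)[Ss]plit[ -]“?[Ss]”?")

text = "Use Split “S”, not split “S”, Split-S, Split S or Split “s”."
found = pattern.findall(text)
print(found)
```

Because the check happens before anything is consumed, there is no partially matched quote for the engine to give back, so no possessive quantifier is needed.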

Wiktor Stribiżew
  • Wiktor...your answer was just what I needed. I have one more for you if you have the time...I'm running into a similar problem when the smart quotes are around the letter at the beginning of the search string; e.g., with “S” turn being the correct form, using “?[Ss]”?[ -][Tt]urn(?<!“S” turn) works fine until I run into aircraft's turn radius. It finds the 's turn in the phrase. I've tried using a leading word boundary, but that doesn't work either. – Gator Dec 21 '22 at 18:24
  • @gator I do not understand your new issue, [this demo](https://regex101.com/r/HTfjAr/1) shows the regex does what it should. It can only fail a match if it is `“S” turn` exactly. – Wiktor Stribiżew Dec 22 '22 at 12:00
  • Thanks for the reply...you confirmed I was on the right path. My problem was that it also picked up [s turn] when it's found in "aircraft's turn radius." My solution turned out to be a negative lookbehind for the apostrophe preceding the "s". Full solution was: (?<!’)“?[Ss]”?[ -][Tt]urn(?<!“S” turn). Thanks so much for your help. – Gator Dec 22 '22 at 21:21
  • @gator Or, `“?\b(?<!\b[“'‘’])[Ss]”?[ -][Tt]urn(?<!“S” turn)` – Wiktor Stribiżew Dec 22 '22 at 22:00
  • Thanks Wiktor...it works great. It'll take me a bit with the Regex101 tool to fully understand how it works. I'm still very new to assertions. From the examples you've shared, I can see I need to master them. They are incredibly powerful. Using assertions I was able to take dozens of two or three expression searches and boil them down into a single expression (if my terminology is right). Happy New Year! – Gator Dec 26 '22 at 22:42
  • @gator Happy New coming Year :) – Wiktor Stribiżew Dec 26 '22 at 23:01
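For readers following the comments, here is a hedged sketch comparing the three “S” turn expressions quoted above. It uses Python's re module as a stand-in for the Perl-flavor engine (an assumption), and the variable names and test phrases are illustrative.

```python
import re

TAIL = r"”?[ -][Tt]urn(?<!“S” turn)"

# Original expression: no guard against a preceding apostrophe.
original = re.compile(r"“?[Ss]" + TAIL)
# Gator's fix: reject an S preceded by a curly apostrophe.
gator = re.compile(r"(?<!’)“?[Ss]" + TAIL)
# Wiktor's variant: reject an S preceded by a quote or apostrophe
# that itself directly follows a word character.
wiktor = re.compile(r"“?\b(?<!\b[“'‘’])[Ss]" + TAIL)

def hit(rx, text):
    """Return the matched text, or None when nothing matches."""
    m = rx.search(text)
    return m.group(0) if m else None

curly = "the aircraft’s turn radius"
straight = "the aircraft's turn radius"

print(hit(original, curly))    # false positive on the possessive s
print(hit(gator, curly))       # rejected by the apostrophe lookbehind
print(hit(gator, straight))    # a straight apostrophe still slips through
print(hit(wiktor, straight))   # rejected
print(hit(wiktor, "“S” turn")) # correct form is never reported
print(hit(wiktor, "S-turn"))   # incorrect variant is reported
```

The practical difference is that the curly-apostrophe lookbehind covers only ’, while the character-class version also covers a straight apostrophe and the left single quote.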