3

The statement

Regex.Replace("XB", "([^A])B", "$1AB")

produces XAB, as expected. Can someone explain why

Regex.Replace("XBB", "([^A])B", "$1AB")

does not produce XABAB, but XABB? It's as if the regex engine no longer has knowledge of the preceding character when it reaches the second B.

Ultimately, I want to replace all Bs not preceded by an A with AB.

Wiktor Stribiżew
François Beaune
  • The answer you chose won't really work. For future reference, I'd wait a while for alternative answers. –  Jan 12 '15 at 19:33
  • Does this answer your question? [Regex for matching something if it is not preceded by something else](https://stackoverflow.com/questions/9306202/regex-for-matching-something-if-it-is-not-preceded-by-something-else) – bobble bubble Jul 06 '22 at 14:07

3 Answers

3

To replace all Bs not preceded by an A with AB:

Find: (?<!A)B
Replace: AB
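
As a minimal sketch of how the Find/Replace pair above might be applied in C#, assuming .NET's `Regex.Replace` and top-level statements (the variable names and sample inputs are illustrative, not from the answer):

```csharp
using System;
using System.Text.RegularExpressions;

// Find/Replace pair from the answer above.
string pattern = "(?<!A)B";   // a B that is not immediately preceded by an A
string replacement = "AB";

// The lookbehind consumes no characters, so consecutive Bs are each tested
// against the original input.
string single = Regex.Replace("XB", pattern, replacement);        // XAB
string doubled = Regex.Replace("XBB", pattern, replacement);      // XABAB
// A B at the very start of the string has nothing before it, so it is replaced too.
string leading = Regex.Replace("B", pattern, replacement);        // AB
// A B already preceded by an A is left alone.
string alreadyDone = Regex.Replace("XAB", pattern, replacement);  // XAB

Console.WriteLine($"{single} {doubled} {leading} {alreadyDone}");
```

The last two inputs illustrate the beginning-of-string behaviour discussed in the comments below.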

  • That indeed looks good. It's simpler and more logical than the answer I accepted. Could you provide an example where @Avinash's solution fails while yours works? – François Beaune Jan 12 '15 at 23:09
  • @FrançoisBeaune: There is one difference between this and the other answer: This answer will replace `B` at the beginning of the string, while the other answer does not. You are not very clear what will happen in such case, though. – nhahtdh Jan 13 '15 at 02:48
  • Indeed, I had to use `([^A]|^)(?=B)` to make it also work at the beginning of the string. Your solution is simpler and cleaner. – François Beaune Jan 13 '15 at 09:47
  • The pattern in this answer is handy for converting text with Unix newlines to text with DOS/Windows newlines, when adjusted as follows: `(?<!\r)\n` – MCattle May 25 '16 at 18:33
  • Be aware that even now, in 2021, Safari still does not support the lookbehind feature. – Herbert Pimentel Mar 09 '21 at 23:03
  • This works in JS (https://regexr.com/652sd) but not PowerShell. I am using `-replace` to remove the character. `-replace '(?<!")(`n)', ''` – hirani89 Sep 06 '21 at 07:54
3

In a .NET regex flavor, you may use lookbehinds like this:

Match foo not immediately preceded with bar

(?<!bar)foo

See the regex demo.

Match foo not preceded with bar anywhere in the string

(?s)(?<!bar.*?)foo

See the regex demo.

Match foo immediately preceded with bar

(?<=bar)foo

See the regex demo.

Match foo preceded with bar anywhere in the string

(?s)(?<=bar.*?)foo

See the regex demo.

The two "anywhere in the string" patterns contain .*? inside the lookbehind, allowing the regex engine to check for bar followed by any zero or more chars immediately to the left of foo, so bar does not have to come immediately before foo.

The (?s) inline modifier allows . to match any characters including newlines.
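
A small sketch of the effect of (?s) on the variable-length lookbehind patterns above, assuming .NET's `Regex.IsMatch` and top-level statements; the sample text and variable names are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative input: "bar" sits on the line before "foo".
string text = "bar\nfoo";

// With (?s), . also matches the newline, so the lookbehind finds the earlier
// "bar" and the negative assertion rejects this foo.
bool negativeWithS = Regex.IsMatch(text, @"(?s)(?<!bar.*?)foo");     // false

// Without (?s), . stops at the newline, the lookbehind cannot reach "bar",
// and foo is matched.
bool negativeWithoutS = Regex.IsMatch(text, @"(?<!bar.*?)foo");      // true

// The positive counterpart requires a "bar" somewhere to the left.
bool positiveWithS = Regex.IsMatch(text, @"(?s)(?<=bar.*?)foo");     // true

Console.WriteLine($"{negativeWithS} {negativeWithoutS} {positiveWithS}");
```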

The current problem is easily solved by using a negative lookbehind (see the top scenario):

var result = Regex.Replace("XBB", "(?<!A)B", "AB");
Wiktor Stribiżew
  • I know the original answer is about .NET, but is there any way to achieve the `(?<!bar.*?)foo` behavior with PCRE without the **"lookbehind assertion is not fixed length"** error? – Ivan Shatsky Jan 27 '21 at 23:05
  • @IvanShatsky Yes, use `bar.*foo(*SKIP)(*F)|foo` or even `bar.*?foo.*(*SKIP)(*F)|foo` – Wiktor Stribiżew Jan 27 '21 at 23:07
  • Wow, a big thanks, didn't know about `(*SKIP)` or `(*F)` before. In case anyone looking for the same and meet the `(*SKIP)` and `(*F)` first time like me, you can find an explanation [here](https://stackoverflow.com/a/24535912/7121513). – Ivan Shatsky Jan 27 '21 at 23:25
  • This works in JS (regexr.com/652sd) but not PowerShell. I am using `-replace` to replace the character. `-replace '(?<!")(n)', ''`. I am trying to remove new lines that occur after any character except `"`. What is wrong with my command? – hirani89 Sep 06 '21 at 09:00
  • @hirani89 `(Get-Content $filepath -Raw) -replace '(?m)(?<=[^\r\n"])(?:\r\n?|\n)'` – Wiktor Stribiżew Sep 06 '21 at 09:17
1

Note that the ([^A])B regex matches the first XB and captures the X. Because the first B has already been consumed by that match, it cannot also serve as the [^A] character in front of the second B, so the second B is never matched. In this case, I suggest using lookarounds.

([^A])(?=B)

(?=B) Positive lookahead which asserts that the match must be followed by the letter B.

But it produces XABBABB when the replacement string is $1AB. To get the desired output, just remove the B from the replacement string. That is, replace the matched characters with $1A (written \1A in flavors that use backslash references in the replacement).
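
The three cases described in this answer can be compared side by side with a short sketch, assuming .NET's `Regex.Replace` and top-level statements (the variable names are illustrative only):

```csharp
using System;
using System.Text.RegularExpressions;

// Original pattern: the first B is consumed together with X, so nothing is
// left to play the role of [^A] in front of the second B.
string original = Regex.Replace("XBB", "([^A])B", "$1AB");            // XABB

// Lookahead version: B is only asserted, not consumed, so both X and the
// first B match. Keeping B in the replacement duplicates it.
string lookaheadWithB = Regex.Replace("XBB", "([^A])(?=B)", "$1AB");  // XABBABB

// Dropping B from the replacement gives the desired result.
string lookaheadFixed = Regex.Replace("XBB", "([^A])(?=B)", "$1A");   // XABAB

Console.WriteLine($"{original} {lookaheadWithB} {lookaheadFixed}");
```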


Avinash Raj