
I am successfully chaining two negative lookaheads in a C#/.NET regex to exclude <html> tags that contain ' amp' or ' ⚡':

regex101

<\s*?html((?!.*?\samp[\s>])(?!.*?\s⚡[\s>]).*?)>

But I am looking for a way to combine the two checks with an OR, something like [amp|⚡]:

<\s*?html((?!.*?\s[⚡|amp][\s>]).*?)>

Is that possible, and if so, what would be the correct syntax?

EDIT: My initial question contained an error and did not show quite what I want to capture: everything after html up to >, provided it doesn't include amp or ⚡. The patterns above and the regex101 demo are now correct. A good solution was also suggested in the comments: use () instead of []. So my current working solution is:

<\s*?html(
  (?!
    .*?
    \s(⚡|amp)[\s>]
  )
.*?)>
jamacoe

1 Answer


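The correct syntax uses a group instead of a character class: [⚡|amp] matches a single character (⚡, |, a, m or p), whereas (?:⚡|amp) matches either ⚡ or the whole string amp. The pattern also has a few unnecessary parts.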

  • You can remove the .*? at the end: because it is optional it will not match anything, and it adds nothing to the assertion.
  • You can remove the parentheses and the question mark in (.*)?, as .* is already optional.
  • You can remove the whole outer capturing group, because the pattern inside it does not match anything.
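
To make the difference between a character class and a group concrete, here is a minimal sketch. It uses Python's `re` module purely for illustration (the question is about .NET, but character classes and non-capturing groups behave the same way there); the sample strings and variable names are made up for this example.

```python
import re

# Character class: matches exactly ONE character out of ⚡, |, a, m, p.
char_class = re.compile(r"\s[⚡|amp][\s>]")

# Group with alternation: matches the character ⚡ or the whole word "amp".
group = re.compile(r"\s(?:⚡|amp)[\s>]")

# Hypothetical sample tags, chosen to show where the two patterns disagree.
samples = ["<html amp>", "<html ⚡ lang='en'>", "<html a>", "<html |>"]

# Map each sample to (character class found a match, group found a match).
results = {
    s: (bool(char_class.search(s)), bool(group.search(s)))
    for s in samples
}

for sample, (class_hit, group_hit) in results.items():
    print(f"{sample!r}: char class={class_hit}, group={group_hit}")
```

The character class misses `<html amp>` (three characters cannot fit into a single-character class) and wrongly flags `<html a>` and `<html |>`, while the group flags only the two tags that really contain amp or ⚡.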

If there are no further angle brackets after the opening <html, you can write it as:

<html(?![^<>]* (?:amp|⚡)[ >])

Explanation

  • <html Match literally
  • (?! Negative lookahead
    • [^<>]* Match zero or more characters other than < and >
    • ' (?:amp|⚡)[ >]' Match a space, then either amp or ⚡, followed by either a space or >
  • ) Close the lookahead
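
As a quick check of the pattern explained above, the following sketch runs it against a few sample opening tags. It uses Python's `re` module for illustration only (the lookahead and non-capturing group syntax used here is the same in .NET); the helper name `is_plain_html_tag` and the sample tags are assumptions made for this example.

```python
import re

# Pattern from the answer: match "<html" only when no " amp" or " ⚡"
# (followed by a space or ">") occurs before the next angle bracket.
pattern = re.compile(r"<html(?![^<>]* (?:amp|⚡)[ >])")


def is_plain_html_tag(tag: str) -> bool:
    """Return True if the opening tag is matched, i.e. it is not an AMP tag."""
    return pattern.search(tag) is not None


# Hypothetical sample tags for illustration.
tags = [
    '<html lang="en">',        # no amp / ⚡ attribute
    '<html amp lang="en">',    # amp directly after html
    '<html lang="en" amp>',    # amp as the last attribute
    '<html ⚡>',                # ⚡ attribute
    '<html class="amplify">',  # "amp" only as part of a longer word
]

for tag in tags:
    print(f"{tag!r}: matched={is_plain_html_tag(tag)}")
```

Note that this pattern uses a literal space rather than \s, so an amp attribute separated by a tab or newline would not be excluded; swapping the spaces for \s, as in the question, would cover that case.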


The fourth bird