Let's say I have a string:
cat_dog_ !mouse= <name="Jake_Russell!"> gog+cat

I want to match the specific symbols _!=+, but not inside <name="Jake_Russell!">, i.e. the part matched by the regex <name=\".+\">. So the result should be __!=+

I've tried a negative lookahead:
(?!<name=\".+\">)([_!=+])
but it still matches the symbols inside <name="Jake_Russell!"> too.

Iga
  • Is another way of expressing the requirement *match a list of special characters, except when they are between quotes*? – Bohemian Apr 02 '23 at 03:09
  • @Bohemian the requirement is to match a list of special characters. But everywhere except this specific pattern expression: . Not just between quotes. Symbol `=` in ` – Iga Apr 02 '23 at 03:33
  • Unrelated: The dog breed is [Jack Russel](https://en.wikipedia.org/wiki/Jack_Russell_Terrier), not Jake Russel – Bohemian Apr 02 '23 at 03:46
  • I don't think you can do it because you'd need a variable length look behind, which are not supported. Would not matching within quotes and not matching after ` – Bohemian Apr 02 '23 at 04:04
  • @Bohemian, related to unrelated: as shown at your link, the dog breed is Jack Russell Terrier. Note two "l"'s in "Russell" ¯\_(ツ)_/¯ (great dogs). – Cary Swoveland Apr 02 '23 at 21:57

3 Answers

I think you could use capturing groups: capture the <name=\".+\"> part in one group that you ignore, and capture the specific symbols you want in another group.

Regex pattern: (?<ignored_group><name=".+">)|(?<matched_group>[_!=+])

See demo here
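
A minimal sketch of this two-group idea in Python, assuming the `re` module, which spells named groups as (?P<name>...) rather than (?<name>...). The helper name wanted_symbols is made up for illustration. Only matches in which matched_group took part are kept:

```python
import re

# Same alternation as above, written in Python's named-group syntax.
PATTERN = re.compile(r'(?P<ignored_group><name=".+">)|(?P<matched_group>[_!=+])')

def wanted_symbols(text):
    """Join every symbol captured by matched_group (hypothetical helper)."""
    return "".join(
        m.group("matched_group")
        for m in PATTERN.finditer(text)
        if m.group("matched_group") is not None
    )

print(wanted_symbols('cat_dog_ !mouse= <name="Jake_Russell!"> gog+cat'))  # __!=+
```

One thing to watch: .+ is greedy, so if a line holds two name parts, the ignored group runs from the first <name=" to the last "> and swallows any symbols in between.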

Trung Duong

You can rule out what you don't want, and then capture what you do want, using an alternation and a capture group:

<name="[^"]*">|([_!=+])

Explanation

  • <name= Match literally
  • "[^"]*" Match a double-quoted value: an opening ", then any characters other than " (negated character class), then the closing "
  • > Match literally
  • | Or
  • ([_!=+]) Capture group 1, match any of the listed characters

Regex demo
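
As a rough illustration of how the capture group is consumed, here is a small Python sketch. The function name symbols_outside_name is hypothetical; the pattern is the one explained above. Matches of the first alternative leave group 1 unset, so they are skipped:

```python
import re

# Alternative 1 consumes a whole <name="..."> part; alternative 2 captures one symbol.
PATTERN = re.compile(r'<name="[^"]*">|([_!=+])')

def symbols_outside_name(text):
    """Return group 1 of every match in which it participated (hypothetical helper)."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]

text = 'cat_dog_ !mouse= <name="Jake_Russell!"> gog+cat'
print(symbols_outside_name(text))  # ['_', '_', '!', '=', '+']
```

Because [^"]* cannot cross a double quote, each name part is consumed on its own, and symbols between two such parts are still captured.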

If the tag can contain more than just name=, and there are no further occurrences of < and > inside it, you might also use:

<[^<>]*\bname="[^"]*"[^<>]*>|([_!=+])

Regex demo
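
A short Python sketch of this wider pattern, using an invented sample tag with extra attributes (the tag, its attributes, and the helper name symbols_outside_tags are assumptions for illustration only):

```python
import re

# Alternative 1 consumes any <...> tag that contains a name="..." attribute.
PATTERN = re.compile(r'<[^<>]*\bname="[^"]*"[^<>]*>|([_!=+])')

def symbols_outside_tags(text):
    """Join the symbols captured in group 1 (hypothetical helper)."""
    return "".join(m.group(1) for m in PATTERN.finditer(text) if m.group(1))

print(symbols_outside_tags('a_b <tag id="x_y" name="p!q" class="r+s"> c=d'))  # _=
```

A tag without a name attribute is not consumed by the first alternative, so symbols inside it are still captured.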

The fourth bird

Because variable-length lookbehinds are not supported in many regex flavors, you can't use a lookbehind to exclude matches that appear an arbitrary distance after particular text.

However, you can exclude a match immediately after <name and exclude matches within quotes, which is the best you can do given the limitations of regex:

(?<!<name)[_!=+](?=(([^"]*"){2})*[^"]*$)

See live demo.
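
A minimal Python sketch of this look-around approach (the helper name symbols_by_lookaround is made up). The lookbehind is fixed-width, which Python's `re` accepts, and the lookahead requires an even number of double quotes between the symbol and the end of the text:

```python
import re

# Not directly after "<name", and followed by an even number of double quotes.
PATTERN = re.compile(r'(?<!<name)[_!=+](?=(([^"]*"){2})*[^"]*$)')

def symbols_by_lookaround(text):
    # finditer + group(0) is used because findall would return the inner groups.
    return "".join(m.group(0) for m in PATTERN.finditer(text))

print(symbols_by_lookaround('cat_dog_ !mouse= <name="Jake_Russell!"> gog+cat'))  # __!=+
print(symbols_by_lookaround('x_y "a_b" z+'))  # _+
```

The second call shows the trade-off: symbols inside any quoted text are skipped, not only those inside the name part.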

Bohemian