I want to match a pattern using regular expressions, but I need some exceptions to the match. For instance, match every occurrence of "John Doe" except for those occurrences where "John Doe" is enclosed by bold tags, i.e. "<b>John Doe</b>".

Match: John Doe
Don't match: <b>John Doe</b>

How can I achieve this with regular expressions?

Clarification: I want to exclude everything between the bold tags. This excluded content may contain a wide variety of characters, line breaks and so on.

  • What's your regex flavor? It's important as things like negative lookbehind aren't available in all flavors. – Denys Séguret Mar 15 '13 at 14:45
  • `John Doe` is enclosed by bold tag. Do you want to exclude it? – nhahtdh Mar 15 '13 at 14:56
  • It's for a PHP `preg_replace` function. I want to exclude everything between two bold tags, in my example. The content within the bold tags will be of varying sort, and will contain code from different languages. I'll try some of the suggestions provided here. Thanks! –  Mar 15 '13 at 15:02
  • @nhahtdh if we start to try to exclude `some trapJohn Doesome trap`, we might end with [pon̷ies](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454)... – Denys Séguret Mar 15 '13 at 15:03
  • Based on your clarification: Do you want to match **anything** that is not between bold tags? i.e. anything from `<b>` up to (the first occurrence of) `</b>` should **not** match? – rvalvik Mar 15 '13 at 15:32
  • And which part of the string are you replacing? Things in bold or things not in bold? If you update your question with some input and output examples things would be much clearer. – rvalvik Mar 15 '13 at 16:10
  • I found a different solution that skipped the regex mess. But I'll mark your answer as correct for other people to reference, since it's a bit more comprehensive than Howard's. Here's where I found the alternative solution. http://www.php.net/manual/en/function.nl2br.php#97972 –  Mar 15 '13 at 17:30

3 Answers

1

If your regex dialect allows lookarounds you may use a negative lookbehind and a negative lookahead to achieve that task:

(?<!<b>)John Doe(?!</b>)
Howard
  • Thanks! I'll vote this up but mark another answer as correct just because it's a bit more detailed. –  Mar 15 '13 at 17:31
0

In Perl you can use a negative lookbehind:

$ echo "<b>John Doe</b>" | perl -ne 'print if /(?<!<b>)John Doe/'

(above prints nothing - does not match).

$ echo "John Doe" | perl -ne 'print if /(?<!<b>)John Doe/'
John Doe

(above matches).

The construct (?<!<b>) is a negative lookbehind: the pattern matches only if "John Doe" is not immediately preceded by what's inside it (<b> in this case).
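For readers without Perl at hand, here is a minimal sketch of the same check in Python. The function name `has_plain_match` is made up for illustration, and Python's `re` module is a swapped-in engine rather than the one used above; it accepts this fixed-width lookbehind unchanged.

```python
import re

# Negative lookbehind: "John Doe" only counts when the three characters
# immediately before it are not "<b>".
pattern = re.compile(r"(?<!<b>)John Doe")

def has_plain_match(text):
    """Return True if text contains a "John Doe" not directly preceded by <b>."""
    return pattern.search(text) is not None

print(has_plain_match("<b>John Doe</b>"))      # False
print(has_plain_match("John Doe"))             # True
print(has_plain_match("<b>Mr. John Doe</b>"))  # True
```

The last call shows the limit of this approach: the lookbehind only inspects the characters right before the name, so a "John Doe" further inside a bold span still matches.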

kamituel
0

You could use negative look-arounds for this:

(?<!<b>)John Doe(?!</b>)

Note that this also refuses to match the "John Doe" in <b>John Doe or in John Doe</b>, where only one of the two tags is present.
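A quick way to see this behaviour is to run the pattern over the four tag combinations. This is only a sketch in Python (not the asker's PHP); the `samples` list and `results` dict are illustrative names.

```python
import re

# Lookbehind rejects a preceding <b>; lookahead rejects a following </b>.
pattern = re.compile(r"(?<!<b>)John Doe(?!</b>)")

samples = ["John Doe", "<b>John Doe</b>", "<b>John Doe", "John Doe</b>"]

# Map each sample to whether the pattern finds a match in it.
results = {s: bool(pattern.search(s)) for s in samples}

for sample, matched in results.items():
    print(repr(sample), matched)
# 'John Doe' True
# '<b>John Doe</b>' False
# '<b>John Doe' False
# 'John Doe</b>' False
```

Either tag on its own is enough to block the match, because the two lookarounds are checked independently.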

If you only want to exclude instances that have both the opening and the closing tag, you could do something like:

John Doe(?!(?<=<b>John Doe)</b>)

Or slightly shorter (but less understandable - 8 is the length of John Doe):

John Doe(?!(?<=<b>.{8})</b>)
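To check that both variants behave the same, here is a small Python sketch. The names `explicit`, `compact` and `matches` are illustrative, and building the `.{8}` quantifier from `len(NAME)` is my own convenience rather than part of the patterns above.

```python
import re

NAME = "John Doe"

# Variant 1: the lookbehind inside the lookahead spells out "<b>John Doe".
explicit = re.compile(r"John Doe(?!(?<=<b>John Doe)</b>)")

# Variant 2: the same check, with the name replaced by ".{8}" (8 == len(NAME)).
compact = re.compile(r"John Doe(?!(?<=<b>.{%d})</b>)" % len(NAME))

def matches(pattern, text):
    """Return True if the pattern finds a match anywhere in text."""
    return pattern.search(text) is not None

for sample in ["John Doe", "<b>John Doe</b>", "<b>John Doe", "John Doe</b>"]:
    print(repr(sample), matches(explicit, sample), matches(compact, sample))
# 'John Doe' True True
# '<b>John Doe</b>' False False
# '<b>John Doe' True True
# 'John Doe</b>' True True
```

Only the fully wrapped form is rejected, since the negative lookahead fails only when the opening tag sits directly before the name and the closing tag directly after it.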
rvalvik
  • Thanks! That's almost what I want. Although, the `John Doe` must be allowed to have newlines and other characters preceding and following it. I only know that somewhere between the bold tags, there are occurrences of `John Doe` which must not be matched by the pattern. –  Mar 15 '13 at 15:09
  • Are there instances where you have bold tags inside the enclosing bold tags? Either one of them or both. Or do you know that you'll never find `<b>` or `</b>` inside the enclosing bold tags? Also: are the "enclosing" bold tags the **only** bold tags in the entire string? – rvalvik Mar 15 '13 at 15:22
  • I know that I'll never use bold tags within bold tags. And no, there will be several pairs of bold tags in the entire string. –  Mar 15 '13 at 15:52
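Given the constraints stated in the comments (bold tags are never nested, there may be several pairs, and the bold content can span lines), one possible approach is an alternation that first consumes a whole bold span and otherwise matches the name. The sketch below is in Python, not the asker's PHP, and is not taken from any of the answers; `replace_outside_bold` and the sample text are hypothetical. The same idea should carry over to a callback-based replace such as PHP's `preg_replace_callback`.

```python
import re

# Branch 1 captures a complete <b>...</b> span (non-greedy, so it stops at the
# first closing tag); branch 2 matches the name anywhere else.
# re.DOTALL lets "." cross line breaks inside the bold span.
pattern = re.compile(r"(<b>.*?</b>)|John Doe", re.DOTALL)

def replace_outside_bold(text, replacement):
    """Replace "John Doe" everywhere except inside <b>...</b> spans."""
    def swap(match):
        # Group 1 is set only when a bold span matched: put it back untouched.
        if match.group(1) is not None:
            return match.group(1)
        return replacement
    return pattern.sub(swap, text)

text = "John Doe wrote <b>code by\nJohn Doe here</b> and John Doe left."
print(replace_outside_bold(text, "Jane Roe"))
# Jane Roe wrote <b>code by
# John Doe here</b> and Jane Roe left.
```

Because the regex engine scans left to right and each bold span is consumed as a single match, any "John Doe" inside a span is never seen by the second branch. This relies on the tags being well formed and unnested, as stated above.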