1

I want to select every < character except the ones that begin <pre or </pre in this document, using a regular expression.

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title></title>
</head>
<body>
    <pre>
        if a < b
    </pre>
</body>
</html>


uzay95
  • In general, you don't want to process HTML or XML with regular expressions. –  Apr 26 '13 at 17:59
  • Just for kicks: http://stackoverflow.com/a/1732454/1858225 Really, though, it doesn't look like OP is trying to *parse* the HTML; this looks more like a simple attempt to highlight all the tags except for `pre` tags, which is definitely simple enough for regex. – Kyle Strand Apr 26 '13 at 22:21
  • You are absolutely right, Kyle. I just wanted to replace < characters with &lt; for posting an article to my blog. This way I will also learn regex better. – uzay95 Apr 26 '13 at 22:31

3 Answers

1

That would depend on the regex engine. If it supports Perl regex with negative lookahead assertions, you could do

<(?!(pre)|(\/pre))

or

<(?!\/?pre)

group1(?!group2) means "match group1 only if the part that follows doesn't match group2" (hence the name "negative lookahead"). Here group1 is <, and group2 is \/?pre.
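
As a quick check of the second pattern, here is a minimal Python sketch run against the document from the question. Python's `re` module supports negative lookahead; the variable names are illustrative only. The slash is left unescaped because Python has no regex delimiter (escaping it is only needed where `/` delimits the pattern).

```python
import re

# Sample document from the question.
html = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title></title>
</head>
<body>
    <pre>
        if a < b
    </pre>
</body>
</html>"""

# "<" that is not followed by "pre" or "/pre".
pattern = re.compile(r"<(?!/?pre)")

# Collect a few characters after each matched "<" to see which ones were selected.
matched = [html[m.start():m.start() + 6] for m in pattern.finditer(html)]
print(matched)
print(len(matched))
```

Note that the lookahead only inspects the next few characters, so a tag such as `<prefix>` would also be skipped by this pattern.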

Lev Levitsky
  • Could you please explain what your regexes mean? Both of them work, but if you explain why you used ? and !, they will be easier to understand. – uzay95 Apr 26 '13 at 18:16
  • @uzay95 Take a look at the [link](http://www.regular-expressions.info/lookaround.html) in my answer, it explains it better than I could. I added a short explanation. Dukeling's answer is also essentially equivalent. – Lev Levitsky Apr 26 '13 at 18:24
0

[...] defines a character class, which matches a single character.
So [^(pre|/pre)] means any one character that is not one of these: (pre|/pre)
or, equivalently, not one of these: )(|/pre
(since parentheses and | have no special meaning inside [] (as far as I remember)).

Then you said it should be optional with ?, so it doesn't exclude anything.
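
The two points above can be checked with a short Python sketch. The attempted pattern itself is not shown in the text of the question, so `<[^(pre|/pre)]?` below is a hypothetical reconstruction, and the sample string is made up for illustration.

```python
import re

# Inside [...], the characters ( ) | are literals, so both classes
# exclude the same set of characters: ( ) | / p r e
cls_a = re.compile(r"[^(pre|/pre)]")
cls_b = re.compile(r"[^)(|/pre]")

sample = "<pre> <p> <em> <b> (x|y)/z"
kept_a = cls_a.findall(sample)
kept_b = cls_b.findall(sample)
print(kept_a == kept_b)
print("".join(kept_a))

# Hypothetical reconstruction of the attempt: "<" plus an optional class.
# Because the class is optional, every "<" still matches, including "<pre>".
optional = re.compile(r"<[^(pre|/pre)]?")
hits = optional.findall(sample)
print(hits)
```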

You probably want look-ahead:

\<(?!/?pre)

(?!...) is a negative look-ahead, which "matches" if the next characters don't match the pattern given inside the parentheses.

/?pre is a simplification of pre|/pre.
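
Assuming the end goal is to escape the selected characters as `&lt;` (as the comments under the question suggest), a minimal Python sketch with `re.sub` could look like this. It also checks that the `/?pre` and `pre|/pre` forms behave the same on this input; the snippet string is a shortened version of the question's document.

```python
import re

snippet = "<body>\n    <pre>\n        if a < b\n    </pre>\n</body>"

# Replace every "<" except the ones that open <pre or </pre.
escaped = re.sub(r"<(?!/?pre)", "&lt;", snippet)
print(escaped)

# The longer alternation form gives the same result on this input.
escaped_alt = re.sub(r"<(?!pre|/pre)", "&lt;", snippet)
print(escaped == escaped_alt)
```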

Bernhard Barker
0

Here's a lovely example of why RegEx isn't the best idea for HTML.

Anyway, if you choose to ignore that advice, here is the expression you'll want to use (it also handles whitespace after the slash and any capitalization of pre)

<(?!(\/?\s*[Pp][Rr][Ee]))
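
For comparison, here is a small Python sketch of the expression above next to a variant that uses a case-insensitive flag instead of spelling out each letter pair. The flag-based variant and the sample string are assumptions added for illustration, not part of the answer.

```python
import re

# The expression from the answer, with explicit upper/lower-case letters.
explicit = re.compile(r"<(?!(\/?\s*[Pp][Rr][Ee]))")

# Assumed equivalent: let the engine handle case with a flag.
flagged = re.compile(r"<(?!/?\s*pre)", re.IGNORECASE)

sample = "<PRE>a < b</ Pre> <p> </P>"
positions_explicit = [m.start() for m in explicit.finditer(sample)]
positions_flagged = [m.start() for m in flagged.finditer(sample)]
print(positions_explicit)
print(positions_flagged)
```

One side effect of the `\s*` worth knowing: a bare `<` followed by a space and a word starting with "pre" (for example `a < previous`) would also be skipped.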
Community