I want to select every < character except the ones in <pre or </pre in this document with a regexp:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
</head>
<body>
<pre>
if a < b
</pre>
</body>
</html>
That depends on the regex engine. If it supports Perl-style negative lookahead assertions, you could use
<(?!(pre)|(\/pre))
or
<(?!\/?pre)
group1(?!group2) means "match group1 only if the following part doesn't match group2" (hence the name "negative lookahead"). Here group1 is <, and group2 is \/?pre.
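To check the two patterns above against the sample document, here is a minimal sketch in Python, assuming the engine is Python's re module (which supports negative lookahead). The variable names are only illustrative.

```python
import re

# The sample document from the question.
html = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
</head>
<body>
<pre>
if a < b
</pre>
</body>
</html>"""

# Both forms from the answer: "<" not followed by "pre" or "/pre".
long_form = re.compile(r"<(?!(pre)|(\/pre))")
short_form = re.compile(r"<(?!\/?pre)")

long_hits = [m.start() for m in long_form.finditer(html)]
short_hits = [m.start() for m in short_form.finditer(html)]

print(long_hits == short_hits)                 # True
print(len(short_hits), "of", html.count("<"))  # 10 of 12
```

The two skipped characters are the ones that open <pre> and </pre>; the < in "if a < b" is still matched.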
[...] is a character class: it matches exactly one character.
So [^(pre|/pre)] means "not any of these characters: (pre|/pre)" or, equivalently, none of these characters: )(|/pre (parentheses and | have no special meaning inside []).
Then you made it optional with ?, so it doesn't exclude anything.
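A quick way to see this, sketched in Python. The asker's full pattern isn't shown here, so <[^(pre|/pre)]? is my assumption of what it looked like.

```python
import re

# Inside [...], "(", ")" and "|" are literal, so this class means
# "any one character other than ( ) | / p r e".
negated = re.compile(r"[^(pre|/pre)]")

rejected = [ch for ch in "()|/pre" if negated.fullmatch(ch) is None]
print(rejected)  # ['(', ')', '|', '/', 'p', 'r', 'e']

# Assumed reconstruction of the original attempt: the class is optional,
# so "<" still matches on its own when the next character is excluded.
attempt = re.compile(r"<[^(pre|/pre)]?")
print(attempt.findall("<pre> </pre> a < b"))  # ['<', '<', '< ']
```

Every < is still found, including the two that belong to the pre tags; the only difference is whether one extra character gets consumed after it.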
You probably want a negative look-ahead:
\<(?!/?pre)
(?!...) is a negative look-ahead, which "matches" only if the next characters don't match the pattern given inside the parentheses.
/?pre is a simplification of pre|/pre.
Here's a lovely example of why regex isn't the best idea for HTML.
Anyway, if you choose to ignore that advice, here is the expression you'll want to use (it also handles optional whitespace and any capitalization of the tag name):
<(?!(\/?\s*[Pp][Rr][Ee]))
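As a rough check, here is the same expression in Python next to a variant that uses the re.IGNORECASE flag instead of spelling out [Pp][Rr][Ee]. The flag-based variant and the sample string are my own additions, not part of the answer above.

```python
import re

# The expression from the answer, unchanged.
explicit = re.compile(r"<(?!(\/?\s*[Pp][Rr][Ee]))")

# Assumed equivalent: let the engine handle capitalization.
flagged = re.compile(r"<(?!/?\s*pre)", re.IGNORECASE)

sample = "<PRE>a < b</Pre> <p>x</p>"

explicit_hits = [m.start() for m in explicit.finditer(sample)]
flagged_hits = [m.start() for m in flagged.finditer(sample)]

print(explicit_hits)                  # [7, 17, 21]
print(explicit_hits == flagged_hits)  # True
```

The matches are the bare < in "a < b" and the < of <p> and </p>; <PRE> and </Pre> are skipped regardless of case.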