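By putting `\w+[^href]` you still allow things like `<a href ="...`, and you can exclude attributes whose names end in `h`, `r`, `e`, or `f` (which aren't necessarily `href`). That's because `[^href]` is a character class. It matches a single character that is not `h`, `r`, `e`, or `f`, not "anything except the word href".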
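Here is a small check of both problems in Python's `re` module. The question's full pattern isn't shown, so this sketch assumes, hypothetically, that `\w+[^href]` is followed directly by `=`. The attribute strings are made up.

```python
import re

# Hypothetical reconstruction: \w+[^href] followed directly by '='.
# [^href] consumes exactly one character that is not h, r, e or f.
naive = re.compile(r'\w+[^href]=')

for attr in ['href =', 'alt=', 'name=', 'width=']:
    m = naive.search(attr)
    print(repr(attr), '->', repr(m.group()) if m else None)

# 'href =' -> 'href ='   (the space satisfies [^href], so href slips through)
# 'alt=' -> 'alt='
# 'name=' -> None        (name ends in 'e', so it is wrongly rejected)
# 'width=' -> None       (width ends in 'h')
```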
Try

`\s+(?!href)[a-zA-Z]+ *= *(?:"[^"]+"|\w+)`
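A minimal usage sketch follows. It assumes Python's `re` flavor, and the sample tag is made up. `findall` returns each whole match, including the leading whitespace consumed by `\s+`:

```python
import re

# The suggested pattern: whitespace, a name that is not href, '=', then a value.
pattern = re.compile(r'\s+(?!href)[a-zA-Z]+ *= *(?:"[^"]+"|\w+)')

tag = '<a class="nav" href = "/home" target=_blank>'
print(pattern.findall(tag))
# [' class="nav"', ' target=_blank']
```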
Explanation: The `(?!href)` is a negative lookahead. It prevents the attribute name from starting with `href`.
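One consequence worth checking is that the lookahead only inspects the next four characters. Any name that merely starts with `href`, such as `hreflang`, is therefore skipped too. The sketch below (Python `re`, made-up tag) shows this. It also tries a possible tweak, `(?!href\b)`, which rejects `href` itself but lets `hreflang` through:

```python
import re

lookahead = re.compile(r'\s+(?!href)[a-zA-Z]+ *= *(?:"[^"]+"|\w+)')
# Hypothetical variant: \b requires a word boundary right after "href".
bounded = re.compile(r'\s+(?!href\b)[a-zA-Z]+ *= *(?:"[^"]+"|\w+)')

tag = '<link href="/a.css" hreflang="en" rel="stylesheet">'
print(lookahead.findall(tag))  # [' rel="stylesheet"']
print(bounded.findall(tag))    # [' hreflang="en"', ' rel="stylesheet"']
```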
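The `[a-zA-Z]+` is the attribute name. Spaces are allowed before and after the `=`. I restricted the name to letters because `\w` would also allow digits and underscores. Be aware, though, that real attribute names can contain digits and hyphens (for example SVG's `x1` or HTML's `data-id` and `aria-label`), and this pattern will not match those.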
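Here is a quick check of the name restriction (Python `re`, made-up tag). The widened class `[a-zA-Z][\w:-]*` is only a hypothetical alternative, not a complete implementation of the HTML attribute-name rules:

```python
import re

letters_only = re.compile(r'\s+(?!href)[a-zA-Z]+ *= *(?:"[^"]+"|\w+)')
# Hypothetical widening: a letter, then letters, digits, '_', ':' or '-'.
widened = re.compile(r'\s+(?!href)[a-zA-Z][\w:-]* *= *(?:"[^"]+"|\w+)')

tag = '<div data-id="7" aria-label="Close" id=main>'
print(letters_only.findall(tag))  # [' id=main']
print(widened.findall(tag))       # [' data-id="7"', ' aria-label="Close"', ' id=main']
```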
The `(?:"[^"]+"|\w+)` means that the attribute value can be anything within double quotes, OR an unquoted run of word characters (`\w+`).
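The two value branches can be checked as follows (Python `re`, made-up sample tags). A quoted value may legitimately contain `>`, while an unquoted one stops at the first non-word character. Single-quoted values are matched by neither branch:

```python
import re

pattern = re.compile(r'\s+(?!href)[a-zA-Z]+ *= *(?:"[^"]+"|\w+)')

print(pattern.findall('<img alt="a > b">'))    # [' alt="a > b"']  quoted branch
print(pattern.findall('<td width=50%>'))       # [' width=50']     \w+ stops at '%'
print(pattern.findall("<img alt='single'>"))   # []                no single-quote branch
```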
Together, these pieces prevent the match from running past the closing `>`, unless your HTML is malformed and you have (e.g.) `<a name="asdf>` (note the missing closing `"`).
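The malformed case can be reproduced directly (Python `re`, made-up input). With the closing quote missing, `[^"]+` runs on to the next `"` it finds, well past the `>`:

```python
import re

pattern = re.compile(r'\s+(?!href)[a-zA-Z]+ *= *(?:"[^"]+"|\w+)')

broken = '<a name="asdf><b class="x">'
print(pattern.findall(broken))  # [' name="asdf><b class="']
```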