
How would I write a regular expression that matches the character < when it is not followed by a, em, or strong?

So <hello and <string would match, but <strong wouldn't.

zb226
Kyle
  • **See Also**: [A regex to match a substring that isn't followed by a certain other substring](https://stackoverflow.com/q/2631010/1366033) – KyleMit Dec 21 '21 at 13:29

5 Answers


Try this:

<(?!a|em|strong)
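
A quick way to check the pattern against the examples from the question. This is only a sketch in JavaScript (the answer itself does not name an engine), and the names `pattern`, `samples`, and `results` are made up for illustration:

```javascript
// Negative lookahead: match "<" only when a, em, or strong does NOT follow.
const pattern = /<(?!a|em|strong)/;

// Sample inputs taken from the question.
const samples = ['<hello', '<string', '<strong'];

// For each sample, record whether the pattern matches.
const results = samples.map((s) => pattern.test(s));

console.log(results); // [ true, true, false ]
```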
Andrew Hare
  • +1 I think that does it for Perl-compatible regexp syntax. (For other syntaxes, it might be different) – David Z Apr 25 '10 at 01:00
  • Just in case someone is interested, `?!` initiates a negative lookahead. I found a good overview of lookarounds here: http://www.rexegg.com/regex-lookarounds.html – schnatterer Aug 18 '14 at 20:11
  • For a full function: `myString.replace(/<(?!\/?(a|em|strong)).*?>/g, '');` I also added in `\/?` to check for closing tags – SwiftNinjaPro Jan 09 '20 at 18:21

Use a negative lookahead. Its simplest form for this problem is:

<(?!a|em|strong)

The one issue with that is that it also rejects tags that merely begin with one of those names, such as <applet>. A way to deal with that is to use \b, a zero-width assertion (meaning it consumes none of the input) that matches at a transition from a word character to a non-word character or vice versa. Word characters are [0-9a-zA-Z_]. So:

<(?!(a|em|strong)\b)
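
The effect of the word boundary can be seen by running both patterns over a few tags. A minimal sketch in JavaScript; the variable names `loose`, `strict`, `tags`, and `comparison` are illustrative only:

```javascript
// Without \b, any tag name that merely starts with a, em, or strong is rejected.
const loose = /<(?!a|em|strong)/;
// With \b, only the complete names a, em, and strong are rejected.
const strict = /<(?!(a|em|strong)\b)/;

const tags = ['<applet>', '<a href="#">', '<embed>', '<em>'];

// Each row: [tag, matched by loose pattern, matched by strict pattern].
const comparison = tags.map((t) => [t, loose.test(t), strict.test(t)]);

console.log(comparison);
// <applet> and <embed>: loose = false, strict = true
// <a href="#"> and <em>: loose = false, strict = false
```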
700 Software
cletus

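Although Andrew's answer is clearly superior, I had previously also got it to work with [^(?:a|em|strong)]. Note that this is a negated character class, so it excludes single characters rather than whole words, and it will also reject tags such as <string.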
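
A small sketch in JavaScript comparing the bracket form with the lookahead form. Inside square brackets the engine sees a set of individual characters, and the bracket form also consumes the character after the <. The names `charClass` and `lookahead` are illustrative only:

```javascript
// Inside [...] the text is a set of single characters: ( ? : a | e m s t r o n g )
const charClass = /<[^(?:a|em|strong)]/;
// The lookahead form tests whole words and consumes nothing after "<".
const lookahead = /<(?!(?:a|em|strong)\b)/;

console.log(charClass.test('<hello'));  // true:  "h" is not in the set
console.log(charClass.test('<string')); // false: "s" is in the set
console.log(lookahead.test('<string')); // true:  "string" is not a, em, or strong

// The character class swallows one extra character; the lookahead does not.
console.log('<hello'.match(charClass)[0]); // "<h"
console.log('<hello'.match(lookahead)[0]); // "<"
```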

WoodrowShigeru

If your regex engine supports it, use a negative lookahead assertion: it looks ahead in the string and succeeds only if its pattern does not match at that point, without consuming any input. Thus, you want /<(?!(?:a|em|strong)\b)/: match a <, then succeed only if what follows is not a, em, or strong followed by a word boundary, \b.
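
Because the lookahead consumes nothing, each match is just the < character. One way to see this, sketched in JavaScript with illustrative names (`pattern`, `input`, `escaped`) and an arbitrary choice of `&lt;` as the replacement text:

```javascript
const pattern = /<(?!(?:a|em|strong)\b)/g;
const input = '<hello> <em> <strong> <string>';

// Only the "<" itself belongs to each match, so only it gets replaced.
const escaped = input.replace(pattern, '&lt;');

console.log(escaped);              // &lt;hello> <em> <strong> &lt;string>
console.log(input.match(pattern)); // [ '<', '<' ]
```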

Antal Spector-Zabusky
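function strip_tags(str, keep){
    // Normalize `keep` to a list of tag names that should survive.
    var names = Array.isArray(keep) ? keep : (keep ? [keep] : []);
    // A kept name must end at a word boundary, so 'em' does not also keep <embed>.
    var kept = names.length ? '|(?:' + names.join('|') + ')\\b' : '';
    // Match < or </ unless it is followed by a non-name character or a kept tag name.
    return str.replace(new RegExp('</?(?![^A-Za-z0-9_-]' + kept + ').*?>', 'g'), '');
}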

usage:

strip_tags('<html><a href="a">a</a> <strong>strong text</strong> and <em>italic text</em></html>', ['strong', 'em']);
//output: a <strong>strong text</strong> and <em>italic text</em>

I would also recommend stripping the attributes (parameters) from the tags you keep:

function strip_params(str){
    // Capture the whole tag name (one or more name characters), then drop the rest of the tag.
    return str.replace(/<([A-Za-z0-9_\-]+).*?>/g, '<$1>');
}
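
A short usage sketch for attribute stripping. The helper below, `stripAttributes`, is a hypothetical stand-alone variant written for this example. It assumes attribute values never contain a > character, and like any regex-based approach it is not a full HTML parser:

```javascript
// Hypothetical helper: keep the opening tag's name, drop everything up to ">".
function stripAttributes(html) {
  // ([A-Za-z0-9_-]+) captures the full tag name; [^>]* skips the attributes.
  return html.replace(/<([A-Za-z0-9_-]+)[^>]*>/g, '<$1>');
}

// Closing tags start with "</", so they do not match and are left as they are.
const cleaned = stripAttributes('<strong class="x">hi</strong> <a href="#">link</a>');

console.log(cleaned); // <strong>hi</strong> <a>link</a>
```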
SwiftNinjaPro