</?(?i:script|div|table|frameset|b|frame|iframe|style)(.|\n)*?>

I'm trying to filter out certain HTML elements using this regex. I also want to filter `<b>` and `</b>`, but it doesn't seem to work for those.
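Since no language is given, here is a quick test script in Python, chosen as an assumption. Python 3.6+ accepts scoped inline flags like `(?i:...)`, so the question's regex runs there unchanged. In this sketch the regex does strip `<b>` and `</b>`. The catch is that the bare `b` alternative also matches the start of longer names such as `<br>` or `<body>`. The `BOUNDED` variant is a hypothetical fix: it adds a word boundary `\b` after the tag name and uses `[^>]*` in place of `(.|\n)*?`.

```python
import re

# Regex from the question, unchanged.
ORIGINAL = re.compile(r"</?(?i:script|div|table|frameset|b|frame|iframe|style)(.|\n)*?>")

# Hypothetical variant: \b stops "b" from matching the start of longer names
# such as "br" or "body"; [^>]* matches any run of non-">" characters,
# newlines included, just like the lazy (.|\n)*? before the first ">".
BOUNDED = re.compile(r"</?(?i:script|div|table|frameset|b|frame|iframe|style)\b[^>]*>")

html = '<B>bold</b> line<br> <div\nclass="x">text</div>'

print(ORIGINAL.sub("", html))  # -> bold line text   (the <br> is removed too)
print(BOUNDED.sub("", html))   # -> bold line<br> text
```

If the script behaves like this but your program doesn't, the problem is more likely in the surrounding code or the regex engine than in the pattern itself.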

Thank you.

Kevin
  • This may or may not apply here, but you should give this a read: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Brad Oct 24 '10 at 18:30
  • Do you use a specific programming language? – miku Oct 24 '10 at 18:30
  • Works for me, using Perl. You may want to try your regex in a test script, to see whether the problem is your regex or some other part of your code. – Jander Oct 24 '10 at 19:33
  • Also -- use an HTML parser, if possible. Doing this with regexes is very hard (impossible?) to get right. For example, I could sneak onto your page and write this: `< – Jander Oct 24 '10 at 21:33
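Following the parser suggestion, here is a sketch using Python's standard-library `html.parser.HTMLParser`. Python is an assumption, and `TagStripper` and `strip_tags` are hypothetical names. The sketch drops the same tags as the question's regex. Like the regex, it removes only the tags and keeps the text between them, including script and style bodies. The difference is that the parser decides where each tag ends, so a `>` inside a quoted attribute value does not cut the tag short. It is a sketch, not a security sanitizer.

```python
from html.parser import HTMLParser

# Tag names taken from the question's regex (a denylist).
STRIP = {"script", "div", "table", "frameset", "b", "frame", "iframe", "style"}


class TagStripper(HTMLParser):
    """Rebuilds the input, dropping start/end tags whose name is in STRIP.

    Comments and doctypes have no handler here, so they are dropped too.
    """

    def __init__(self):
        # Keep entities as written; unescaping "&lt;" could create new tags.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in STRIP:  # HTMLParser passes tag names lower-cased
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/>; overridden so no stray </br> is emitted.
        if tag not in STRIP:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in STRIP:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def strip_tags(html):
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

For example, `strip_tags('<p>Hi <B>there</B></p><div title="a>b">x</div>')` gives `<p>Hi there</p>x`. A regex that stops at the first `>` would instead leave `b">` behind from the `div` tag.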

1 Answer


Instead of removing specific tags, I would recommend allowing specific tags and removing all others. That is, use a default-forbid policy.
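A minimal Python sketch of the default-forbid idea follows. The language, the `ALLOWED` tuple, and the `keep_allowed` helper are all hypothetical choices for illustration. A negative lookahead rejects allowed tag names, so the pattern matches every other start or end tag and removes it.

```python
import re

# Hypothetical allowlist; replace with the tags you actually want to keep.
ALLOWED = ("p", "em", "strong", "a")

# Matches any start/end tag whose name is NOT in ALLOWED.
# (?!(?:...)\b) rejects allowed names; the \b keeps "a" from also
# allowing longer names such as "abbr".
DISALLOWED_TAG = re.compile(
    r"</?(?!(?:%s)\b)[a-zA-Z][^>]*>" % "|".join(ALLOWED),
    re.IGNORECASE,
)


def keep_allowed(html):
    """Remove every tag except those listed in ALLOWED (text is kept)."""
    return DISALLOWED_TAG.sub("", html)


print(keep_allowed('<p>Hi <b>there</b> <a href="#">link</a> <abbr>x</abbr><script>y</script></p>'))
# -> <p>Hi there <a href="#">link</a> xy</p>
```

Tags that are kept also keep all their attributes (an `onclick` on an allowed `<a>` survives), so for untrusted input a parser-based sanitizer is the safer route.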

aioobe
  • Can you tell me what that allow would look like? TY for your reply. – Kevin Oct 24 '10 at 18:39
  • Well, instead of removing the tags that match, remove all other tags. Btw, is this question intended for a specific programming language? (I could probably help you further if it's Java.) – aioobe Oct 24 '10 at 18:58
  • `(.|\n)` matches any single character, including newlines. `[.\n]` matches only a period or a newline. I think the first was intended. – Jander Oct 24 '10 at 19:25
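A quick Python check of the difference described in the last comment. Python is assumed here, as elsewhere, because the question names no language. In Python's `re`, the `re.DOTALL` flag makes `.` match newlines as well. Perl's `/s` modifier and the inline `(?s)` flag serve the same purpose in many engines.

```python
import re

samples = ["a", ".", "\n"]

# (.|\n): "." matches any character except a newline; the "|\n" branch adds newlines.
alternation = [bool(re.fullmatch(r"(.|\n)", s)) for s in samples]

# [.\n]: inside a character class "." is a literal period.
char_class = [bool(re.fullmatch(r"[.\n]", s)) for s in samples]

# With re.DOTALL a plain "." already matches newlines.
dotall = [bool(re.fullmatch(r".", s, re.DOTALL)) for s in samples]

print(alternation)  # -> [True, True, True]
print(char_class)   # -> [False, True, True]
print(dotall)       # -> [True, True, True]
```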