
I'd like to remove certain tags, along with their content, from an HTML string like this:

val htmlString = "<html><b>test,test</b></html>"
val strippedStr = htmlString.replaceAll("<b[^>]*>[a-z]*<//b>", "")

But it seems to leave the string unchanged.

Any idea what I'm doing wrong? (Maybe wrong escaping?)

Thanks in advance

Stefan Kunze

2 Answers


If performance is not an issue, you can use a lazy quantifier to match everything up to the nearest </b>. The doubled slash in <//b> is also wrong: / needs no escaping in a regex, so the closing tag should be written as </b>.

<b[^>]*>.*?</b>


Your Code

val htmlString = "<html><b>test,test</b></html>"
val strippedStr = htmlString.replaceAll("<b[^>]*>.*?</b>", "")
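To see what the lazy quantifier buys you, here is a small sketch on an input with two <b> elements. The sample input, the (?s) flag and the \b word boundary are additions for illustration and not part of the pattern above: (?s) lets . match line breaks, and \b stops <b from also matching tags such as <body> or <br>.

```scala
// Hypothetical input: two <b> elements, one with an attribute and a line break inside.
val html = "<body><b>one</b> keep <b class=\"x\">two,\nthree</b></body>"

// Greedy .* runs from the first <b> to the LAST </b>, so " keep " is lost too.
val greedy = html.replaceAll("""(?s)<b\b[^>]*>.*</b>""", "")

// Lazy .*? stops at the nearest </b>, so each element is removed on its own.
val stripped = html.replaceAll("""(?s)<b\b[^>]*>.*?</b>""", "")

println(greedy)   // <body></body>
println(stripped) // <body> keep </body>
```

This still breaks on nested <b> elements or on a </b> inside a comment or attribute value, so treat it as a quick fix for input you control.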
Daniel Gimenez

The escape character is \, not /, and / does not need to be escaped in the first place. The pattern does not match because the input contains no literal <//b>.
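A quick check with the string from the question suggests the slash is not the only obstacle: even with </b> written correctly, [a-z]* cannot get past the comma in test,test. The sketch below applies the fixes one at a time; the [^<]* alternative is just one possible replacement, chosen here for illustration.

```scala
val htmlString = "<html><b>test,test</b></html>"

// Original pattern: "<//b>" demands two literal slashes, so nothing matches.
val original = htmlString.replaceAll("<b[^>]*>[a-z]*<//b>", "")

// Slash fixed, but [a-z]* stops at the comma, so there is still no match.
val slashFixed = htmlString.replaceAll("<b[^>]*>[a-z]*</b>", "")

// Accept any run of characters other than '<' between the tags.
val bothFixed = htmlString.replaceAll("<b[^>]*>[^<]*</b>", "")

println(original)   // <html><b>test,test</b></html>
println(slashFixed) // <html><b>test,test</b></html>
println(bothFixed)  // <html></html>
```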

Aside from that... do not use regex to manipulate HTML. Use a proper HTML parser, with an HTML sanitizer to pre-process the input.

Daniel C. Sobral
Indeed - see [the most renowned answer on SO](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Shadowlands Sep 09 '13 at 14:08