
I have a Java string such as this:

String string = "I <strong>really</strong> want to get rid of the strong-tags!";

And I want to remove the tags. I have some other strings where the tags are way longer, so I'd like to find a way to remove everything between "<" and ">" characters, including those characters.

One way would be to use a built-in String method that takes a regex, but I have no idea how to write those.

Rickard

3 Answers


Caution is advised when using regex to parse HTML (due to its allowable complexity). However, for "simple" HTML and simple text (text without a literal < or > in it), this will work:

String stripped = html.replaceAll("<.*?>", "");
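A minimal, self-contained sketch of the one-liner above, applied to the string from the question. The class and method names are hypothetical. The second call illustrates the caveat: a literal < and > in ordinary text are treated as a "tag" and removed along with everything between them.

```java
public class StripTagsDemo {

    // Removes '<', then as few characters as possible, then '>'.
    // The reluctant quantifier *? stops at the first '>' instead of the last one.
    public static String stripTags(String html) {
        return html.replaceAll("<.*?>", "");
    }

    public static void main(String[] args) {
        String string = "I <strong>really</strong> want to get rid of the strong-tags!";
        System.out.println(stripTags(string));
        // prints: I really want to get rid of the strong-tags!

        // Caveat: plain-text '<' and '>' are matched as if they were a tag.
        System.out.println(stripTags("if a < b and c > d"));
        // prints: if a  d
    }
}
```

Without the ? the pattern <.*> would be greedy and remove everything from the first < to the last > on the line, including the word "really".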
Bohemian

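To avoid regex, use Apache Commons Lang StringUtils. This removes the first tag found (and any identical copies of it):

import org.apache.commons.lang3.StringUtils;

String toRemove = StringUtils.substringBetween(string, "<", ">");
String result = StringUtils.remove(string, "<" + toRemove + ">");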

For multiple tags:

String[] allToRemove = StringUtils.substringsBetween(string, "<", ">");
String result = string;
if (allToRemove != null) { // substringsBetween returns null when nothing matches
  for (String toRemove : allToRemove) {
    result = StringUtils.remove(result, "<" + toRemove + ">");
  }
}

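Apache StringUtils methods are null-safe and empty-safe. Note, however, that substringsBetween returns null (not an empty array) when nothing matches, so check for null before looping over its result.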
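If pulling in Apache Commons is not an option, the same regex-free idea can be sketched with plain JDK calls. This is a swapped-in technique (a single indexOf scan), not the StringUtils API; the class name is hypothetical. Like the regex <[^>]*>, it drops everything from each < to the next >, and it keeps an unmatched < as ordinary text.

```java
public class StripTagsNoRegex {

    // Removes every "<...>" span by scanning with indexOf, without any regex.
    public static String stripTags(String s) {
        if (s == null) {
            return null; // mirror the null-safety of the StringUtils approach
        }
        StringBuilder out = new StringBuilder(s.length());
        int pos = 0;
        while (pos < s.length()) {
            int open = s.indexOf('<', pos);
            if (open < 0) {
                break; // no more tags
            }
            int close = s.indexOf('>', open + 1);
            if (close < 0) {
                break; // '<' without a closing '>': keep the rest as-is
            }
            out.append(s, pos, open); // text before the tag
            pos = close + 1;          // skip the tag itself
        }
        out.append(s, pos, s.length()); // text after the last tag
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("I <strong>really</strong> want to get rid of the strong-tags!"));
        // prints: I really want to get rid of the strong-tags!
    }
}
```

Unlike the StringUtils loop, this makes a single pass and does not build a "<" + toRemove + ">" search string for each tag.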

Gibolt

You should use either

String stripped = html.replaceAll("<[^>]*>", "");

or

String stripped = html.replaceAll("<[^<>]*>", "");

Here, <[^>]*> matches a <, then zero or more characters other than >, and then a >. The second version, <[^<>]*>, excludes < as well, so a stray < in the text is not swallowed into the following tag.
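A small sketch contrasting the two patterns on input that contains a stray < before a real tag. The class and method names are hypothetical; only String.replaceAll is used.

```java
public class NegatedClassDemo {

    // Option 1: '<', then any characters except '>', then '>'.
    public static String stripAnyButGt(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    // Option 2: '<', then any characters except '<' and '>', then '>'.
    public static String stripAnyButLtGt(String html) {
        return html.replaceAll("<[^<>]*>", "");
    }

    public static void main(String[] args) {
        String stray = "a <b <i>x</i>"; // "<b " is not a complete tag
        // Option 1 treats "<b <i>" as one tag and removes the stray text with it.
        System.out.println(stripAnyButGt(stray));   // prints: a x
        // Option 2 cannot cross the second '<', so only "<i>" and "</i>" go.
        System.out.println(stripAnyButLtGt(stray)); // prints: a <b x
    }
}
```

Which behaviour is preferable depends on whether a stray < in the input is more likely to be a typo in the markup or legitimate text.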

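Note that <.*?> does not remove tags that span a line break, because in Java . does not match line terminators by default; a negated character class such as [^>] does match them.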
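A sketch of how the patterns behave on a tag split across two lines. The class and method names are hypothetical. The inline (?s) flag turns on DOTALL mode, which lets . match line terminators.

```java
public class MultiLineTagDemo {

    // '.' does not match line terminators unless DOTALL is enabled.
    public static String stripLazyDot(String html) {
        return html.replaceAll("<.*?>", "");
    }

    // The same pattern with DOTALL enabled through the inline (?s) flag.
    public static String stripLazyDotAll(String html) {
        return html.replaceAll("(?s)<.*?>", "");
    }

    // A negated character class matches line breaks without any flag.
    public static String stripNegatedClass(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String html = "<a\nhref=\"#\">link</a>"; // opening tag split across two lines
        // Only "</a>" is removed; the multi-line opening tag survives.
        System.out.println(stripLazyDot(html).equals("<a\nhref=\"#\">link")); // true
        System.out.println(stripLazyDotAll(html));   // link
        System.out.println(stripNegatedClass(html)); // link
    }
}
```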


Wiktor Stribiżew