Removing a substring between two characters (java)

Question

I have a java string such as this:

String string = "I <strong>really</strong> want to get rid of the strong-tags!";

And I want to remove the tags. I have some other strings where the tags are way longer, so I'd like to find a way to remove everything between "<>" characters, including those characters.

One way would be to use the built-in String method that matches against a regex, but I have no idea how to write one.

Bohemian · Accepted Answer · 2012-05-05T13:31:53.657

22

Caution is advised when using regex to parse HTML (due to its allowable complexity); however, for "simple" HTML and simple text (text without a literal < or > in it), this will work:

String stripped = html.replaceAll("<.*?>", "");
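As a quick check, here is a minimal sketch that applies the same pattern to the string from the question. The class name TagStripDemo and the helper stripTags are made up for illustration; only String.replaceAll comes from the answer.

```java
class TagStripDemo {
    // Removes every "<...>" span using the lazy pattern from the answer.
    // stripTags is a hypothetical helper name, not part of any library.
    static String stripTags(String html) {
        return html.replaceAll("<.*?>", "");
    }

    public static void main(String[] args) {
        String string = "I <strong>really</strong> want to get rid of the strong-tags!";
        System.out.println(stripTags(string));
        // prints: I really want to get rid of the strong-tags!
    }
}
```

The lazy quantifier (.*?) matters here: a greedy <.*> would match from the first < to the last > and remove "really" as well.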

edited May 05 '12 at 13:31

answered May 05 '12 at 13:16

Bohemian


Gibolt · Answer 2 · 2019-07-19T21:09:01.830

To avoid regex, you can use StringUtils from Apache Commons Lang:

String toRemove = StringUtils.substringBetween(string, "<", ">");
String result = StringUtils.remove(string, "<" + toRemove + ">");

For multiple instances:

String[] allToRemove = StringUtils.substringsBetween(string, "<", ">");
String result = string;
if (allToRemove != null) { // substringsBetween returns null when nothing matches
  for (String toRemove : allToRemove) {
    result = StringUtils.remove(result, "<" + toRemove + ">");
  }
}

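Apache StringUtils functions are null-safe and handle empty input without throwing, but note that substringsBetween returns null (not an empty array) when there is no match.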
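If adding Apache Commons Lang is not an option, the same regex-free idea can be sketched with plain indexOf calls. This is not the StringUtils implementation, just one possible approach; the class ManualTagStripper and its stripTags method are hypothetical names, and the choice to keep an unmatched "<" as literal text is an assumption.

```java
class ManualTagStripper {
    // Copies everything that lies outside "<...>" pairs.
    // Assumption: a '<' with no later '>' is kept as literal text.
    static String stripTags(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length());
        int pos = 0;
        while (pos < input.length()) {
            int open = input.indexOf('<', pos);
            int close = (open < 0) ? -1 : input.indexOf('>', open + 1);
            if (close < 0) {
                // No complete tag remains: copy the rest unchanged.
                out.append(input, pos, input.length());
                break;
            }
            out.append(input, pos, open); // text before the tag
            pos = close + 1;              // continue after '>'
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String string = "I <strong>really</strong> want to get rid of the strong-tags!";
        System.out.println(stripTags(string));
        // prints: I really want to get rid of the strong-tags!
    }
}
```

A single pass with a StringBuilder also avoids rescanning the whole string once per tag, which the remove-in-a-loop version does.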

score 0 · Answer 3 · answered Oct 30 '21 at 21:15

You should use one of the following:

String stripped = html.replaceAll("<[^>]*>", "");
// or
String stripped = html.replaceAll("<[^<>]*>", "");

where <[^>]*> matches substrings starting with <, then zero or more chars other than > (or the chars other than < and > if you choose the second version) and then a > char.

Note that <.*?> has two drawbacks. It is less efficient than a negated character class (see "Which would be better non-greedy regex or negated character class?"). It also does not match substrings that span multiple lines (see "How do I match any character across multiple lines in a regular expression?"); this can be solved with (?s)<.*?>, <(?s:.)*?>, <[\w\W]*?>, and many other, less efficient variations.

See the regex demo.
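The multi-line difference is easy to reproduce. The sketch below runs three of the patterns mentioned above on a tag that contains a line break; the class and method names are invented for the illustration.

```java
class MultilineTagDemo {
    // Lazy dot: '.' does not match line terminators by default.
    static String lazy(String s) {
        return s.replaceAll("<.*?>", "");
    }

    // Negated character class: [^>] matches newlines too.
    static String negated(String s) {
        return s.replaceAll("<[^>]*>", "");
    }

    // Lazy dot with the DOTALL flag (?s) enabled inline.
    static String dotall(String s) {
        return s.replaceAll("(?s)<.*?>", "");
    }

    public static void main(String[] args) {
        // The opening tag is split across two lines.
        String html = "a<b\nclass=\"x\">c</b>d";
        System.out.println(lazy(html).equals("a<b\nclass=\"x\">cd")); // true: opening tag survives
        System.out.println(negated(html)); // acd
        System.out.println(dotall(html));  // acd
    }
}
```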
