I've got the string <strong>Foo</strong>. I want to delete HTML tags from this string, along with their content. In this example the expression must return "" (an empty string). How should I do this?
-1

Richard Sitze

Tony
2 Answers
2
If the HTML you're trying to remove doesn't contain any nested HTML tags, here's a simple regex-based solution. You can assign the tag name to tag for convenience, and the regex will adjust accordingly.
String tag = "strong";
String str = "This is <strong>Foo</strong>Bar.";
String regex = "<\\s*" + tag + "[^>]*>[^<]*</\\s*" + tag + "\\s*>";
System.out.println(str.replaceAll(regex, "")); // This is Bar.
The regex accommodates extra tag attributes such as <strong class="bold">, but it could still break on some inputs. It has been updated to handle slightly ill-formatted HTML, such as unnecessary whitespace or newlines here and there.
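As a rough sketch, the same idea can be wrapped in a reusable method. The class name TagStripper and the method stripTag are hypothetical names chosen for illustration. The Pattern.quote call, the \b word boundary (so that "strong" does not also match <stronger>), and the case-insensitive flag are additions that are not part of the regex above. It still assumes there are no nested tags inside the element being removed.

```java
import java.util.regex.Pattern;

final class TagStripper {
    // Hypothetical helper: removes every <tag ...>text</tag> pair whose
    // content contains no nested tags (assumption carried over from the answer).
    static String stripTag(String input, String tag) {
        // Quote the tag name so it is always treated as literal text.
        String name = Pattern.quote(tag);
        // \b keeps "strong" from matching the start of "<stronger>".
        String regex = "<\\s*" + name + "\\b[^>]*>[^<]*</\\s*" + name + "\\s*>";
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                      .matcher(input)
                      .replaceAll("");
    }

    public static void main(String[] args) {
        // The string from the question becomes empty.
        System.out.println(stripTag("<strong>Foo</strong>", "strong").isEmpty()); // true
        // Attributes on the opening tag are tolerated.
        System.out.println(stripTag("This is <strong class=\"bold\">Foo</strong>Bar.", "strong")); // This is Bar.
    }
}
```

Compiling the pattern on every call keeps the sketch short; if the tag name is fixed, the compiled Pattern could be cached in a static field instead.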

Ravi K Thapliyal
0
Since you say that you don't have nested tags, you can try using "<([^>]+)>.*?</\\1>":
String data = "bar<strong>foo</strong>yyy<strong>zzz</strong>";
System.out.println(data.replaceAll("<([^>]+)>.*?</\\1>", ""));
Output:
baryyy
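One thing to watch with this pattern: group 1 captures everything between < and >, so an opening tag with attributes such as <b class="x"> would need a closing tag </b class="x"> to match, and . does not match line breaks by default. A possible variant, sketched below under the same no-nested-tags assumption, captures only the tag name and enables DOTALL. The names AnyTagStripper and stripElements are hypothetical, and this is not the answer's own code.

```java
import java.util.regex.Pattern;

final class AnyTagStripper {
    // Group 1 captures just the tag name, so attributes on the opening tag
    // do not end up in the \1 backreference. DOTALL lets .*? span line breaks.
    private static final Pattern ELEMENT =
            Pattern.compile("<(\\w+)\\b[^>]*>.*?</\\1\\s*>", Pattern.DOTALL);

    // Hypothetical helper: removes every element together with its content,
    // assuming elements are not nested.
    static String stripElements(String input) {
        return ELEMENT.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String data = "bar<strong>foo</strong>yyy<strong>zzz</strong>";
        System.out.println(stripElements(data)); // baryyy
        // Attributes and a line break inside the element are handled too.
        System.out.println(stripElements("bar<b class=\"x\">foo\nmore</b>yyy")); // baryyy
    }
}
```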

Pshemo
foo
" in it. If there cannot be nested html in your data then you could do it trivially with a regular expression – Cruncher Aug 25 '13 at 23:02