<b>Topic1</b><ul>asdasd</ul><br/><b>Topic2</b><ul>....
I want to extract everything that comes between <b>Topic1</b> and the next <b> starting tag, which in this case would be: <ul>asdasd</ul><br/>
Problem: it need not necessarily be the <b> tag; it could be any other repeating tag.
So my question is: how can I dynamically extract that text? The only fixed things are:
- The signal keyword to look for is always "Topic1". I'd like to treat the tag surrounding it as the delimiter to look for.
- The tag is always repeated. In this case it's always <b>, but it might as well be <i>, <strong>, <h1>, etc.
I know how to write the Java code, but what would the regex look like?
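import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Placeholder pattern: it has no capture groups, so groupCount() is 0
// and the loop below prints nothing. text holds the HTML shown above.
String regex = ">Topic1<";
Matcher m = Pattern.compile(regex).matcher(text);
while (m.find()) {
    for (int i = 1; i <= m.groupCount(); i++) {
        System.out.println(m.group(i));
    }
}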
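One possible shape for such a regex, as a rough sketch only: capture the name of the tag that wraps "Topic1" in a group, then use a backreference to that group to find the next opening tag of the same name. This assumes the keyword sits directly inside the tag, the markup is reasonably well-formed, and the delimiter tag is not nested inside the content. The names `TopicExtractor` and `extract` are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TopicExtractor {
    // Group 1: name of the tag wrapping the keyword (b, i, strong, h1, ...).
    // Group 2: everything up to the next opening tag with that same name,
    //          or up to the end of the input if there is none.
    // [\s>] after the backreference keeps <b> from matching inside <br/>.
    static final Pattern TOPIC = Pattern.compile(
            "<(\\w+)[^>]*>Topic1</\\1>(.*?)(?=<\\1[\\s>]|$)",
            Pattern.DOTALL);

    // Hypothetical helper: returns the captured content, or null if
    // the keyword is not found wrapped in a tag.
    static String extract(String text) {
        Matcher m = TOPIC.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String text = "<b>Topic1</b><ul>asdasd</ul><br/><b>Topic2</b><ul>....";
        System.out.println(extract(text)); // prints <ul>asdasd</ul><br/>
    }
}
```

The lazy `(.*?)` together with the lookahead stops at the first repeated tag instead of the last one. Regexes are fragile on real-world HTML, so if the input gets more irregular than the sample, an HTML parser is likely the safer route.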