Possible Duplicate:
RegEx match open tags except XHTML self-contained tags
I am having trouble parsing a tag using Java.
Goal:
My goal is to parse a complete div tag with all of its contents, even if it contains nested tags.
For example, from HTML like this:
<h2>some random text</h2>
<div id="outerDiv">
some text
<div>
some more text
</div>
last text
</div>
<div> some random div <b>bold</b></div>
I want to extract the outer div with all of its inner contents, up to its closing tag, that is:
<div id="outerDiv">
some text
<div>
some more text
</div>
last text
</div>
But what I currently get is either in this form or in some other random format (depending on the changes I try in my expression :) ).
Please help me improve my regex so that it matches a div with a specific id, along with all of its contents.
Here is my expression (a lot of brackets, just to be on the safe side :) ):
((<div.*(class=\"afs\")(.)*?>)((.)*?)(((<div(.)*?>)((.)*?)((</div>){1}))*?)((</div>){1}))
Here is my Java code:
package rexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Rexp {
    public static void main(String[] args) {
        CharSequence inputStr = "asdasd<div class=\"af\">sasa<div><div><div class=\"afs\">as</div>qwessa</div></div></div>asd";
        Pattern pattern = Pattern.compile("((<div.*(class=\"afs\")(.)*?>)((.)*?)(((<div(.)*?>)((.)*?)((</div>){1}))*?)((</div>){1}))");
        Matcher matcher = null;
        matcher = pattern.matcher(inputStr);
        if (matcher.find()) {
            System.out.println("Matched "+matcher.group(1));
        } else {
            System.out.println("Not Matched");
        }
    }
}
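
To make the behaviour I am after concrete, here is a minimal sketch of one possible alternative: instead of a single expression, it uses a small regex only to find div tags and keeps a counter for the nesting depth. This is an assumption about what might work, not part of my attempt above. The names `DivExtractor` and `extractDiv` are hypothetical, and the sketch ignores comments, scripts and malformed markup, so a real HTML parser would probably be more robust.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class DivExtractor {
    // Matches any opening <div ...> tag or any closing </div> tag.
    private static final Pattern DIV_TAG =
            Pattern.compile("<div\\b[^>]*>|</div\\s*>", Pattern.CASE_INSENSITIVE);

    /**
     * Returns the first div whose opening tag matches openTagRegex,
     * including all nested divs, or null if no balanced match exists.
     */
    static String extractDiv(String html, String openTagRegex) {
        Matcher open = Pattern.compile(openTagRegex).matcher(html);
        if (!open.find()) {
            return null; // no opening tag of the requested kind
        }
        int start = open.start();
        int pos = open.end();
        int depth = 1; // we are inside the div that was just opened
        Matcher tag = DIV_TAG.matcher(html);
        while (tag.find(pos)) {
            if (tag.group().startsWith("</")) {
                depth--; // a closing tag ends the innermost open div
            } else {
                depth++; // a nested div was opened
            }
            if (depth == 0) {
                return html.substring(start, tag.end());
            }
            pos = tag.end();
        }
        return null; // ran out of input before the div was closed
    }

    public static void main(String[] args) {
        String input = "asdasd<div class=\"af\">sasa<div><div><div class=\"afs\">as</div>qwessa</div></div></div>asd";
        // Innermost div, selected by its class attribute.
        System.out.println(extractDiv(input, "<div[^>]*class=\"afs\"[^>]*>"));
        // Outermost div, including every nested div.
        System.out.println(extractDiv(input, "<div[^>]*class=\"af\"[^>]*>"));
    }
}
```

The depth counter is what a plain `java.util.regex` pattern cannot express on its own, since arbitrarily deep nesting is not a regular language.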