Suppose I have the following HTML file:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title> New Document </title>
<meta name="Generator" content="EditPlus">
<meta name="Author" content="">
<meta name="Keywords" content="">
<meta name="Description" content="">
</head>
<body>
<h1>Welcome to My Homepage</h1>
<p class="intro">My name is Donald.</p>
<h1 class="intro"><p class="important">Note that this is an important paragraph.</p>
</h1>
<div class="intro important"><p class="apple">I live in apple.</p></div>
<div class="intro important">I like apple.</div>
<p>I live in Duckburg.</p>
</body>
</html>
I want to select HTML elements by class name. For the selector ".intro", it should return:
My name is Donald.
<p class="important">Note that this is an important paragraph.</p>
For the selector ".intro.important", it should return:
Note that this is an important paragraph.
For the selector ".intro.important>.apple", it should return:
I live in apple.
I know jQuery has a class selector that does this, but I want to implement it myself. Can I use Java regular expressions for this? A selector with a single class name seems manageable, but a selector that includes a child class name makes it much harder. One more question: can Java build the DOM structure of the HTML?
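For the single-class case, a minimal regex sketch might look like the following. The names `ClassRegexSketch` and `findByClass` are made up for illustration, and the code assumes double-quoted `class` attributes and that an element never contains another element with the same tag name. It is not a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClassRegexSketch {

    // Hypothetical helper: returns the inner markup of every element whose
    // class attribute contains className as a whole, space-separated token.
    // Assumes double-quoted attributes and no nesting of the same tag name.
    static List<String> findByClass(String html, String className) {
        Pattern p = Pattern.compile(
            "<(\\w+)\\b[^>]*\\bclass=\"([^\"]*)\"[^>]*>(.*?)</\\1>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
        List<String> results = new ArrayList<>();
        Matcher m = p.matcher(html);
        while (m.find()) {
            // group(2) is the raw class attribute, e.g. "intro important"
            for (String c : m.group(2).trim().split("\\s+")) {
                if (c.equals(className)) {
                    results.add(m.group(3).trim()); // group(3) is the element body
                    break;
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        String html = "<p class=\"intro\">My name is Donald.</p>\n"
                    + "<p>I live in Duckburg.</p>";
        System.out.println(findByClass(html, "intro"));
    }
}
```

Because the matcher moves past each element it finds, anything nested inside a matched element (such as the `.apple` paragraph inside the `div`) is never examined on its own. That is the part that seems hard to extend to a selector like ".intro.important>.apple" with regular expressions alone.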