I've read a lot of questions on Stack Overflow about HTML parsing, and I've learned that, whenever possible, we should avoid regex and use a parser instead. I know there are a lot of HTML/XML parsers, but I don't know how to use them properly.
Consider the HTML below. I ran it through jTidy and now have the Document object that jTidy created from it:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<!-- Header content -->
</head>
<body>
<div id="container">
<div id="id1"> ... </div>
<div id="id2"> ... </div>
<div id="mainContent">
<div id="section 1">
<div id="subSection">
<!-- Interested part -->
<tbody>
<tr class="success">
<td class="fileName"><span>File One</span></td>
</tr>
<tr class="fail">
<td class="fileName"><span>File Two</span></td>
</tr>
<tr class="success">
<td class="fileName"><span>File Three</span></td>
</tr>
</tbody>
</div>
</div>
</div>
</div>
</body>
</html>
Now I would like to map (in a Map :D) each file name to the class of its row (success/fail). I can do it with DOM, but I would have to create a NodeList of the rows and then, for each Element, create another NodeList (memory-hungry and tedious). There are alternatives such as SAX, Xerces, etc., but I don't know their advantages and disadvantages.
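To make the DOM route concrete, here is a minimal sketch of what I mean. It has not been run against jTidy itself: the sample string and the JDK's DocumentBuilder stand in for the Document that jTidy returns, and the names FileStatusMapper, mapFileStatus and parse are made up for illustration. The text is read through the span's first child rather than getTextContent(), since that is a DOM Level 3 method and I'm not sure every DOM implementation supports it.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FileStatusMapper {

    // Maps the text of the first <span> in each <tr> to that row's class attribute.
    static Map<String, String> mapFileStatus(Document doc) {
        Map<String, String> result = new LinkedHashMap<>(); // keeps document order
        NodeList rows = doc.getElementsByTagName("tr");     // outer NodeList
        for (int i = 0; i < rows.getLength(); i++) {
            Element row = (Element) rows.item(i);
            NodeList spans = row.getElementsByTagName("span"); // one more NodeList per row
            if (spans.getLength() == 0) {
                continue; // row without a file name
            }
            Node text = spans.item(0).getFirstChild();
            String name = (text == null || text.getNodeValue() == null)
                    ? ""
                    : text.getNodeValue().trim();
            result.put(name, row.getAttribute("class"));
        }
        return result;
    }

    // Stand-in for jTidy (assumption): parses well-formed markup with the JDK parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample markup", e);
        }
    }

    public static void main(String[] args) {
        String sample =
                "<div id=\"subSection\"><table><tbody>"
              + "<tr class=\"success\"><td class=\"fileName\"><span>File One</span></td></tr>"
              + "<tr class=\"fail\"><td class=\"fileName\"><span>File Two</span></td></tr>"
              + "<tr class=\"success\"><td class=\"fileName\"><span>File Three</span></td></tr>"
              + "</tbody></table></div>";
        // prints {File One=success, File Two=fail, File Three=success}
        System.out.println(mapFileStatus(parse(sample)));
    }
}
```

With the real document, the idea would be to pass the Document returned by jTidy to mapFileStatus instead of the parsed sample string.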
What is the simplest (and fastest) way to extract this information from the "jTidied" HTML above?