Here are some lines of the document:
<div class="rowleft">
<h3>Technical Fouls</h3>
<table class="num-left">
<tr class="datahl2b">
<td> </td>
<td>Players</td>
</tr>
<tr>
<td>DAL</td>
<td>
None</td>
</tr>
<tr>
<td>MIA</td>
<td>
Mike Miller</td>
<td>
Mike Miller, Jr.</td>
</tr>
</table>
</div>
I'm interested in extracting "None", "Mike Miller", and "Mike Miller, Jr." from this. I tried using various XML parsers, but 1) the performance is abysmal and 2) the document is apparently not well-formed XML.
One thing I've been thinking about is stripping the document of newlines, splitting it at something like <tr>, seeing which lines contain data (probably using StartsWith()), and extracting it with a regex. That would be efficient enough for my program (it doesn't really matter that parsing takes half a second when downloading the document takes five seconds), but I'm interested in alternative solutions.
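To make the idea concrete, here is a minimal sketch of the strip, split, and regex approach. It is written in Python purely for illustration, and the function name extract_players is hypothetical. It assumes the table always looks like the snippet above: one <tr> per team, the team code in the first <td>, player names in the remaining cells, and no nested tags inside the cells. Instead of a StartsWith() check, it skips rows whose first cell is empty.

```python
import re

def extract_players(html):
    """Map each team code to its list of player cells (hypothetical helper)."""
    # Strip newlines so every row sits on one logical line.
    flat = html.replace("\r", "").replace("\n", "")
    rows = {}
    # Split at each <tr ...> tag; the first chunk is whatever precedes the first row.
    for chunk in re.split(r"<tr\b[^>]*>", flat)[1:]:
        # Pull the text of every <td> cell in this row (assumes no nested tags).
        cells = [c.strip() for c in re.findall(r"<td\b[^>]*>(.*?)</td>", chunk)]
        # Keep only data rows: a non-empty team code plus at least one more cell.
        if len(cells) >= 2 and cells[0]:
            rows[cells[0]] = cells[1:]
    return rows

sample = (
    '<table class="num-left">\n'
    '<tr class="datahl2b">\n<td> </td>\n<td>Players</td>\n</tr>\n'
    '<tr>\n<td>DAL</td>\n<td>\nNone</td>\n</tr>\n'
    '<tr>\n<td>MIA</td>\n<td>\nMike Miller</td>\n<td>\nMike Miller, Jr.</td>\n</tr>\n'
    '</table>'
)
print(extract_players(sample))
# {'DAL': ['None'], 'MIA': ['Mike Miller', 'Mike Miller, Jr.']}
```

This only holds up while the markup stays as regular as the sample. If the header cell contained an entity such as a non-breaking space, or cells contained nested tags, the filter and the cell pattern would both need adjusting.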