I'm working with a small subset of mostly invalid HTML, and I need to extract a small piece of data from it. Given that most of the markup isn't valid, I don't think loading everything into a DOM is a good option. Moreover, it seems like a lot of overhead for such a simple case.
Here's an example of the markup that I have:
(a bunch of invalid markup here with unclosed tags, etc.)
<TD><span>Something (random text here)</span></TD>
(a bunch more invalid markup here with more unclosed tags.)
The <TD><span>Something (random text here)</span></TD> portion does not repeat itself anywhere in the document, so I believe a simple regex would do the trick.
However, I'm terrible with regular expressions.
Should I use a regular expression? Is there a simpler way to do this? If possible, I'd just like to extract the text after Something, i.e. the (random text here) portion.
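To make the "simpler way" concrete, here is a minimal sketch of a non-regex approach using plain string searching. It assumes Python, and it assumes the fragment really does appear exactly once with the literal prefix `<TD><span>Something`. The function name `extract_after` and its parameters are hypothetical, chosen only for illustration.

```python
def extract_after(markup, prefix="<TD><span>Something", suffix="</span>"):
    """Return the text between the first `prefix` and the next `suffix`.

    Hypothetical helper: it relies on the fragment being unique in the
    document and on the prefix matching character for character.
    """
    start = markup.find(prefix)
    if start == -1:
        return None  # fragment not present
    start += len(prefix)
    end = markup.find(suffix, start)
    if end == -1:
        return None  # opening fragment found, but no closing tag after it
    return markup[start:end].strip()


sample = "<B>junk<TD><span>Something (random text here)</span></TD><I>more junk"
result = extract_after(sample)
```

Because nothing is parsed, the invalid markup around the fragment does not matter, but any change in the vendor's casing or whitespace inside the prefix would make the lookup return `None`.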
Thanks in advance!
Edit -
Exact example of the HTML (I've omitted the stuff prior, which is the invalid markup that the vendor uses. It's irrelevant for this example, I believe):
<div class="FormTable">
<TABLE>
<TR>
<TD colspan="2">In order to proceed with login operation please
answer on the security question below</TD>
</TR>
<TR>
<TD colspan="2"> </TD>
</TR>
<TR>
<TD><label class="FormLabel">Security Question</label></TD>
<TD><span>What is your city of birth?</span></TD>
</TR>
<TR>
<TD><label class="FormLabel">Answer</label></TD>
<TD><INPUT name="securityAnswer" class="input" type="password" value=""></TD>
</TR>
</TABLE>
</div>
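For the exact markup above, the kind of regex I have in mind might look like the following sketch. It assumes Python's `re` module and assumes that the `Security Question` label cell always comes directly before the cell holding the question text. The names `PATTERN` and `extract_security_question` are hypothetical.

```python
import re

# Anchor on the label cell, then capture the contents of the <span> in the
# cell that follows it. IGNORECASE tolerates <td> vs <TD>; DOTALL lets the
# captured text span line breaks.
PATTERN = re.compile(
    r"<TD>\s*<label[^>]*>\s*Security Question\s*</label>\s*</TD>\s*"
    r"<TD>\s*<span>(.*?)</span>\s*</TD>",
    re.IGNORECASE | re.DOTALL,
)


def extract_security_question(markup):
    """Return the question text, or None if the pattern is not found."""
    match = PATTERN.search(markup)
    return match.group(1).strip() if match else None


html = """<TR>
<TD><label class="FormLabel">Security Question</label></TD>
<TD><span>What is your city of birth?</span></TD>
</TR>"""
question = extract_security_question(html)
```

Anchoring on the label rather than on a bare `<TD><span>` keeps the match from landing on some other span in the surrounding invalid markup, at the cost of breaking if the vendor rewords the label.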