I'm in a position where I need to extract content from an existing site. The HTML is brutal, but so far I've been able to pull the existing content into tables, except for this bit of text.
I've scoured around here to no avail. Here's what a bit of the markup looks like:
```html
<div id="content">
<div class="comments">
My comment<br />
Name <br />
Mytown, NY USA
</div>
- Wednesday, December 07, 2005 at 07:20:47 (EST)
<hr />
<div class="comments">
My Comment 2<br />
2nd Person's name <br />
My Town, USA
</div>
- Wednesday, November 02, 2005 at 18:48:38 (EST)
<hr />
</div>
```
I have to parse through tons of entries like these. I've got everything else, but how do I target the text in each entry that sits immediately after the closing </div> and ends when it hits the <hr />?
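One possible approach, sketched in Python with only the standard library's `html.parser` (the post doesn't name a language, so that choice is an assumption): note when a `<div class="comments">` closes, buffer any loose text that follows it, and stop at the next `<hr>`. The class and function names (`TrailingTextParser`, `extract_trailing_text`) are hypothetical, and the sketch assumes every entry follows the pattern shown above.

```python
from html.parser import HTMLParser


class TrailingTextParser(HTMLParser):
    """Collects the loose text between the closing </div> of each
    <div class="comments"> and the next <hr>. (Hypothetical helper.)"""

    def __init__(self):
        super().__init__()
        self.depth = 0          # >0 while inside a comments div (tracks nested divs)
        self.capturing = False  # True after a comments div closes, until <hr>
        self.buffer = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <hr /> also arrive here via the default
        # handle_startendtag, which calls handle_starttag then handle_endtag.
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.depth:
                self.depth += 1          # a div nested inside the comment
            elif "comments" in classes:
                self.depth = 1           # entering a new comment block
                self.capturing = False   # discard text from an entry with no <hr>
        elif tag == "hr" and self.capturing:
            self._finish_entry()

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:          # the comments div just closed
                self.capturing = True
                self.buffer = []

    def handle_data(self, data):
        if self.capturing:
            self.buffer.append(data)

    def _finish_entry(self):
        # Join the captured text, trim whitespace and the leading "-"
        text = "".join(self.buffer).strip().lstrip("-").strip()
        if text:
            self.results.append(text)
        self.capturing = False
        self.buffer = []


def extract_trailing_text(html):
    parser = TrailingTextParser()
    parser.feed(html)
    parser.close()
    return parser.results
```

Fed the sample markup above, this should return one string per entry, e.g. `"Wednesday, December 07, 2005 at 07:20:47 (EST)"`, which can then be stored alongside the comment, name, and location already extracted. An event-driven parser is used here because the target text is a bare text node with no element of its own to select.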