I want to select the following strings from this HTML using just lxml and some clever XPath. The strings will change, but the surrounding HTML will not.
I need...
19/11/2010
AAAAAA/01
Normal
United Kingdom
This description may contains <bold>html</bold> but i still need all of it!
from...
...
<p>
<strong>Date:</strong> 19/11/2010<br>
<strong>Ref:</strong> AAAAAA/01<br>
<b>Type:</b> Normal<br>
<b>Country:</b> United Kingdom<br>
</p>
<hr>
<p>
<br>
<b>1. Title:</b> The Title<br>
<b>2. Description: </b> This description may contains <bold>html</bold> but i still need all of it!<br>
<b>3. Date:</b> 25th October<br>
...
</p>
...
So far I've only come up with using regular expressions and re:match
to try to drag it out, but even that won't work without something that lets me get the innerHTML of the <p>
nodes, for example.
Is there any way to do this without post-processing the string with regex?
Thanks :)