I have an HTML file that contains several rows with this structure:
<tr>
<td style="width:25%;">
<span class="results_title_text">DUNS:</span> <span class="results_body_text"> 012361296</span>
</td>
<td style="width:25%;">
</td>
<!-- label as CAGE when US Territory is listed as Country -->
<td style="width:27%;">
<span class="results_title_text">CAGE Code:</span> <span class="results_body_text">HELLO</span>
</td>
<td style="width:15%" rowspan="2">
<input type="button" value="View Details" title="View Details for Rascal X-Press, Inc." class="center" style="height:25px; width:90px; vertical-align:middle; margin:7px 3px 7px 3px;" onClick="viewEntry('4420848', '1472652382619')" />
</td>
</tr>
I want to select only those <span class="results_body_text"> elements
that are preceded by <span class="results_title_text">DUNS:</span>,
so in this case I would get back only the span that contains 012361296
and not the one that contains HELLO.
How can I do this with a regular expression, or by any other means? I have tried the "starts with" regex format, but I can't see which string I would be matching against in that case. Eventually I want to pass the pattern to re.compile() in Python.
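
To make the goal concrete, here is a minimal sketch of the kind of pattern I have in mind. It assumes the markup always looks exactly like the sample row above (same class names, the DUNS label immediately followed by its value span). The names html, duns_pattern and duns_values are only illustrative.

```python
import re

# Fragment of the sample row from above; `html` is an illustrative name.
html = '''<td style="width:25%;">
<span class="results_title_text">DUNS:</span> <span class="results_body_text"> 012361296</span>
</td>
<td style="width:27%;">
<span class="results_title_text">CAGE Code:</span> <span class="results_body_text">HELLO</span>
</td>'''

# Match the DUNS label span, then capture the text of the body span that
# directly follows it. Surrounding whitespace is left outside the group.
duns_pattern = re.compile(
    r'<span class="results_title_text">DUNS:</span>\s*'
    r'<span class="results_body_text">\s*(.*?)\s*</span>'
)

# findall returns the captured group for every DUNS row in the text.
duns_values = duns_pattern.findall(html)
print(duns_values)  # ['012361296']
```

The CAGE Code span is skipped because its label text does not match the literal DUNS: label in the pattern. This would presumably break if the attribute order or spacing in the real file differs from the sample.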