I have some scraped content that I fetched using urllib.request.urlopen(url), and I'm trying to run a regex on it to extract some information from a <td>...</td> cell. But I can't get the regex to match past the first line; I think the document has newlines that are getting in the way. I've tried adding \s or \r to the pattern, but that isn't working for me.
I'm trying to retrieve
The content was pretty nice and would participate again
using the regex:
(?<=showPollResponses\()(.*)(?=)
and here is a sample of the document:
</thead>
<tr>
<td class="oddpoll" style="width:20%"><b><a href="#" onclick="showPollResponses(123456, 99, '1A2B3C4D5E6F7G8H9I0J1K2L3M4N5O6P', 123456, 123456, 99);return false;">The stuf (i</a></b>
<br>
</td><td class="oddpoll" style="width:35%">The content was pretty nice and would participate again </td><td class="oddpoll" style="width:45%"><b>123 Total</b>
<br>
</td>
</tr>
<tr>
<td class="oddpoll"> </td>
I've also tried (?<=showPollResponses\()(.*)(?=width:45%), but it doesn't return anything. My plan was to take that chunk of HTML and run a second regex on it to extract the final text.
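For reference, here is a minimal sketch that reproduces the attempt on a trimmed copy of the sample. It assumes the newlines really are the problem: in Python's re module, "." does not match "\n" unless the re.DOTALL flag is passed. The second pattern, which pulls the text out of the width:35% cell, is only a guess at what the follow-up regex could look like.

```python
import re

# Trimmed copy of the sample markup; the newlines are kept on purpose.
html = """<td class="oddpoll" style="width:20%"><b><a href="#" onclick="showPollResponses(123456, 99, '1A2B3C4D5E6F7G8H9I0J1K2L3M4N5O6P', 123456, 123456, 99);return false;">The stuf (i</a></b>
<br>
</td><td class="oddpoll" style="width:35%">The content was pretty nice and would participate again </td><td class="oddpoll" style="width:45%"><b>123 Total</b>
<br>
</td>"""

pattern = r"(?<=showPollResponses\()(.*)(?=width:45%)"

# Default behaviour: "." stops at "\n", so the match cannot span lines.
no_flag = re.search(pattern, html)

# With re.DOTALL, "." also matches newlines and the chunk is found.
with_flag = re.search(pattern, html, re.DOTALL)

# Second pass (hypothetical pattern): text of the width:35% cell.
cell = re.search(r'width:35%">(.*?)</td>', with_flag.group(1), re.DOTALL)
answer = cell.group(1).strip()
print(answer)
```

Without the flag the first search finds nothing, which matches what I'm seeing; with re.DOTALL the chunk is returned and the second pattern can be run on it.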
Here's my regex101.com
Is there a simpler way to do this? In PHP I've used tools that scrape data with CSS selectors, so I could easily retrieve this that way. Or, in the urllib context, is regex the only way? Thanks for any help provided.
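As a point of comparison, here is a rough sketch of a non-regex route that stays inside the standard library, using html.parser.HTMLParser to collect the text of each <td> cell. The class name CellText and the choice to index the second cell are assumptions made for illustration, based only on the sample above. Third-party libraries such as Beautiful Soup do offer CSS selectors, but this sketch deliberately avoids them.

```python
from html.parser import HTMLParser


class CellText(HTMLParser):
    """Collect the text content of every <td> cell (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # Start a new, empty cell.
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Text inside nested tags (<b>, <a>) is appended to the current cell.
        if self._in_td:
            self.cells[-1] += data


# Trimmed copy of the sample markup from the question.
html = """<tr>
<td class="oddpoll" style="width:20%"><b><a href="#" onclick="showPollResponses(123456, 99, '1A2B3C4D5E6F7G8H9I0J1K2L3M4N5O6P', 123456, 123456, 99);return false;">The stuf (i</a></b>
<br>
</td><td class="oddpoll" style="width:35%">The content was pretty nice and would participate again </td><td class="oddpoll" style="width:45%"><b>123 Total</b>
<br>
</td>
</tr>"""

parser = CellText()
parser.feed(html)
parser.close()

cells = [c.strip() for c in parser.cells]
print(cells[1])
```

In the real script, the string passed to feed() would be the decoded response body, for example response.read().decode() with whatever encoding the page actually uses.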