I have been trying to parse the text stored between <td> tags, for example:
<tr>
<td>Trading Hours</td>
<td><b>Monday</b> <br />
London - 23:00 Sunday - 23:00 Monday<br />
New York - 18:00 Sunday - 18:00 Monday<br />
Chicago - 17:00 Sunday - 17:00 Monday<br />
<br />
<b>Tuesday-Friday</b> <br />
London - 01:00 - 23:00<br />
New York - 20:00 - 18:00<br />
Chicago - 19:00 - 17:00<br />
</td>
</tr>
In this simple example there are only two <td> tags; suppose a variable tr stores the entire block of HTML. My logic for extracting the text (without any <tr> or <br> tags) is as follows:
for td in tr.findAll('td'):
    row.append((td.find('td', text = True)).strip().strip('\n'))
Problem: my for loop recognizes the first <td> tag, but not the second. How can I improve this?
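To make the goal concrete, here is a minimal sketch of the result I am after: one string per <td>, with nested <b> and <br> tags dropped. It deliberately does not use BeautifulSoup; it relies on the standard library's html.parser so that it runs without extra installs. The names CellTextParser and extract_cells are hypothetical, and the HTML is a shortened copy of the row above. With BeautifulSoup 4, calling td.get_text() on each cell would probably play the same role, but I have not confirmed that against my version.

```python
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the visible text of each <td>, ignoring nested tags like <b> and <br>."""

    def __init__(self):
        super().__init__()
        self.cells = []      # one cleaned-up string per <td>
        self._parts = None   # text fragments of the current <td>, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "td" and self._parts is not None:
            # Join the fragments and collapse whitespace/newlines into single spaces.
            self.cells.append(" ".join(" ".join(self._parts).split()))
            self._parts = None

    def handle_data(self, data):
        # Only keep text that appears inside a <td>.
        if self._parts is not None:
            self._parts.append(data)


def extract_cells(html):
    """Return a list with the text of every <td> found in the given HTML."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return parser.cells


html = """<tr>
<td>Trading Hours</td>
<td><b>Monday</b> <br />
London - 23:00 Sunday - 23:00 Monday<br />
</td>
</tr>"""

row = extract_cells(html)
print(row)
```

The point of the sketch is that the second cell's text is spread over several fragments separated by tags, so whatever approach is used has to gather all of the fragments in a cell rather than look for a single string.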