I'm having problems parsing table data with BeautifulSoup, even though I've tried several solutions from similar questions on this site. I hate to re-ask, but maybe my issue is unique and that's why those solutions haven't worked, or I'm just an idiot.
So I'm trying to retrieve the flood triggers for any given river from water.weather.gov. I'm using the Mississippi River data because it has the most active measuring stations. Each station has 4 stage triggers that I'm trying to obtain: Action, Flood, Moderate, and Major. I've been able to extract the table data for those categories when they have numerical values. However, when a station's table only says "Not Available", that station is skipped entirely, so when I split the values into stages they no longer line up with the stations they belong to.
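To make the misalignment concrete, here is a small pure-Python illustration with made-up numbers (the station names A, B, C and their values are hypothetical). When one station contributes no cells, slicing the flat list with a fixed stride of 5 yields fewer entries than there are stations, and pairing them by position shifts a later station's value onto an earlier one.

```python
# Hypothetical stage values per station, in page order:
# Major, Moderate, Flood, Action, Low.
station_a = ["18", "15", "13", "12", "-9999"]
station_c = ["26", "22", "20", "19", "-9999"]

# Station B's box only says "Not Available", so it adds nothing to the flat list.
a_values = station_a + station_c
stations = ["A", "B", "C"]

major_lvl = a_values[::5]   # every 5th value, starting at Major
print(major_lvl)            # ['18', '26']

# Pairing by position puts station C's Major value on station B.
print(list(zip(stations, major_lvl)))  # [('A', '18'), ('B', '26')]
```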
The HTML I'm trying to extract looks like this (one station with no data, followed by one with all stages):
<div class="box_square"> <b><b>Flood Categories (in feet)</b><br>
</b>
<table width="150" cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr><td nowrap="">Not Available</td></tr>
</tbody>
</table><br></div>
<div class="box_square"> <b><b>Flood Categories (in feet)</b><br>
</b>
<table width="150" cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr style="display:'';line-height:20px;background-color:#CC33FF;color:black">
<td scope="col" nowrap="">Major Flood Stage:</td>
<td scope="col">18</td>
</tr>
<tr style="display:'';line-height:20px;background-color:#FF0000;color:white">
<td scope="col" nowrap="">Moderate Flood Stage:</td>
<td scope="col">15</td>
</tr>
<tr style="display:'';line-height:20px;background-color:#FF9900;color:black">
<td scope="col" nowrap="">Flood Stage:</td>
<td scope="col">13</td>
</tr>
<tr style="display:'';line-height:20px;background-color:#FFFF00;color:black">
<td scope="col" nowrap="">Action Stage:</td>
<td scope="col">12</td>
</tr>
<tr style="display:none;line-height:20px;background-color:#906320;color:white">
<td scope="col" nowrap="">Low Stage (in feet):</td>
<td scope="col">-9999</td>
</tr>
</tbody>
</table><br></div>
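The two boxes differ in one detail that matters for the scraper: the "Not Available" text is inside a td, but that td has no scope="col" attribute, so a filter like attrs={'scope':'col'} (used in the code below) never matches it. A quick check, assuming bs4 with the built-in "html.parser" and a trimmed, illustrative copy of the markup:

```python
from bs4 import BeautifulSoup

# Trimmed, illustrative copy of the markup above: one "Not Available" box
# and one row from a populated box.
html = """
<div class="box_square"><table><tbody>
<tr><td nowrap="">Not Available</td></tr>
</tbody></table></div>
<div class="box_square"><table><tbody>
<tr><td scope="col" nowrap="">Major Flood Stage:</td><td scope="col">18</td></tr>
</tbody></table></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Only cells carrying scope="col" are matched by this filter.
scoped = [td.get_text(strip=True) for td in soup.find_all("td", attrs={"scope": "col"})]
# Without the attribute filter, the "Not Available" cell is found as well.
all_cells = [td.get_text(strip=True) for td in soup.find_all("td")]

print(scoped)     # ['Major Flood Stage:', '18']
print(all_cells)  # ['Not Available', 'Major Flood Stage:', '18']
```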
The last row, Low Stage, isn't needed and I filter it out. Here is the code that populates alert_list with the stage values, but without the "Not Available" entries I need:
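# soup is the BeautifulSoup object for the river page
alert_list = []
alert_values = []
# Only cells with scope="col" are matched here
alerts = soup.find_all('td', attrs={'scope': 'col'})
for alert in alerts:
    alert_list.append(alert.text.strip())
# Keep every second cell (the values, not the labels)
a_values = alert_list[1::2]
alert_list.clear()
# Each station contributes 5 values: Major, Moderate, Flood, Action, Low
major_lvl = a_values[::5]
moderate_lvl = a_values[1::5]
flood_lvl = a_values[2::5]
action_lvl = a_values[3::5]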
And the results:
>>> major_lvl
['18', '26', '0', '11', '0', '17', '17', '18', '0', '683', '16', '0', '20', '16', '18', '665', '661', '18', '651', '645', '15.5', '636', '20', '631', '22', '21', '20.5', '21.5', '20', '20', '20.5', '13.5', '18', '18', '20', '18.5', '17', '14', '18', '19', '25', '25', '25', '26', '25', '24', '22', '25', '33', '34', '29', '34', '40', '40', '0', '0', '0', '42', '42', '0', '0', '0', '0', '0', '44', '47', '43', '35', '46', '52', '55', '0', '44', '57', '50', '57', '64', '40', '34', '26', '20']
I just noticed that the "Not Available" cell isn't getting scraped because its td has no scope="col" attribute, so my attrs={'scope':'col'} filter never matches it. How do I include it so that my values line up?
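One way to keep the values aligned, sketched under assumptions: instead of collecting every scope="col" cell on the page and slicing by position, walk each div.box_square separately and map row labels to stages, so a station without data still produces an entry (with None values). This assumes each station's categories sit in their own div.box_square as in the markup above and that the row labels match the page text exactly; parse_stations and STAGES are hypothetical names, and the HTML string is illustrative.

```python
from bs4 import BeautifulSoup

# Illustrative markup shaped like the snippet in the question:
# one "Not Available" box followed by one fully populated box.
HTML = """
<div class="box_square"><b><b>Flood Categories (in feet)</b><br></b>
<table><tbody>
<tr><td nowrap="">Not Available</td></tr>
</tbody></table><br></div>
<div class="box_square"><b><b>Flood Categories (in feet)</b><br></b>
<table><tbody>
<tr><td scope="col" nowrap="">Major Flood Stage:</td><td scope="col">18</td></tr>
<tr><td scope="col" nowrap="">Moderate Flood Stage:</td><td scope="col">15</td></tr>
<tr><td scope="col" nowrap="">Flood Stage:</td><td scope="col">13</td></tr>
<tr><td scope="col" nowrap="">Action Stage:</td><td scope="col">12</td></tr>
<tr><td scope="col" nowrap="">Low Stage (in feet):</td><td scope="col">-9999</td></tr>
</tbody></table><br></div>
"""

# Hypothetical mapping from row label to stage name; Low Stage is left out.
STAGES = {
    "Major Flood Stage:": "major",
    "Moderate Flood Stage:": "moderate",
    "Flood Stage:": "flood",
    "Action Stage:": "action",
}


def parse_stations(html):
    """Return one dict per flood-category box; missing stages stay None."""
    soup = BeautifulSoup(html, "html.parser")
    stations = []
    for box in soup.find_all("div", class_="box_square"):
        # Skip any box_square divs that are not flood-category boxes.
        if "Flood Categories" not in box.get_text():
            continue
        levels = dict.fromkeys(STAGES.values())  # all stages start as None
        for row in box.find_all("tr"):
            cells = [td.get_text(strip=True) for td in row.find_all("td")]
            # Only two-cell rows with a known label carry a stage value;
            # the one-cell "Not Available" row leaves the Nones in place.
            if len(cells) == 2 and cells[0] in STAGES:
                levels[STAGES[cells[0]]] = cells[1]
        stations.append(levels)
    return stations


stations = parse_stations(HTML)
major_lvl = [s["major"] for s in stations]
action_lvl = [s["action"] for s in stations]
print(major_lvl)   # [None, '18']
print(action_lvl)  # [None, '12']
```

Because every flood-category box yields exactly one dict, major_lvl, moderate_lvl, flood_lvl and action_lvl all stay the same length as the list of stations. If you would rather keep the literal text, replace None with "Not Available" when building the lists.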