UPDATE: Found this post while throwing spaghetti at the walls and came up with this, which works in a loop. The CSV isn't beautiful, but it can be adapted.
data = []
table = soup.find('table', border=6)
rows = table.findAll('tr')
for row in rows:
    cols = row.findAll('td')
    cells = [ele.text.strip() for ele in cols]
    data = ([ele for ele in cells if ele]) # Get rid of empty values
    #print data
    record = (data)
    writer = csv.writer(open('cpms10.csv', 'ab'))
    writer.writerow(record)
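The loop above reopens the CSV file for every row. A minimal sketch of the same filtering with the file opened only once, assuming Python 3 (text mode with `newline=''` instead of `'ab'`); `append_rows` is a hypothetical helper name, and `rows` stands for the per-row lists of stripped cell text built in the loop:

```python
import csv


def append_rows(rows, path):
    """Append each non-empty row to a CSV file, opening the file only once."""
    # newline='' lets the csv module control line endings (Python 3).
    with open(path, 'a', newline='') as handle:
        writer = csv.writer(handle)
        for cells in rows:
            record = [text for text in cells if text]  # drop empty values
            if record:  # skip rows that held only padding cells
                writer.writerow(record)
```

It could be called as `append_rows(([td.text.strip() for td in tr.findAll('td')] for tr in rows), 'cpms10.csv')`. Like the original, this drops blank values, so columns will not line up across rows when a field is empty.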
I'm trying to scrape data from a series of pages like this one with BeautifulSoup. I want to get the data from the right side of each page in the proper order with the column headings starting with the year.
I've been using something like this, but it doesn't get the actual year because there's a padding cell in the first row, and it stops after the first section when I want to get all four:
table = soup.find('table', border=6)
data = {}
for row in table.findAll('tr')[2:]:
    cells = row.findAll('td')
    key = cells[0].text.strip()
    value = cells[1].text.strip()
    data[key] = value
    record = (key, value)
    writer = csv.writer(open('cpms.csv', 'ab'))
    writer.writerow(record)
I've tried adding , {'height' : '19'} and , 'font' after the findAll('td') to narrow down the selection, but that doesn't work.
This is the HTML for the first section of the table, though if you look at the whole page, there's an earlier table and td that never close until the end of the document.
Any ideas/assistance greatly, greatly appreciated!
<table width=845 border=6 cellpadding=0 cellspacing=0 bgcolor=#c0c0c0>
<tr><td height=28 width=19 valign=top bgcolor=#336699> </td>
<td valign=middle colspan=3 bgcolor=#336699><font color=white size=3><b>10-1101 - Department of Transportation - Dept Code:A101101 - Class Code:01101</b></font></td></tr>
<tr><td colspan=4 height=18 valign=top> </td></tr>
<tr><td valign=top rowspan=54> </td>
<td valign=top height=18 width=255 align=left bgcolor=#FFFFB4><font size=2><b>Year</b></td>
<td rowspan=54 width=5 valign=top><img src='images/spacer.gif' width=10></td>
<td valign=top align='right' style='{padding-right:150px}' BGCOLOR=#FFFFB4><font size=2>2010</td></tr>
<tr><td height=18 valign=top align=left bgcolor=#FFFFB4><font size=2><b>Appropriation Title</b></td>
<td valign=top BGCOLOR=#FFFFB4><font size=2>DONA ANA CO EAST MESA AREA ROADS & DRAINAGE</td></tr>
<tr><td height=18 valign=top align=left bgcolor=#FFFFB4><font size=2><b>Fund Code</b></td>
<td valign=top align='right' style='{padding-right:150px}' BGCOLOR=#FFFFB4><font size=2>SEVERANCE TAX BONDS</td></tr>
<tr><td height=18 valign=top align=left bgcolor=#FFFFB4><font size=2><b>EO 2013-006 Eligibility</b></td>
<td valign=top align='right' style='{padding-right:150px}' BGCOLOR=#FFFFB4><font size=2></td></tr>
<tr><td height=18 valign=top align=left bgcolor=#FFFFB4><font size=2><b>Bond Sale Date</b></td>
<td valign=top align='right' style='{padding-right:150px}' BGCOLOR=#FFFFB4><font size=2>***</td></tr>
<tr><td height=18 valign=top align=left bgcolor=#FFFFB4><font size=2><b>Bond Series Number</b></td>
<td valign=top align='right' style='{padding-right:150px}' BGCOLOR=#FFFFB4><font size=2></td></tr>
<tr><td height=18 valign=top align=left bgcolor=#FFFFB4><font size=2><b>Amount of Bond Sale</b></td>
<td valign=top align='right' style='{padding-right:150px}' BGCOLOR=#FFFFB4><font size=2>$0 </td></tr>
<tr><td height=18 valign=top align=left bgcolor=#FFFFB4><font size=2><b>Category</b></td>
<td valign=top align='right' style='{padding-right:150px}' BGCOLOR=#FFFFB4><font size=2></td></tr>
<tr><td height=18 valign=top align=left bgcolor=#FFFFB4><font size=2><b>Subcategory</b></td>
<td valign=top align='right' style='{padding-right:150px}' BGCOLOR=#FFFFB4><font size=2></td></tr>
<tr><td height=18 valign=top align=left bgcolor=#FFFFB4><font size=2><b>County</b></td>
<td valign=top align='right' style='{padding-right:150px}' BGCOLOR=#FFFFB4><font size=2>Dona Ana</td></tr>
<tr><td height=18 valign=top align=left bgcolor=#FFFFB4><font size=2><b>State Amount</b></td>
<td valign=top align='right' style='{padding-right:150px}' BGCOLOR=#FFFFB4><font size=2>$135,000</td></tr>
<tr><td height=18 valign=top align=left bgcolor=#FFFFB4><font size=2><b>Chapter/Section</b></td>
<td valign=top align='right' style='{padding-right:150px}' BGCOLOR=#FFFFB4><font size=2>105 / 18</td></tr>
<tr><td height=18 valign=top align=left bgcolor=#FFFFB4><font size=2><b>Reversion Date</b></td>
<td valign=top align='right' style='{padding-right:150px}' BGCOLOR=#FFFFB4><font size=2>6/30/2014</td></tr>
<tr><TH COLSPAN=2>SHARE/BOF Data</TH> <td height=12</td></tr>
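For rows shaped like the ones above, here is a rough sketch of one way to pair each label with its value without relying on fixed cell positions. It assumes the label is the first non-empty cell in a row and the value is the last cell, which sidesteps the padding cell in the Year row and keeps blank values as empty strings. `label_value` and `table_pairs` are hypothetical names, `table` is the tag returned by `soup.find('table', border=6)`, and `find_all` / `get_text(strip=True)` assume BeautifulSoup 4.

```python
def label_value(cells):
    """Return (label, value) from one row's stripped cell texts, or None for padding rows."""
    # The first non-empty cell is treated as the label (assumption).
    idx = next((i for i, text in enumerate(cells) if text), None)
    if idx is None:
        return None  # the row held only spacer cells
    label = cells[idx]
    # The value is the last cell in the row; if the label is itself the
    # last cell (e.g. a heading row), the value is left empty.
    value = cells[-1] if idx < len(cells) - 1 else ''
    return label, value


def table_pairs(table):
    """Yield (label, value) for every non-padding row of a BeautifulSoup table tag."""
    for row in table.find_all('tr'):
        cells = [cell.get_text(strip=True) for cell in row.find_all(['td', 'th'])]
        pair = label_value(cells)
        if pair is not None:
            yield pair
```

This only addresses the padding cell and the blank values. It does not by itself explain why parsing stops after the first section, which may depend on how the parser copes with the unclosed tags.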