I need to parse an HTML table with the following structure:
<table class="table1" width="620" cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr width="620">
<th width="620">Smth1</th>
...
</tr>
<tr bgcolor="ffffff" width="620">
<td width="620">Smth2</td>
...
</tr>
<tr bgcolor="E4E4E4" width="620">
<td width="620">Smth3</td>
...
</tr>
<tr bgcolor="ffffff" width="620">
<td width="620">Smth4</td>
...
</tr>
</tbody>
</table>
Python code:
import requests
import lxml.html

r = requests.post(url, data)
html = lxml.html.document_fromstring(r.text)
# xpath1 was copied from Firebug
rows = html.xpath(xpath1)[0].findall("tr")
data = list()
for row in rows:
    data.append([c.text for c in row.getchildren()])
But the rows = html.xpath(xpath1)[0].findall("tr") line raises:
IndexError: list index out of range
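The `IndexError` comes from the `[0]` subscript, not from the loop: `xpath()` returns a list of matching elements, and that list is empty when the expression matches nothing. A minimal sketch that turns the empty case into an explicit error. The helper `first_match` and the tiny inline document are hypothetical stand-ins for the real page:

```python
import lxml.html


def first_match(tree, xpath):
    """Return the first element matching `xpath`, or raise a descriptive error.

    Hypothetical helper: tree.xpath(...)[0] raises a bare IndexError when the
    expression matches nothing, which hides which XPath was wrong.
    """
    matches = tree.xpath(xpath)
    if not matches:
        raise LookupError("XPath matched nothing: %r" % xpath)
    return matches[0]


# Stand-in document; the real page comes from requests or lxml.html.parse.
doc = lxml.html.document_fromstring(
    "<html><body><table class='table1'><tr><td>Smth2</td></tr></table></body></html>"
)

table = first_match(doc, "/html/body/table")
print(table.tag)  # table

try:
    first_match(doc, "/html/body/div")
except LookupError as exc:
    print(exc)  # names the XPath that failed
```

With this in place, a bad path copied from a browser tool fails with a message naming that path instead of a generic `list index out of range`.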
The goal is to build a Python dict from this table. The number of rows can vary.
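One possible reading of "build a dict" is one dict per data row, keyed by the header text. A minimal sketch under these assumptions: the first `<tr>` holds `<th>` headers, every later `<tr>` holds `<td>` cells in the same order, and the table is located by its `class="table1"` attribute. The sample headers and cell values are hypothetical placeholders, since the real ones are elided above:

```python
import lxml.html

# Hypothetical sample shaped like the table in the question; headers and
# cells are placeholders for the elided real content.
SAMPLE = """
<html><body>
<table class="table1" width="620" cellspacing="0" cellpadding="0" border="0">
  <tr width="620"><th>Col1</th><th>Col2</th></tr>
  <tr bgcolor="ffffff"><td>a1</td><td>a2</td></tr>
  <tr bgcolor="E4E4E4"><td>b1</td><td>b2</td></tr>
  <tr bgcolor="ffffff"><td>c1</td><td>c2</td></tr>
</table>
</body></html>
"""


def table_to_dicts(doc):
    """Turn table.table1 (header row + data rows) into a list of dicts."""
    # Raises IndexError if no table has exactly class="table1".
    table = doc.xpath('//table[@class="table1"]')[0]
    # './/tr' finds rows whether or not a <tbody> sits between table and tr.
    rows = table.xpath(".//tr")
    headers = [th.text_content().strip() for th in rows[0].xpath("./th")]
    records = []
    for row in rows[1:]:  # works for any number of data rows
        cells = [td.text_content().strip() for td in row.xpath("./td")]
        records.append(dict(zip(headers, cells)))
    return records


doc = lxml.html.document_fromstring(SAMPLE)
for record in table_to_dicts(doc):
    print(record)
```

`text_content()` is used instead of `.text` so cells containing nested tags (links, `<b>`, etc.) still yield their full text. If the table itself contains nested tables, `.//tr` would also pick up their rows, so a stricter path would be needed there. If a single dict keyed by the first column is wanted instead, `{r["Col1"]: r for r in records}` is one option.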
UPD. I changed the way I fetch the HTML to rule out problems with the requests library. Now the page is parsed directly from a URL:
html = lxml.html.parse(test_url)
This shows that the HTML itself loads fine:
lxml.html.open_in_browser(html)
But the same error persists:
rows = html.xpath(xpath1)[0].findall('tr')
data = list()
for row in rows:
    data.append([c.text for c in row.getchildren()])
Here is the xpath1:
'/html/body/table/tbody/tr[5]/td/table/tbody/tr/td[2]/table/tbody/tr/td/center/table'
UPD2. I found experimentally that the XPath stops matching at:
xpath1 = '/html/body/table/tbody'
print(html.xpath(xpath1))
# prints []
With a shorter xpath1 it seems to work and returns [<Element table at 0x2cbadb0>]:
xpath1 = '/html/body/table'
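One likely explanation worth checking: Firebug shows the browser's DOM, and browsers insert an implied `<tbody>` around rows that sit directly inside `<table>`, even when the served HTML has none. lxml's parser (libxml2) keeps the tree as written and does not add it, so an XPath copied from Firebug can contain `tbody` steps that match nothing in the tree lxml builds. The sketch below demonstrates this on a small inline document (an assumption standing in for the real page, whose raw source may differ) and shows two workarounds:

```python
import lxml.html

# Stand-in for the served page: note there is no <tbody> in the source.
RAW = "<html><body><table><tr><td>Smth2</td></tr></table></body></html>"

doc = lxml.html.document_fromstring(RAW)

# Firebug-style path with an explicit tbody step matches nothing here,
# because lxml does not insert <tbody> the way a browser does.
print(doc.xpath("/html/body/table/tbody"))      # []

# Dropping the tbody step, or bridging it with '//', does match.
print(len(doc.xpath("/html/body/table/tr")))    # 1
print(len(doc.xpath("/html/body/table//tr")))   # 1

# Stripping browser-added tbody steps from the Firebug path in the question.
# Only valid if none of those <tbody> tags exist in the raw HTML.
firebug_xpath = ("/html/body/table/tbody/tr[5]/td/table/tbody/tr/td[2]"
                 "/table/tbody/tr/td/center/table")
print(firebug_xpath.replace("/tbody", ""))
```

Comparing against the page's raw source (for example `print(r.text)`) rather than Firebug's view is a quick way to confirm whether `<tbody>` is really there. Since this page nests tables, removing the `tbody` steps is usually safer than `//`, which can also reach rows of inner tables.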