I have this HTML (simplified):
<td class="pad10">
<div class="button-left" style="margin-bottom: 4px">04.09.2013</div>
<table width="100%" class="record generic schedule margin-4"></table>
<table width="100%" class="record generic schedule margin-4"></table>
<div class="button-left" style="margin-bottom: 4px">05.10.2013</div>
<table width="100%" class="record generic schedule margin-4"></table>
<table width="100%" class="record generic schedule margin-4"></table>
<table width="100%" class="record generic schedule margin-4"></table>
<table width="100%" class="record generic schedule margin-4"></table>
</td>
I want to build a dict like this, where each "row" is the content of one table and the rows are grouped by the date 'div' that precedes them in the main table:
{'04.09.2013': [1 row, 2 row],
'05.10.2013': [1 row, 2 row, 3 row, 4 row]}
I can extract all 'div' with:
dt = s.xpath('//div[contains(@class, "button-left")]')
I can extract all 'table' with:
tables = s.xpath('//table[contains(@class, "record generic schedule margin-4")]')
But I don't know how to link each 'dt' with its corresponding 'tables' in a Scrapy parser. Is it possible to put a condition on the scraping process, like this: when you find a 'div', extract every following 'table' until you find the next 'div'?
Chrome gives me these two XPath examples for the elements:
//*[@id="wrap"]/table/tbody/tr/td/table[3]/tbody/tr/td/div[2]
//*[@id="wrap"]/table/tbody/tr/td/table[3]/tbody/tr/td/table[1]
Maybe they help to picture the full structure of the table.
Solution (thanks to @marven):
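from scrapy.selector import Selector

# Runs inside a spider callback, where `response` is available.
s = Selector(response)
table = {}
current_key = None
# Walk the direct children of the cell in document order.
for e in s.xpath('//td[@class="pad10"]/*'):
    # Scrapy extracts a boolean XPath expression as '1' or '0'.
    if bool(int(e.xpath('@class="button-left"').extract()[0])):
        # A date div starts a new group.
        current_key = e.xpath('text()').extract()[0]
    elif bool(int(e.xpath('@class="record generic schedule margin-4"').extract()[0])):
        # A schedule table belongs to the most recently seen date.
        table.setdefault(current_key, []).append(e.extract())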
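To try the grouping idea without a running spider, here is a minimal standalone sketch. It is not the Scrapy code above: it swaps Scrapy's Selector for the standard library's xml.etree.ElementTree, so it only works on well-formed markup, and the HTML string, the table contents and the helper name group_tables_by_date are made up for illustration. The logic is the same: walk the children of the cell in order, remember the last date 'div' seen, and attach each following 'table' to it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the markup in the question.
HTML = """
<td class="pad10">
  <div class="button-left">04.09.2013</div>
  <table class="record generic schedule margin-4"><tr><td>a</td></tr></table>
  <table class="record generic schedule margin-4"><tr><td>b</td></tr></table>
  <div class="button-left">05.10.2013</div>
  <table class="record generic schedule margin-4"><tr><td>c</td></tr></table>
  <table class="record generic schedule margin-4"><tr><td>d</td></tr></table>
  <table class="record generic schedule margin-4"><tr><td>e</td></tr></table>
  <table class="record generic schedule margin-4"><tr><td>f</td></tr></table>
</td>
"""


def group_tables_by_date(cell):
    """Map each date div's text to the list of tables that follow it."""
    groups = {}
    current_key = None
    for child in cell:  # direct children, in document order
        classes = child.get("class", "").split()
        if child.tag == "div" and "button-left" in classes:
            # A date div starts a new group.
            current_key = child.text.strip()
            groups[current_key] = []
        elif child.tag == "table" and current_key is not None:
            # A table belongs to the most recently seen date.
            groups[current_key].append(child)
    return groups


cell = ET.fromstring(HTML)
result = group_tables_by_date(cell)
counts = {date: len(tables) for date, tables in result.items()}
print(counts)  # {'04.09.2013': 2, '05.10.2013': 4}
```

Tables that appear before the first date 'div' are skipped here, whereas the Scrapy version above files them under the key None; which behaviour is preferable depends on the page.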