I am building a simple scraper with Scrapy but am having issues extracting certain parts of the data. The website contains about 20 of the following blocks of code:
<div class="row result">
    <div class="updateCont date col-md-2 col-sm-2 col-xs-3">
        <span>
            <strong>Fri. 10 Feb</strong> <br />0:00 AM
        </span>
    </div>
    <div class="updateCont eventIcon col-md-1 col-sm-1 col-xs-3">
        <div class="icon ">
            <i class="fa fa-update"></i>
        </div>
    </div>
    <div class="updateCont event col-md-9 col-sm-8 col-xs-6">
        <span>
            The buyer has been notified of this update. <br />
            <span class="inner department">
                124
            </span>
        </span>
    </div>
</div>
I have managed to extract each one of these with:
sel = Selector(text=response.body)
updates = sel.xpath("//div[@class='row result']")
I would now like to isolate the date and convert it into a datetime object, and also extract the updateCont event string, i.e. "The buyer has been notified of this update."
I tried:
for update in updates:
    date = update.xpath('//span').extract()
    print(len(date))
which results in 7. I was expecting it to be 3. More worryingly, if I print out just date, it prints out the same data three times. I was expecting three different lots of data, as there are three separate elements in the HTML.
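A count that is larger than expected and identical on every iteration is what one would see if the query is not scoped to the current block. In XPath, an expression that starts with `//` searches from the document root, while `.//` searches below the current node, so in Scrapy the relative form would be `update.xpath('.//span')`. The sketch below illustrates only the scoping difference. It uses the standard library's `xml.etree.ElementTree` as a stand-in so that it runs without Scrapy, and `PAGE` is a made-up, trimmed two-block version of the markup above.

```python
import xml.etree.ElementTree as ET

# Made-up sample (an assumption, not the real page): two trimmed result blocks.
PAGE = """
<body>
  <div class="row result">
    <div class="updateCont date"><span><strong>Fri. 10 Feb</strong> <br />0:00 AM</span></div>
    <div class="updateCont event"><span>First update. <br /><span class="inner department">124</span></span></div>
  </div>
  <div class="row result">
    <div class="updateCont date"><span><strong>Sat. 11 Feb</strong> <br />9:30 AM</span></div>
    <div class="updateCont event"><span>Second update. <br /><span class="inner department">125</span></span></div>
  </div>
</body>
"""

root = ET.fromstring(PAGE)
blocks = root.findall(".//div[@class='row result']")

# Searching from the document root returns the spans of every block at once.
all_spans = root.findall(".//span")

# Searching relative to one block (note the leading ".") returns only its own spans.
spans_per_block = [len(block.findall(".//span")) for block in blocks]

print(len(all_spans))
print(spans_per_block)
```

With this sample, the document-wide search sees all six spans, while each block-scoped search sees only the three spans inside that block.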
Is
sel = Selector(text=response.body)
updates = sel.xpath("//div[@class='row result']")
the correct code to isolate the sections? What would be the best approach to extract the spans?
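One possible way to pull the individual fields out of a single block and build the datetime is sketched below. It is not Scrapy code: it uses `xml.etree.ElementTree` from the standard library so that it runs on its own, and in Scrapy the analogous queries would be relative XPaths such as `update.xpath('.//strong/text()').extract_first()`. The names `BLOCK`, `find_div` and `parse_update` are hypothetical helpers, and the year is an assumption passed in by the caller because the markup does not contain one.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Made-up single block, trimmed from the markup in the question.
BLOCK = """
<div class="row result">
  <div class="updateCont date col-md-2 col-sm-2 col-xs-3">
    <span><strong>Fri. 10 Feb</strong> <br />0:00 AM</span>
  </div>
  <div class="updateCont event col-md-9 col-sm-8 col-xs-6">
    <span>The buyer has been notified of this update. <br />
      <span class="inner department">124</span>
    </span>
  </div>
</div>
"""


def find_div(block, css_class):
    """Return the first direct child div whose class list contains css_class."""
    for div in block.findall("./div"):
        if css_class in div.get("class", "").split():
            return div
    return None


def parse_update(block, year):
    """Return (datetime, message) for one result block; year is caller-supplied."""
    date_span = find_div(block, "date").find("span")
    day_text = date_span.find("strong").text.strip()   # "Fri. 10 Feb"
    time_text = date_span.find("br").tail.strip()      # text after <br />: "0:00 AM"
    message = find_div(block, "event").find("span").text.strip()

    # Drop the weekday, then parse day and month (%b assumes English month names).
    day_month = day_text.split(None, 1)[1]
    when = datetime.strptime(f"{day_month} {year}", "%d %b %Y")

    # "0:00 AM" is outside the 1-12 range that %I accepts, so parse the time by hand.
    clock, meridiem = time_text.split()
    hour, minute = (int(part) for part in clock.split(":"))
    hour = hour % 12 + (12 if meridiem.upper() == "PM" else 0)
    return when.replace(hour=hour, minute=minute), message


when, message = parse_update(ET.fromstring(BLOCK), year=2017)
print(when, message)
```

The date text and the time text live in different nodes (the `strong` element and the text following the `br`), which is why they are read separately and then combined.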
0:00 AM` will only extract the 0:00AM and not the bit within the strong tag. – Feb 10 '17 at 17:24