Again I seem to have hit a brick wall with this one, and I'm hoping somebody will be able to answer it off the top of their head.
Here's some example code:
```python
from scrapy.selector import HtmlXPathSelector  # legacy Scrapy selector API

def parse_page(self, response):
    hxs = HtmlXPathSelector(response)
    item = response.meta['item']
    item["Details_H1"] = hxs.select('//*[@id="ctl09_p_ctl17_ctl04_ctl01_ctl00_dlProps"]/tr[1]/td[1]/text()').extract()
    return item
```
It seems that the `@id` in the `Details_H1` XPath can change. For example, on one page it could be `@id="ctl08_p_ctl17_ctl04_ctl01_ctl00_dlProps"` and on the next page it's randomly `@id="ctl09_p_ctl17_ctl04_ctl01_ctl00_dlProps"`.
I would like to implement the equivalent of a "do until" loop, so that the code cycles through the numbers in increments of 1 until the XPath returns a non-empty result. For example, I could set `i=108` and increment `i=i+1` each time until `hxs.select('//*[@id="ctl09_p_ctl17_ctl04_ctl01_ctl00_dlProps"]/tr[1]/td[1]/text()').extract() != []`.
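For reference, the loop described above can be sketched in Python as a bounded `for` loop over candidate numbers, since Python has no `do ... until`. This is a minimal sketch under assumptions: `first_nonempty`, `TEMPLATE` and the `{n:02d}` placeholder are hypothetical, the changing part of the id is assumed to be the two digits after `ctl`, and the standard library's `xml.etree.ElementTree` stands in for Scrapy's selector so the example runs on its own.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple


def first_nonempty(select: Callable[[str], List[str]], template: str,
                   start: int, stop: int) -> Tuple[Optional[int], List[str]]:
    """Try template.format(n=n) for n = start, start+1, ... until select() returns a non-empty list.

    The search is bounded by `stop` (exclusive) so a page without the element
    cannot loop forever. Returns (n, result), or (None, []) if nothing matched.
    """
    for n in range(start, stop):
        result = select(template.format(n=n))
        if result:  # same as result != []
            return n, result
    return None, []


# Hypothetical stand-in page; in Scrapy this would be the downloaded response.
HTML = (
    '<html><body>'
    '<table id="ctl09_p_ctl17_ctl04_ctl01_ctl00_dlProps">'
    '<tr><td>Bedrooms: 3</td></tr>'
    '</table>'
    '</body></html>'
)
ROOT = ET.fromstring(HTML)

# ElementTree needs a relative ".//" path and has no text() step,
# so the cell text is read in Python instead.
TEMPLATE = './/*[@id="ctl{n:02d}_p_ctl17_ctl04_ctl01_ctl00_dlProps"]/tr[1]/td[1]'


def select(xpath: str) -> List[str]:
    return [el.text for el in ROOT.findall(xpath) if el.text]


n, values = first_nonempty(select, TEMPLATE, 5, 20)
print(n, values)  # 9 ['Bedrooms: 3']
```

Inside the spider, the same helper could be called with `select=lambda xp: hxs.select(xp).extract()` and the template `'//*[@id="ctl{n:02d}_p_ctl17_ctl04_ctl01_ctl00_dlProps"]/tr[1]/td[1]/text()'`. Note that this still guesses at ids one request at a time, whereas matching on the unchanging part of the id avoids the guessing altogether.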
How would I be able to implement this?
Your help and contributions are greatly appreciated.
EDIT 1
Fixed thanks to TNT's answer below. The code should read:
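```python
from scrapy.selector import HtmlXPathSelector  # legacy Scrapy selector API

def parse_page(self, response):
    hxs = HtmlXPathSelector(response)
    item = response.meta['item']
    # Match on the part of the id that does not change, so ctl08_..., ctl09_..., etc. all work
    item["Details_H1"] = hxs.select('//*[contains(@id, "_p_ctl17_ctl04_ctl01_ctl00_dlProps")]/tr[1]/td[1]/text()').extract()
    return item
```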
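To check that the suffix match picks up the table on both kinds of page, here is a small stand-alone sketch. It uses `xml.etree.ElementTree` instead of Scrapy, and because ElementTree's limited XPath support has no `contains()` function, the same substring test is done in Python; `first_cell_texts` and the sample pages are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Suffix taken from the contains() expression in the fix.
SUFFIX = "_p_ctl17_ctl04_ctl01_ctl00_dlProps"


def first_cell_texts(html: str, suffix: str = SUFFIX) -> List[str]:
    """Approximate //*[contains(@id, suffix)]/tr[1]/td[1]/text() with ElementTree."""
    root = ET.fromstring(html)
    texts = []
    for el in root.iter():
        # Same test as XPath contains(@id, suffix); elements without an id never match.
        if suffix in el.get("id", ""):
            cell = el.find("tr[1]/td[1]")
            # cell.text is the cell's leading text, enough for simple cells like these.
            if cell is not None and cell.text:
                texts.append(cell.text)
    return texts


PAGE_A = ('<html><body><table id="ctl08_p_ctl17_ctl04_ctl01_ctl00_dlProps">'
          '<tr><td>Bedrooms: 3</td></tr></table></body></html>')
PAGE_B = ('<html><body><table id="ctl09_p_ctl17_ctl04_ctl01_ctl00_dlProps">'
          '<tr><td>Bathrooms: 2</td></tr></table></body></html>')

print(first_cell_texts(PAGE_A))  # ['Bedrooms: 3']
print(first_cell_texts(PAGE_B))  # ['Bathrooms: 2']
```

One caveat: `contains()` matches the suffix anywhere in the id, so an unrelated id that merely contains it would match too. XPath 1.0, which lxml (and therefore Scrapy) implements, has no `ends-with()`; a stricter suffix test can be written as `substring(@id, string-length(@id) - string-length('_p_ctl17_ctl04_ctl01_ctl00_dlProps') + 1) = '_p_ctl17_ctl04_ctl01_ctl00_dlProps'`.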