Hello! I am trying to figure out how to bind a start URL to a specific parse_item method inside a CrawlSpider class.
Let's say I have more than one start url, two for the sake of simplicity.
So: start_urls = ["http://www.example1.com/", "http://www.example2.com/"]
Now let's say I have two parse functions named parse_item1 and parse_item2.
I already set parse_item1 to call back to parse_item2 and vice versa.
So they do run one after the other.
Now here is my problem: I want to go through the start URLs alternately, one after the other.
That is: example1, example2, example1, example2. Not: example1, example1, example2, example2, example2, example1.
I thought I'd use two parse_item functions to do so, but now I have a problem.
Even though the two functions still call each other in order, the start URLs are not requested in order.
So my question is: is it possible, and if so, how can I bind, for example, www.example1.com to parse_item1 and www.example2.com to parse_item2 so that they get called one after the other?
import time

from scrapy import Request
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

# TwolaircrawlerItem is defined elsewhere in my project (import not shown).


class juggler(CrawlSpider):
    name = "juggle"
    allowed_domains = ["example1.com", "example2.com"]
    start_urls = ["http://www.example1.com/", "http://www.example2.com/"]
    rules = [
        Rule(LinkExtractor(), callback="parse_all", follow=False)
    ]

    def parse_all(self, response):
        # Hand every extracted page to both parse functions.
        yield self.parse_item1(response)
        yield self.parse_item2(response)

    def parse_item1(self, response):
        time.sleep(1)
        item = TwolaircrawlerItem()
        print("Item 1!")
        link = response.url
        print(link)
        # A Request callback must be the method itself, not a string.
        return Request(url=link, callback=self.parse_item2)

    def parse_item2(self, response):
        time.sleep(1)
        item = TwolaircrawlerItem()
        print("Item 2!")
        link = response.url
        print(link)
        return Request(url=link, callback=self.parse_item1)
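
To make the order I am after concrete, here is a minimal sketch in plain Python (it does not use Scrapy itself). The CALLBACKS mapping and the visit_order helper are hypothetical names I made up for illustration; they only show which start URL I would like paired with which parse function, and in which sequence. My assumption is that, inside a spider, each pair would have to become something like Request(url, callback=getattr(self, name)) issued from an overridden start_requests, and that concurrency would probably need to be limited for the order to hold, but that is the part I am unsure about.

```python
from itertools import cycle, islice

# Same start URLs as in the spider above.
START_URLS = ["http://www.example1.com/", "http://www.example2.com/"]

# Hypothetical binding: which parse function should handle which start URL.
CALLBACKS = {
    "http://www.example1.com/": "parse_item1",
    "http://www.example2.com/": "parse_item2",
}


def visit_order(start_urls, total):
    """Return `total` (url, callback_name) pairs, cycling through start_urls in order."""
    return [(url, CALLBACKS[url]) for url in islice(cycle(start_urls), total)]


if __name__ == "__main__":
    # Print the alternating sequence I would like the crawl to follow.
    for url, callback_name in visit_order(START_URLS, 4):
        print("%s -> %s" % (callback_name, url))
```

The point is only the alternation: parse_item1 on example1, then parse_item2 on example2, then back to example1, and so on.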