The website I am crawling lists many players, and when I click on any player, I go to his page.
The website structure is like this:
<main page>
<link to player 1>
<link to player 2>
<link to player 3>
..
..
..
<link to player n>
</main page>
When I click on any link, I go to that player's page, which looks like this:
<player name>
<player team>
<player age>
<player salary>
<player date>
I want to scrape all the players whose age is between 20 and 25 years.
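The age condition itself can be written as a small helper. This is only a sketch: `is_target_age` is a hypothetical name, and it assumes the age scraped from the player's page arrives as text such as "23".

```python
def is_target_age(raw_age, low=20, high=25):
    """Return True when raw_age parses to an integer in [low, high]."""
    try:
        age = int(str(raw_age).strip())
    except ValueError:
        # Non-numeric text (e.g. an empty field) is treated as no match.
        return False
    return low <= age <= high
```

A player would then only be kept when `is_target_age(...)` returns True for the extracted age.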
What I am doing
Scraping the main page using the first spider.
Getting the links using the first spider.
Crawling each link using the second spider.
Getting the player information using the second spider.
Saving this information in a JSON file using a pipeline.
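The last step can be sketched as a plain Python class, since a Scrapy item pipeline only needs a `process_item(item, spider)` method. The class name `PlayerJsonPipeline` and the output path `players.jl` are assumptions for illustration, not my real names.

```python
import json


class PlayerJsonPipeline:
    """Hypothetical pipeline: writes each item as one JSON object per line."""

    def __init__(self, path="players.jl"):
        # The output path is an assumption for illustration.
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called once when the spider starts.
        self.file = open(self.path, "w", encoding="utf-8")

    def process_item(self, item, spider):
        # dict(item) works for plain dicts and for scrapy.Item instances.
        self.file.write(json.dumps(dict(item)) + "\n")
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.file.close()
```

A pipeline like this is switched on through the `ITEM_PIPELINES` setting of the project.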
My question
How can I return the date value from the second spider to the first spider?
What I have tried
I built my own middleware and overrode process_spider_output. It lets me print the request, but I don't know what else I should do to return that date value to my first spider.
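For reference, a middleware of this kind might look roughly like the sketch below. The class name and the `seen` list are made up for illustration, and the `process_spider_output(response, result, spider)` signature is the one I understand Scrapy to call; it may differ between Scrapy versions.

```python
class LoggingSpiderMiddleware:
    """Hypothetical spider middleware that inspects whatever a callback yields."""

    def __init__(self):
        # Records the type name of every element that passes through.
        self.seen = []

    def process_spider_output(self, response, result, spider):
        for element in result:
            # Each element is either a Request or an item produced by the callback.
            self.seen.append(type(element).__name__)
            print("callback produced:", element)
            # Passing the element on keeps the crawl going unchanged.
            yield element
```

This only observes the output on its way to the engine; it does not hand anything back to the callback that issued the request.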
Any help is appreciated.
Edit
Here is some of the code:
from scrapy import Request, Selector

# Both methods are defined on the spider class.
def parse(self, response):
    sel = Selector(response)
    container = sel.css('div[MyDiv]')
    for player in container:
        # extract LINK and TITLE from `player` here
        yield Request(LINK, meta={'Title': TITLE}, callback=self.parsePlayer)

def parsePlayer(self, response):
    player = PlayerItem()
    # extract DATE from the response here
    return player