I want to get datas from multipage with scrapy. And if I want to get the data from the second page, I should use cookies to pass the search term.(Because the search term won't emerge in the url)
The first page's url is :
http://epgd.biosino.org/EPGD/search/textsearch.jsp?textquery=man&submit=Feeling+Lucky
The second page's url is :
http://epgd.biosino.org/EPGD/search/textsearch.jsp?currentIndex=10
I have seen so many questions in stack overflow, they all know what the cookies is before they crawl the data.But I can only get the cookies when I finish crawling the first page. So I want to know how to do with this? Here is my code:
__author__ = 'Rabbit'
from scrapy.spiders import Spider
from scrapy.selector import Selector
from scrapy_Data.items import EPGD
class EPGD_spider(Spider):
name = "EPGD"
allowed_domains = ["epgd.biosino.org"]
stmp = []
term = "man"
url_base = "http://epgd.biosino.org/EPGD/search/textsearch.jsp?textquery=man&submit=Feeling+Lucky"
start_urls = stmp
def parse(self, response):
sel = Selector(response)
sites = sel.xpath('//tr[@class="odd"]|//tr[@class="even"]')
for site in sites:
item = EPGD()
item['genID'] = map(unicode.strip, site.xpath('td[1]/a/text()').extract())
item['taxID'] = map(unicode.strip, site.xpath('td[2]/a/text()').extract())
item['familyID'] = map(unicode.strip, site.xpath('td[3]/a/text()').extract())
item['chromosome'] = map(unicode.strip, site.xpath('td[4]/text()').extract())
item['symbol'] = map(unicode.strip, site.xpath('td[5]/text()').extract())
item['description'] = map(unicode.strip, site.xpath('td[6]/text()').extract())
yield item