I'm trying to write a spider for viagogo. On a category page such as http://www.viagogo.com/Concert-Tickets/Rock-and-Pop I don't see all the shows; I have to click 'next' to get the other results. I opened Wireshark and saw that clicking 'next' sends a JSON request, {"method":"GetGridData"}, to the same address. I'm trying to get all the results with Scrapy, but I always get only the first page of results. This is my code:
    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
    from viagogo.items import ViagogoItem
    from scrapy.http import Request


    class viagogoSpider(CrawlSpider):
        name = "viagogo"
        allowed_domains = ['viagogo.com']
        start_urls = ["http://www.viagogo.com/Concert-Tickets"]

        rules = (
            # Visit each subject page listed in the title bar, such as Rock under music
            Rule(SgmlLinkExtractor(restrict_xpaths=('//a[@class="t xs"]')),
                 callback='Parse_Subject_Tickets', follow=True),
        )

        def Parse_Subject_Tickets(self, response):
            item = ViagogoItem()
            item["title"] = response.xpath('//title/text()').extract()
            item["link"] = response.url
            # Request the same URL again, passing {"method": "GetGridData"} via meta
            yield Request(response.url, callback=self.Parse_artists_Tickets,
                          meta={"method": "GetGridData"}, dont_filter=True)

        def Parse_artists_Tickets(self, response):
            print(response.body)
In the rules I'm getting all the Concert-Tickets/XXXX pages, and in Parse_Subject_Tickets I'm trying to build the JSON request. But the body printed in Parse_artists_Tickets is exactly the original page, not the one with the additional artists.
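For comparison: in Scrapy, the `meta` dict is only handed along to the callback and is never sent to the server, so the request above is a plain GET of the same page. A minimal sketch of sending `{"method":"GetGridData"}` as an actual POST body might look like the following. It assumes the endpoint accepts a JSON body with a `Content-Type: application/json` header; the helper names are hypothetical, and any payload field beyond `method` (for example a page number) or any extra header would have to be copied from the request captured in Wireshark.

```python
import json


def grid_data_request_kwargs(url, extra_fields=None):
    """Hypothetical helper: keyword arguments for a Scrapy Request that
    POSTs the GetGridData payload to the same URL as the listing page.

    extra_fields is an assumption: fill it with whatever else the
    captured request body contains (e.g. paging fields)."""
    payload = {"method": "GetGridData"}
    if extra_fields:
        payload.update(extra_fields)
    return {
        "url": url,
        "method": "POST",                                 # send a body, not a plain GET
        "body": json.dumps(payload),                      # the JSON seen in Wireshark
        "headers": {"Content-Type": "application/json"},  # assumed content type
        "dont_filter": True,                              # same URL as the page already crawled
    }


def parse_grid_data(body):
    """Decode the JSON returned by the GetGridData call."""
    return json.loads(body)
```

In `Parse_Subject_Tickets` this would replace the `meta=` request with something like `yield Request(callback=self.Parse_artists_Tickets, **grid_data_request_kwargs(response.url))`, and `Parse_artists_Tickets` could then call `parse_grid_data(response.body)` instead of printing raw HTML.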
Any ideas?
Thanks!