python recursive scrapy not all pages

Question

I'm trying to write a spider for viagogo. Whenever I'm on this page (for example) http://www.viagogo.com/Concert-Tickets/Rock-and-Pop, I don't see all the shows, and I need to click 'next' to get the other results. I opened Wireshark and saw that this is a JSON request, with {"method":"GetGridData"}, sent to the same address. I'm trying to get all the results with Scrapy, but I always get only the first results. This is my code:

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from viagogo.items import ViagogoItem
from scrapy.http import Request

class viagogoSpider(CrawlSpider):
    name="viagogo"
    allowed_domains=['viagogo.com']
    start_urls = ["http://www.viagogo.com/Concert-Tickets"]
    rules = (
        # Follow each sub-category link on the page, such as Rock-and-Pop under Concert-Tickets
        Rule(SgmlLinkExtractor(restrict_xpaths=('//a[@class="t xs"]')), callback='Parse_Subject_Tickets', follow=True),
    )

    def Parse_Subject_Tickets(self, response):
        item = ViagogoItem()
        item["title"] = response.xpath('//title/text()').extract()
        item["link"] = response.url
        yield Request(response.url, callback=self.Parse_artists_Tickets, meta={"method":"GetGridData"}, dont_filter=True)

    def Parse_artists_Tickets(self, response):
        print response.body

In the rules I'm getting all the Concert-Tickets/XXXX pages, and in Parse_Subject_Tickets I'm trying to build the JSON request, but the page printed in Parse_artists_Tickets is exactly the original one, not the one with the new artists.

Any ideas?

Thanks!
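A minimal standard-library sketch of the distinction raised in the comments below, offered as an illustration rather than a drop-in fix. Scrapy's `meta` dict stays inside the crawler (it comes back as `response.meta` in the callback) and is never sent to the server, so the request in Parse_Subject_Tickets is just another GET of the same page. Putting {"method":"GetGridData"} in the request body is what turns it into a POST. Whether the endpoint wants a form-encoded or a JSON body is an assumption to check against the captured request, so both variants are shown. `build_request` and `looks_like_json` are hypothetical helper names, and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

URL = "http://www.viagogo.com/Concert-Tickets/Rock-and-Pop"
PAYLOAD = {"method": "GetGridData"}


def build_request(url, payload=None, as_json=False):
    """Build (but do not send) a request for the grid data.

    With no payload this is a plain GET, which is effectively what the
    question's code sends: data placed in Scrapy's `meta` never leaves
    the crawler. With a payload, the data goes into the body and urllib
    switches the method to POST.
    """
    if payload is None:
        return Request(url)
    if as_json:
        # Assumption: the endpoint expects a JSON body.
        body = json.dumps(payload).encode("utf-8")
        content_type = "application/json"
    else:
        # Form encoding, the same encoding Scrapy's FormRequest(formdata=...) uses.
        body = urlencode(payload).encode("utf-8")
        content_type = "application/x-www-form-urlencoded"
    return Request(url, data=body, headers={"Content-Type": content_type})


def looks_like_json(body):
    """Return True if the response body parses as JSON, False otherwise (e.g. an HTML page)."""
    try:
        json.loads(body)
    except ValueError:  # JSONDecodeError and UnicodeDecodeError both subclass ValueError
        return False
    return True


if __name__ == "__main__":
    for label, req in [
        ("no body", build_request(URL)),
        ("form body", build_request(URL, PAYLOAD)),
        ("json body", build_request(URL, PAYLOAD, as_json=True)),
    ]:
        print(label, req.get_method(), req.data)
```

In Scrapy itself, FormRequest(url, formdata=PAYLOAD, callback=...) produces the form-encoded variant, and scrapy.Request(url, method="POST", body=json.dumps(PAYLOAD), headers={"Content-Type": "application/json"}, callback=...) the JSON one. Assuming the endpoint replies with JSON, calling looks_like_json(response.body) in the callback is a quick way to see whether you got grid data back or the original HTML page again.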

As I answered in your other question, please see: http://stackoverflow.com/questions/8550114/can-scrapy-be-used-to-scrape-dynamic-content-from-websites-that-are-using-ajax — Elias Dorneles, Dec 14 '14 at 20:30
Thanks, man. I saw the post and did exactly what it says, and it just doesn't work! Did you try it? I'm not getting the JSON — SomeNiceGuy21, Dec 15 '14 at 06:51
You are trying to put the POST data in the `meta` field, which has absolutely nothing to do with that. Read the official documentation first, and then ask informed questions. You will find most information about `meta` and `POST` here: http://doc.scrapy.org/en/latest/topics/request-response.html — bosnjak, Dec 15 '14 at 07:48
@Lawrence, thanks! I read it, followed this guide, and changed the code to this: `yield FormRequest(response.url, callback=self.Parse_artists_Tickets, formdata={"method":"GetGridData"})`, but it still doesn't work. What am I doing wrong? — SomeNiceGuy21, Dec 15 '14 at 20:23
That single change is not enough. Check for other things that should be set for the request to be parsed properly. Check to see if the `headers` have some specific info that you could mimic. — bosnjak, Dec 15 '14 at 20:25
When I checked it in Firebug, there were many header fields, such as ADRUM, Content-Type, Cookie... Do I need to add all of that to my request? — SomeNiceGuy21, Dec 15 '14 at 20:46
I would suggest adding everything to replicate the exact request. Then if it works, you can trim things to remove everything that is not necessary. — bosnjak, Dec 17 '14 at 18:52
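The replicate-then-trim approach from the last comment can be made mechanical. The sketch below uses two hypothetical helpers (not part of Scrapy): `parse_raw_headers` turns a header block copied from Firebug into a dict, and `minimize_headers` greedily drops one header at a time, keeping a removal only if a caller-supplied check still passes. In a real spider the check would resend the POST and test whether the body is JSON; here a stand-in check is used so the example runs offline, and the header values are made up for illustration.

```python
def parse_raw_headers(raw):
    """Turn 'Name: value' lines (as copied from Firebug) into a dict."""
    headers = {}
    for line in raw.strip().splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        # Split only on the first colon; values may contain colons too.
        name, value = line.split(":", 1)
        headers[name.strip()] = value.strip()
    return headers


def minimize_headers(headers, still_works):
    """Greedily remove headers that the request does not need.

    `still_works(candidate)` must return True when a request sent with
    `candidate` still gets the expected response. Each header is tried
    once, so this finds a small working set, not necessarily the smallest.
    """
    if not still_works(headers):
        raise ValueError("the full header set must work before trimming")
    kept = dict(headers)
    for name in list(headers):
        candidate = {k: v for k, v in kept.items() if k != name}
        if still_works(candidate):
            kept = candidate  # header was not needed, drop it
    return kept


# Illustrative values only, not taken from a real viagogo request.
RAW = """
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With: XMLHttpRequest
ADRUM: isAjax:true
Cookie: session=example
"""

if __name__ == "__main__":
    full = parse_raw_headers(RAW)

    # Stand-in check: pretend the server only insists on these two headers.
    def fake_check(h):
        return "X-Requested-With" in h and "Content-Type" in h

    print(sorted(minimize_headers(full, fake_check)))
```

The resulting dict can be passed as the `headers=` argument of the Scrapy request. Trimming one header at a time keeps each experiment easy to interpret, at the cost of one request per header.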

0 Answers