scrapy - trying to get "next" url

Question

I am using Scrapy and trying to write a restrict_xpaths rule so that the crawler only ever follows the link to the next image.

I start with this image: https://www.flickr.com/photos/safaripartners/4838428819/in/photolist-qtC2e5-5iA4ZQ-8nydjx-zf1rvk-wvDaHE-8nBnhu-baArRv-36WzbG-2hLUaa-v6Mw1k-d33z5A-8nBniU-6jTfkT-6W6Sbu-5CtFsA-6RZZ5K-36WYuS-5DatmT-d5Qo1A-nMktKL-9wF1aF-hfuXhF-eLaQn5-5tR4Ri-prLcsi

My goal is to keep moving on to the next image and scrape each one in turn.

I tried:

name = "FlickerSpider"
allowed_domains = ["flickr.com"]
start_urls = [
"https://www.flickr.com/photos/indymcduff/6632326011/in/photolist-9uQnYG-9SnqTY-qjXTHY-onEUN5-5d72ri-tgMKAY-8qaRQL-on6ZLu-bnMg2B-8AVUgV-b75pst/"
]
rules = (
    #crawl to next image
    Rule(SgmlLinkExtractor(allow=(r'photos'),restrict_xpaths=('//class[@data="navigate-target navigate-next")]')) ,callback='parse_item', follow=True),
)

but no requests are generated. Does anyone have a suggestion for which rule I should use? Thanks!

score 0 · Answer 1 · edited May 23 '17 at 11:59

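Your restrict_xpaths expression has a syntax error (a stray closing parenthesis), and it also selects elements named class with a data attribute, rather than a elements by their class attribute. Try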

restrict_xpaths=('//a[@class="navigate-target navigate-next"]')

It's always worth testing your XPath in scrapy shell, or with $x in Firebug for Firefox. When a rule contains an XPath problem like this, it just fails silently.

Update

I should have given you this XPath:

restrict_xpaths=('//a[@class="navigate-target navigate-next"]/@href')

It works with the $x command in Firebug but, as you said, not in scrapy shell. It looks like that part of the page isn't in the plain HTML the server returns but is generated at run time by JavaScript. Unless you can find an alternative URL that serves the data directly, you may have to use something like Selenium, which renders the page in a real browser, including the dynamic content. Scrapy can then parse that rendered HTML, including the link you're after. Have a look at this question.

You are right, but that's only because I was trying things out. Anyway, it still doesn't yield any requests, and I couldn't get it in scrapy shell either. — Tom, Nov 19 '15 at 18:42
