
I'm trying to figure out the best way to follow the "next page" button in the www.booking.com hotel list so that the spider keeps running across pages.

When I inspect the button, I see:

<li class="nextpage">
   <a href="/bigcity/offset=15" class="gotopage_2"></a>
</li>

Working code for a single page:

import scrapy
from ..items import BookItem


class BookSpiderSpider(scrapy.Spider):
    name = "book_spider"
    start_urls = (
        'https://www.booking.com/smallcity/offset=10',
    )

    def parse(self, response):
        items = BookItem()

        # Collect every hotel name on the current results page.
        title_name = response.css('span.sr-hotel__name::text').extract()

        items['title_name'] = title_name

        yield items

The link's href and class change every time the button is clicked.

So I'm guessing the Python code should find the button, take its new href, combine it with the existing URL, and go there.
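
To make the "find the button, take its href" step concrete, here is a minimal sketch that uses only the standard library rather than Scrapy. It assumes the markup looks like the inspected snippet above (an <a> inside <li class="nextpage">); the names NextPageFinder and find_next_href and the sample string are hypothetical. Inside a spider, a selector such as response.css('li.nextpage a::attr(href)') would probably do the same job, assuming that markup.

```python
from html.parser import HTMLParser


class NextPageFinder(HTMLParser):
    """Records the href of the first <a> found inside <li class="nextpage">."""

    def __init__(self):
        super().__init__()
        self.inside_next = False
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and "nextpage" in (attrs.get("class") or "").split():
            # Entered the list item that wraps the "next page" link.
            self.inside_next = True
        elif tag == "a" and self.inside_next and self.href is None:
            # Do not match on the link's class: it changes from page to page.
            self.href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "li":
            self.inside_next = False


def find_next_href(html):
    """Return the next-page href, or None if the page has no such link."""
    finder = NextPageFinder()
    finder.feed(html)
    return finder.href


# Hypothetical sample modelled on the inspected button.
sample = ('<ul><li class="nextpage">'
          '<a href="/bigcity/offset=15" class="gotopage_2">Next</a>'
          '</li></ul>')
print(find_next_href(sample))
```

Matching on the stable li.nextpage wrapper instead of the link's own class is the point of the sketch, since only the wrapper stays the same between pages.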

Tom Sada

2 Answers

0

Hi, use this snippet in your application:

        # Place this inside parse(), after yielding the items.
        next_page = response.xpath('//a[contains(@class,"ficon-caret-right")]/@href').extract()

        # An empty list means there is no next-page link, so the spider stops.
        if len(next_page) != 0:
            next_href = next_page[0]
            next_page_url = next_href
            print("==============> next cat pagination url :", next_page_url)
            yield scrapy.Request(next_page_url, callback=self.parse)
0

Use response.urljoin to turn a relative href into an absolute URL; this avoids missing-scheme errors when building the request:

next_page_url = response.urljoin(next_href)
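
As a rough illustration of what the join does, here is a small sketch that uses the standard library's urljoin, which response.urljoin closely resembles for a case like this. The helper name build_next_page_url is hypothetical, and the URLs are the ones from the question.

```python
from urllib.parse import urljoin


def build_next_page_url(current_url, hrefs):
    """Return the absolute URL of the next page, or None on the last page."""
    if not hrefs:
        # No next-page link was extracted, so there is nothing to follow.
        return None
    # A relative href such as "/bigcity/offset=15" is resolved against the
    # current page URL; an href that is already absolute is left unchanged.
    return urljoin(current_url, hrefs[0])


current = "https://www.booking.com/smallcity/offset=10"
print(build_next_page_url(current, ["/bigcity/offset=15"]))
# https://www.booking.com/bigcity/offset=15
print(build_next_page_url(current, []))
# None
```

In the spider, hrefs would be the list returned by .extract(), and a non-None result would be passed to scrapy.Request(..., callback=self.parse).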
Janib Soomro