I want to scrape multiple pages, but the URL stays the same when I move to another page. How can I scrape all of the pages? If there is a solution, please share it. The page is https://www.ifep.ro/justice/lawyers/lawyerspanel.aspx

import scrapy
from scrapy.http import Request



class TestSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://www.ifep.ro/justice/lawyers/lawyerspanel.aspx']
    custom_settings = {
        'CONCURRENT_REQUESTS_PER_DOMAIN': 1,
        'DOWNLOAD_DELAY': 1,
        'USER_AGENT': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36'
    }

    def parse(self, response):
        books = response.xpath("//div[@class='list-group']//@href").extract()
        for book in books:
            url = response.urljoin(book)
            if url.endswith('.ro') or url.endswith('.ro/'):
                continue
            yield Request(url, callback=self.parse_book)
    
    def parse_book(self, response):
        # Detail page: the heading plus the first four paragraphs of the info column.
        info = "//div[@class='col-md-10']"
        # get(default="") avoids calling .strip() on None when a node is missing.
        yield {
            "title1": response.xpath("//span[@id='HeadingContent_lblTitle']//text()").get(),
            "title2": response.xpath(info + "//p[1]//text()").get(default="").strip(),
            "title3": response.xpath(info + "//p[2]//text()").get(default="").strip(),
            "title4": response.xpath(info + "//p[3]//span//text()").get(default="").strip(),
            "title5": response.xpath(info + "//p[4]//text()").get(default="").strip(),
        }

Amen Aziz

1 Answer

The page content is loaded dynamically, which is why the URL does not change between pages. To reach the next page (or a specific one), you have to click the navigation button or enter the page number in the form.

In this case, use Scrapy together with Selenium, or Selenium on its own.

You can check the scrapy-selenium middleware for Scrapy.
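For reference, enabling that middleware is mostly a matter of project settings. The fragment below follows the setting names from the scrapy-selenium README as far as I recall them; the choice of Chrome, the headless flag and the `chromedriver` binary name are assumptions, so check them against the version you install.

```python
# settings.py fragment (a sketch, not a verified configuration)
from shutil import which

# ASSUMPTION: Chrome with a chromedriver binary available on PATH.
SELENIUM_DRIVER_NAME = "chrome"
SELENIUM_DRIVER_EXECUTABLE_PATH = which("chromedriver")  # None if not found
SELENIUM_DRIVER_ARGUMENTS = ["--headless"]

# Route requests through the Selenium downloader middleware.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_selenium.SeleniumMiddleware": 800,
}
```

With this in place, the spider would yield `SeleniumRequest` objects (imported from `scrapy_selenium`) instead of plain `Request` objects for the pages that need a browser.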

You can perform the Selenium operations inside the parse_book method and continue scraping the data with Scrapy.
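As a rough illustration of the pure-Selenium route, here is a minimal sketch that clicks through pages and collects the detail links. The `NEXT_BUTTON_XPATH` locator is hypothetical (I have not inspected the page's markup), and the wait assumes the list element is replaced when the page changes; the `list-group` XPath is taken from the question. The paging loop is kept separate from the browser code so it can be tried without a driver.

```python
from urllib.parse import urljoin

START_URL = "https://www.ifep.ro/justice/lawyers/lawyerspanel.aspx"
LIST_XPATH = "//div[@class='list-group']"      # from the question's spider
NEXT_BUTTON_XPATH = "//a[contains(@id, 'Next')]"  # ASSUMPTION: hypothetical locator


def collect_links(read_hrefs, go_next, base_url, max_pages=3):
    """Gather detail URLs across pages that all share one address.

    read_hrefs() returns the hrefs on the page currently shown;
    go_next() advances one page and returns False when it cannot.
    """
    seen, ordered = set(), []
    for _ in range(max_pages):
        for href in read_hrefs():
            url = urljoin(base_url, href)
            if url not in seen:  # skip links repeated across pages
                seen.add(url)
                ordered.append(url)
        if not go_next():
            break
    return ordered


def run_with_selenium(max_pages=3):
    # Imported here so collect_links can be used without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(START_URL)

        def read_hrefs():
            links = driver.find_elements(By.XPATH, LIST_XPATH + "//a[@href]")
            return [a.get_attribute("href") for a in links]

        def go_next():
            try:
                old_list = driver.find_element(By.XPATH, LIST_XPATH)
                button = driver.find_element(By.XPATH, NEXT_BUTTON_XPATH)
            except NoSuchElementException:
                return False
            button.click()
            # ASSUMPTION: the list node is replaced when the next page loads.
            WebDriverWait(driver, 10).until(EC.staleness_of(old_list))
            return True

        return collect_links(read_hrefs, go_next, START_URL, max_pages)
    finally:
        driver.quit()
```

The URLs returned by `run_with_selenium()` could then be fed to the existing `parse_book` callback as ordinary Scrapy requests, since the detail pages themselves have distinct addresses.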

ahmedshahriar