
I'm trying to scrape data from this site using Scrapy: https://www.superbancos.gob.pa/es/fin-y-est/reportes-estadisticos?field_ano_rep_est_value=2018

but the response I get is the following HTML:

You are being redirected... Javascript is required. Please enable javascript before you are allowed to see this page.

I tried disabling JavaScript in Chrome to see whether the browser would get the same response as Scrapy, but it keeps showing me the data.

I couldn't figure out whether I need to change or add something in my settings.py.

Could it be the request headers, or the user agent?

import scrapy


class TestSpider(scrapy.Spider):
    name = "test"

    def start_requests(self):
        url = 'https://www.superbancos.gob.pa/es/fin-y-est/reportes-estadisticos?field_ano_rep_est_value=2018'
        yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Second-to-last path segment, e.g. "fin-y-est" for the URL above
        page = response.url.split("/")[-2]
        filename = 'report-%s.html' % page
        # Save the raw body so you can inspect what the server returned
        with open(filename, 'wb') as f:
            f.write(response.body)
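To tell from inside the spider that you received the JavaScript challenge page instead of the report (rather than saving it as if it were data), you can check the body for the message quoted above. A minimal sketch; `looks_like_js_challenge` and `CHALLENGE_MARKERS` are hypothetical names, and the markers are simply the phrases from the response shown in the question.

```python
# Phrases taken from the challenge page the spider received (assumption:
# the page keeps using this wording).
CHALLENGE_MARKERS = (
    b"Javascript is required",
    b"You are being redirected",
)


def looks_like_js_challenge(body: bytes) -> bool:
    """Return True if a response body looks like the JavaScript challenge page.

    Hypothetical helper: pass it `response.body` from a Scrapy callback.
    The comparison is case-insensitive.
    """
    lowered = body.lower()
    return any(marker.lower() in lowered for marker in CHALLENGE_MARKERS)


if __name__ == "__main__":
    page = b"You are being redirected... Javascript is required."
    print(looks_like_js_challenge(page))
    print(looks_like_js_challenge(b"<html><body>Reportes</body></html>"))
```

In `parse`, you could call `if looks_like_js_challenge(response.body): self.logger.warning("Got the JS challenge page")` and return early instead of writing the file.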

Answer

Try sending these headers and cookies with your request and see the difference:

cookies = {
    'sucuri_cloudproxy_uuid_3763320b2': 'b0cda35ef63b5b3df4215f2b7902756f',
}

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Cache-Control': 'max-age=0',
    'TE': 'Trailers',
}