I am scraping an e-commerce site with scrapy-playwright. With headless set to True I get a 403 error, but with headless set to False I get a 200. I even tried randomizing the User-Agent, and I am still getting blocked.
The scrape runs with the Playwright Firefox and WebKit drivers, but it takes too much time, so I want to run it with Chromium.
import json

from scrapy import Request


def make_request_from_data(self, data):
    # Each queued message is a JSON document carrying the ISBN to look up.
    payload = json.loads(data)
    isbn = payload["isbn"]
    url = f"https://www.barnesandnoble.com/s/{isbn}"
    meta = {
        "region": self.region,
        "isbn": isbn,
        "playwright": True,
        "playwright_include_page": True,
        # One browser context per ISBN.
        "playwright_context": f"context-{isbn}",
        "playwright_context_kwargs": {
            "java_script_enabled": True,
        },
    }
    headers = {
        "accept-encoding": "gzip, deflate, br",
        "accept-language": "en",
        "cache-control": "no-cache",
        "pragma": "no-cache",
        "sec-fetch-dest": "document",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "none",
        "sec-fetch-user": "?1",
        "upgrade-insecure-requests": "1",
        "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36",
    }
    yield Request(
        url=url,
        headers=headers,
        callback=self.parse,
        errback=self.close_context_on_error,
        meta=meta,
        dont_filter=True,
    )
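For context, the headless flag and the browser choice mentioned above are controlled through the scrapy-playwright settings. Below is a minimal sketch, assuming the standard `PLAYWRIGHT_BROWSER_TYPE` and `PLAYWRIGHT_LAUNCH_OPTIONS` setting names. The helper `playwright_settings` is hypothetical and only there to make the toggle explicit; the real project settings may contain more than this.

```python
def playwright_settings(browser="chromium", headless=True):
    """Build the scrapy-playwright settings for one browser/headless combination.

    Hypothetical helper for illustration; not part of scrapy-playwright.
    """
    if browser not in {"chromium", "firefox", "webkit"}:
        raise ValueError(f"unsupported browser: {browser}")
    handler = "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler"
    return {
        # Route HTTP(S) downloads through Playwright.
        "DOWNLOAD_HANDLERS": {"http": handler, "https": handler},
        # scrapy-playwright requires the asyncio-based Twisted reactor.
        "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
        "PLAYWRIGHT_BROWSER_TYPE": browser,
        "PLAYWRIGHT_LAUNCH_OPTIONS": {"headless": headless},
    }


# The combination that returns 403 in my runs: headless Chromium.
custom_settings = playwright_settings("chromium", headless=True)
```

Switching `headless` to `False`, or `browser` to `"firefox"` or `"webkit"`, gives the combinations that return 200.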
The ISBN is the book code. My wild guess is that the problem is with the Chromium version, but I don't know how to downgrade the Chromium version in Playwright.
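On the version question: as far as I understand, each Playwright release ships with one specific Chromium build, so the bundled version changes only by installing a different Playwright release. Another option is Playwright's `executable_path` launch option, which points at a separately installed browser binary. Below is a sketch of that second route. The helper `launch_options` and the path `/opt/chromium-old/chrome` are placeholders, and it is not verified that an older build avoids the 403.

```python
def launch_options(executable=None, headless=True):
    """Build a PLAYWRIGHT_LAUNCH_OPTIONS dict, optionally with a custom binary.

    Hypothetical helper for illustration; not part of scrapy-playwright.
    """
    options = {"headless": headless}
    if executable is not None:
        # `executable_path` is passed through to Playwright's browser launch call.
        options["executable_path"] = str(executable)
    return options


# Placeholder path: replace with the location of the Chromium build to test.
PLAYWRIGHT_BROWSER_TYPE = "chromium"
PLAYWRIGHT_LAUNCH_OPTIONS = launch_options("/opt/chromium-old/chrome")
```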