Incapsula is a web application delivery platform that can be used to prevent scraping.
I am working in Python and Scrapy and I found this, but it seems to be out of date and no longer works with the current Incapsula. When I tested the Scrapy middleware against my target website, I got IndexErrors because the middleware could not extract an obfuscated parameter.
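A common source of an `IndexError` in scraping code is indexing the result of `re.findall(...)[0]` when the page no longer contains the expected pattern, so the list is empty. The middleware's actual code isn't shown here, so this is only a hedged sketch: the regex and the `extract_obfuscated_param` name are hypothetical. It uses `re.search` and returns `None` on a miss, which turns a crash into something the caller can log or handle:

```python
import re
from typing import Optional

# Hypothetical pattern: the real parameter name and how it is embedded in the
# page depend on the middleware and on the current Incapsula script.
PARAM_RE = re.compile(r'var\s+b\s*=\s*"([^"]+)"')


def extract_obfuscated_param(html: str) -> Optional[str]:
    """Return the first captured value, or None if the pattern is absent.

    Indexing re.findall(...)[0] raises IndexError when nothing matches;
    re.search lets the caller detect the miss explicitly instead.
    """
    match = PARAM_RE.search(html)
    return match.group(1) if match else None
```

In a middleware, a `None` result is a signal that the page format has changed, which is more informative than an `IndexError` deep inside the parsing code.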
Is it possible to adapt this repo, or has Incapsula changed the way it operates?
I'm also curious why, when I use "Copy as cURL" in Chrome DevTools on the request for my target page, the Chrome response contains the actual page content, yet the curl response is an "Incapsula incident" page. This is with Chrome's cookies cleared beforehand.
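To tell a blocked response apart from real content inside a scraper, a simple heuristic is to look for the block page's text. This sketch assumes the block page contains the phrase "Incapsula incident ID", which is what such pages have commonly shown; check it against the page you actually receive. The function name `looks_like_incapsula_block` is hypothetical:

```python
# Assumption: Incapsula block pages contain this phrase; verify it against
# the "incident" page you actually get back from the site.
BLOCK_MARKER = "incapsula incident id"


def looks_like_incapsula_block(html: str) -> bool:
    """Heuristically decide whether a response body is an Incapsula block page."""
    return BLOCK_MARKER in html.lower()
```

In Scrapy this could be called on `response.text` in a downloader middleware or in the spider callback, so blocked pages are retried or logged rather than parsed as if they were content.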
```shell
curl 'https://www.radarcupon.es/tienda/fotoprix.com' \
  -H 'pragma: no-cache' -H 'dnt: 1' -H 'accept-encoding: gzip, deflate, br' \
  -H 'accept-language: en-GB,en-US;q=0.9,en;q=0.8' \
  -H 'upgrade-insecure-requests: 1' \
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/62.0.3202.94 Chrome/62.0.3202.94 Safari/537.36' \
  -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8' \
  -H 'cache-control: no-cache' -H 'authority: www.radarcupon.es' \
  --compressed
```
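To send exactly the same headers from Python (for example to pass them to `scrapy.Request(url, headers=headers)`), the copied command can be split into a URL and a header dict. This is a minimal sketch that only understands the URL and `-H`/`--header` options; other flags such as `--compressed` are ignored, and `parse_curl` is a hypothetical helper name. Recent Scrapy releases also provide `Request.from_curl()` for the same purpose. Note that this reproduces only the headers of the copied request.

```python
import shlex
from typing import Dict, Tuple


def parse_curl(command: str) -> Tuple[str, Dict[str, str]]:
    """Split a simple 'Copy as cURL' command into its URL and header dict."""
    # Join shell line continuations so shlex sees a single command line.
    tokens = shlex.split(command.replace("\\\n", " "))
    url = ""
    headers: Dict[str, str] = {}
    i = 1  # skip the leading "curl"
    while i < len(tokens):
        token = tokens[i]
        if token in ("-H", "--header") and i + 1 < len(tokens):
            # Split "name: value" on the first colon only.
            name, _, value = tokens[i + 1].partition(":")
            headers[name.strip()] = value.strip()
            i += 2
            continue
        if not token.startswith("-") and not url:
            url = token  # first bare argument is the URL
        i += 1
    return url, headers
```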
I was expecting the first request from both Chrome and curl to return something like a JavaScript challenge that sets a cookie, but it doesn't seem to work quite like that any more?
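One way to check whether the first response sets a challenge cookie at all is to inspect its `Set-Cookie` headers. This sketch assumes, based on commonly reported cookie names rather than on this site, that Incapsula cookies start with `visid_incap_` or `incap_ses_`; the function name `incapsula_cookies` is hypothetical. In Scrapy the raw values are available from `response.headers.getlist(b'Set-Cookie')`, which returns bytes that need decoding first.

```python
from http.cookies import SimpleCookie
from typing import Dict, Iterable

# Assumption: commonly reported Incapsula cookie name prefixes; the exact
# names include a per-site identifier and may differ on your target.
INCAP_PREFIXES = ("visid_incap_", "incap_ses_")


def incapsula_cookies(set_cookie_headers: Iterable[str]) -> Dict[str, str]:
    """Collect cookies whose names look like Incapsula ones from Set-Cookie values."""
    found: Dict[str, str] = {}
    for header in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(header)  # parses "name=value; attr=...; flag" into morsels
        for name, morsel in jar.items():
            if name.startswith(INCAP_PREFIXES):
                found[name] = morsel.value
    return found
```

Comparing the result for the Chrome request and the curl request would show whether both receive the same cookies on the first response.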