
I'm trying to make a request to kicksusa.com. If I make the request from any browser, I get the full expected HTML. However, I cannot seem to reproduce the request in a way that returns the same HTML; instead I get a 'Request unsuccessful.' message.

Any help is appreciated.

My code:

using System.Net;
using System.Net.Http;

var httpClientHandler = new HttpClientHandler
{
    //Proxy = proxy,
    AllowAutoRedirect = true,
    MaxAutomaticRedirections = 15,
    // Transparently decompress gzip/deflate response bodies
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};

// Pass the handler in; otherwise its settings are silently ignored
var client = new HttpClient(httpClientHandler);

// Headers copied from the browser request
client.DefaultRequestHeaders.Add("Host", "www.kicksusa.com");
client.DefaultRequestHeaders.Add("Connection", "keep-alive");
client.DefaultRequestHeaders.Add("Upgrade-Insecure-Requests", "1");
client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.87 Safari/537.36");
client.DefaultRequestHeaders.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
client.DefaultRequestHeaders.Add("Accept-Encoding", "gzip, deflate, sdch");
client.DefaultRequestHeaders.Add("Accept-Language", "en-GB,en-US;q=0.8,en;q=0.6");

var response = await client.GetAsync("http://www.kicksusa.com/jordan-craig/oil-stain-slub-tee-army-green-8909ag.html");

if (response.IsSuccessStatusCode)
{
    var html = await response.Content.ReadAsStringAsync();
}

Fiddler trace headers:

Host: www.kicksusa.com
Connection: keep-alive
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.87 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6
  • Use Fiddler, and compare each request as it is serialized on the wire. –  Nov 07 '16 at 14:08
  • I've done this, and still no luck; see edits. – Dave Bish Nov 07 '16 at 14:18
  • That site loads everything with JavaScript anyway, so even if you managed to get exactly the same response as the browser, it won't help you much, because it won't contain any useful information (just a notice that you have to enable JavaScript). You need to render the site (using a tool such as CefSharp.OffScreen or similar) to execute its JavaScript. – Evk Nov 07 '16 at 14:34
  • @Evk - I only need a few bits, which are indeed available if you look at Chrome's view-source. – Dave Bish Nov 07 '16 at 14:35
  • Chrome's view-source displays the already modified page, not the initial HTML the browser received from the HTTP request. Which bits exactly do you need from that page? – Evk Nov 07 '16 at 14:42
  • @Evk You may be thinking of 'Inspect Element'; 'View source' does indeed make a new request. – Dave Bish Nov 07 '16 at 14:45

1 Answer


This website uses some dedicated technology from Incapsula to prevent automated access to the website.

On the first request, the site returns a web document with an embedded iframe. Only when the iframe source is loaded is a cookie set and the browser redirected to the actual page. All further requests then succeed immediately, because the browser sends the cookie along with them.

To circumvent the mechanism, you would have to load the iframe after the first request, store the cookie, and then send the cookie with all further requests. There is also a lot of JavaScript code involved in the first response, which would probably have to be executed for the Incapsula check to succeed.

However, when a site specifically uses such a technology to prevent automated access to its content, any attempt to circumvent this mechanism must be considered unwanted and a criminal act. You should not try to automatically gather data from a site without its owner's approval, especially not when a technology such as Incapsula is used to make this more difficult.

See also this answer by an Incapsula employee for more details.

NineBerry