This is a follow-up to a question asked yesterday: Can't seem to get a web page's contents via cURL - user agent and HTTP headers both set?

I'm trying to fetch a URL's contents; the problem is the way this URL handles requests.

The url: http://www.deindeal.ch/deals/atlas-grand-hotel-2-naechte-30-2/

First request (without cookies):

After "learning" to use curl in the command line (props to @d3v3us), a simple request curl -i http://www.deindeal.ch/deals/atlas-grand-hotel-2-naechte-30-2/ shows the following:

curl -i http://www.deindeal.ch/deals/atlas-grand-hotel-2-naechte-30-2/
HTTP/1.1 302 FOUND
Date: Fri, 30 Dec 2011 13:15:00 GMT
Server: Apache/2.2.16 (Debian)
Vary: Accept-Language,Cookie,Accept-Encoding
Content-Language: de
Set-Cookie: csrftoken=edc8c77fc74f5e788c53488afba4e50a; Domain=www.deindeal.ch; Max-Age=31449600; Path=/
Set-Cookie: generic_cookie=1; Path=/
Set-Cookie: sessionid=740a8a2cb9fb51166dcf865e35b91888; expires=Fri, 27-Jan-2012 13:15:00 GMT; Max-Age=2419200; Path=/
Location: http://www.deindeal.ch/welcome/?deal_slug=atlas-grand-hotel-2-naechte-30-2
Content-Length: 0
Connection: close
Content-Type: text/html; charset=utf-8

Second request (with cookies):

So, I save the cookie using -c, check that it saves as cookie.txt, and run the request again with the addition of -b cookie.txt, getting this:

curl -i -b cookie.txt http://www.deindeal.ch/deals/atlas-grand-hotel-2-naechte-30-2/
HTTP/1.1 302 FOUND
Date: Fri, 30 Dec 2011 13:38:17 GMT
Server: Apache/2.2.16 (Debian)
Vary: Accept-Language,Cookie,Accept-Encoding
Content-Language: de
Set-Cookie: csrftoken=49f5c804d399f8581253630631692f5f; Domain=www.deindeal.ch; Max-Age=31449600; Path=/
Location: http://www.deindeal.ch/welcome/?deal_slug=atlas-grand-hotel-2-naechte-30-2
Content-Length: 0
Connection: close
Content-Type: text/html; charset=utf-8

To me this looks like exactly the same contents, minus one or two parameters in the cookie, but maybe I'm overlooking something?
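
One way to check exactly what differs is to compare which cookies each response sets. Below is a minimal PHP sketch that does this offline, using only the Set-Cookie lines copied from the two outputs above. The helper name `cookieNamesFromHeaders` is made up for illustration and is not part of any library.

```php
<?php
// Hypothetical helper: returns the names of the cookies set in a raw header block.
function cookieNamesFromHeaders(string $rawHeaders): array
{
    $names = [];
    foreach (preg_split('/\r?\n/', $rawHeaders) as $line) {
        // Header names are case-insensitive; the cookie name is the text before the first "=".
        if (preg_match('/^Set-Cookie:\s*([^=;\s]+)=/i', $line, $m)) {
            $names[] = $m[1];
        }
    }
    return $names;
}

// Set-Cookie lines from the first request (no cookies sent).
$first = <<<HEADERS
HTTP/1.1 302 FOUND
Set-Cookie: csrftoken=edc8c77fc74f5e788c53488afba4e50a; Domain=www.deindeal.ch; Max-Age=31449600; Path=/
Set-Cookie: generic_cookie=1; Path=/
Set-Cookie: sessionid=740a8a2cb9fb51166dcf865e35b91888; expires=Fri, 27-Jan-2012 13:15:00 GMT; Max-Age=2419200; Path=/
HEADERS;

// Set-Cookie lines from the second request (sent with -b cookie.txt).
$second = <<<HEADERS
HTTP/1.1 302 FOUND
Set-Cookie: csrftoken=49f5c804d399f8581253630631692f5f; Domain=www.deindeal.ch; Max-Age=31449600; Path=/
HEADERS;

// Cookies the server set the first time but did not set again the second time.
$onlyInFirst = array_values(array_diff(
    cookieNamesFromHeaders($first),
    cookieNamesFromHeaders($second)
));
print_r($onlyInFirst);
```

This lists `generic_cookie` and `sessionid`: the second response does not set them again, which is consistent with the server having received them from cookie.txt, yet it still answers with the same 302.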

I'm attempting to get the curl request to function and return the same contents as when requesting that url via a browser, but I'm not sure what I should do next.

Note: I've tagged this PHP because I'm using PHP to make the requests; I'm only using the command line to easily show the returned headers. So if there are any other PHP libraries or methods that would work (better, or in a place where cURL wouldn't), please feel free to suggest them.

Any help would be greatly appreciated ;).

Avicinnian

2 Answers

You need this:

curl -iL  -c cookie.txt -b cookie.txt http://www.deindeal.ch/deals/atlas-grand-hotel-2-naechte-3

The -b flag reads cookies from a file. To save cookies to a file after the HTTP transaction, use the -c flag; that file is called the cookie jar.
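
Since the requests are ultimately made from PHP, the same flags map onto options of PHP's cURL extension: `CURLOPT_COOKIEFILE` corresponds to -b, `CURLOPT_COOKIEJAR` to -c, `CURLOPT_FOLLOWLOCATION` to -L and `CURLOPT_HEADER` to -i. The following is a minimal sketch, assuming the curl extension is installed and cookie.txt is writable. The function name `fetchWithCookieJar` is made up for illustration, and whether this gets past this particular site's welcome redirect is untested.

```php
<?php
// Hypothetical helper: fetches $url, reading cookies from and saving them to $cookieFile.
function fetchWithCookieJar(string $url, string $cookieFile): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,        // return the response instead of printing it
        CURLOPT_HEADER         => true,        // include response headers, like -i
        CURLOPT_FOLLOWLOCATION => true,        // follow redirects, like -L
        CURLOPT_MAXREDIRS      => 5,           // avoid looping forever on redirect cycles
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies from this file, like -b
        CURLOPT_COOKIEJAR      => $cookieFile, // write cookies to this file, like -c
    ]);

    $response = curl_exec($ch);
    $result = [
        'status'   => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'finalUrl' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'error'    => $response === false ? curl_error($ch) : null,
        'response' => $response === false ? null : $response,
    ];

    // The cookie jar is written out when the handle is released.
    unset($ch);
    return $result;
}

// Only runs when a URL is passed on the command line: php fetch.php <url>
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    // The first call populates cookie.txt; the second sends those cookies back.
    fetchWithCookieJar($argv[1], 'cookie.txt');
    $second = fetchWithCookieJar($argv[1], 'cookie.txt');
    echo $second['status'], ' ', $second['finalUrl'], "\n";
}
```

Comparing `finalUrl` from the second call with the requested URL shows whether the server still redirected to the welcome page once the cookies were sent.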

Using WebGet (sorry, it's written by me), pulling the contents is quite simple.

require "WebGet.php";
$w = new WebGet();
$w->cookieFile = 'cookie.txt'; // must be writable
$w->requestContent("https://github.com/shiplu/dxtool");
print_r($w->responseHeaders); // prints response headers
print_r($w->cachedContent); // prints url content
Shiplu Mokaddim
  • I thought it was the other way around (from here, http://stackoverflow.com/questions/7181785/send-cookies-with-curl), but anyway, it's still not requesting the correct page. I see you've used `-L` to follow location, but the location's contents aren't the same as that when I access the URL via my browser. In my browser - The first time round, it redirects to the end location (302) and sets a cookie, the second time round it takes me to the actual requested contents, (presumably based on the fact I now have a cookie with certain parameters), if that makes sense. – Avicinnian Dec 30 '11 at 14:19
  • Sorry, updated. In fact you'd want to use both the `-b` and `-c` flags. I couldn't find it in my `curl --help` earlier. Now added. – Shiplu Mokaddim Dec 30 '11 at 17:14

I may be misunderstanding your question, but a 302 response is a redirect: the content is (temporarily) at the URL given in the "Location" header, and you just need to follow it, right? By default cURL performs only the one request, unlike your browser, which sees the 302, sets the cookies (just like you're doing), and then follows the Location header. Note that your Location URL has a "?" query string in it that isn't in the original URL. Run cURL, with that same cookie jar, on the Location URL.

http://en.wikipedia.org/wiki/List_of_HTTP_status_codes#3xx_Redirection
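
The manual approach described here, reading the Location header out of the 302 response and requesting that URL next, can be sketched in PHP. The helper name `locationFromHeaders` is hypothetical, and the header lines are copied from the question's output, so this runs without touching the network.

```php
<?php
// Hypothetical helper: returns the redirect target from a raw header block, or null if there is none.
function locationFromHeaders(string $rawHeaders): ?string
{
    foreach (preg_split('/\r?\n/', $rawHeaders) as $line) {
        // Header names are case-insensitive; the value is the URL after the colon.
        if (preg_match('/^Location:\s*(\S+)/i', $line, $m)) {
            return $m[1];
        }
    }
    return null;
}

// A subset of the 302 response headers shown in the question.
$headers = <<<HEADERS
HTTP/1.1 302 FOUND
Content-Language: de
Location: http://www.deindeal.ch/welcome/?deal_slug=atlas-grand-hotel-2-naechte-30-2
Content-Length: 0
HEADERS;

$next = locationFromHeaders($headers);
echo $next, "\n";

// The part after "?" is a query string carrying the slug of the originally requested deal.
parse_str((string) parse_url($next, PHP_URL_QUERY), $query);
echo $query['deal_slug'], "\n";
```

The value of `$next` is the URL to request next with the same cookie jar. The `deal_slug` parameter shows that the welcome page is told which deal was originally requested.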

four43
  • Yeah, it simply displays the end content at "Location:" (HTTP 200), same as though requested without cookies being sent. No HTTP 302's to other locations or anything. I was thinking that perhaps the cookie is handled client side in javascript (and therefore, the redirect is in the contents of "Location:"), but when accessing the url in the browser, I never see a url change in the address bar. – Avicinnian Dec 30 '11 at 14:53
  • They may be checking user agent server side? Try passing a user agent string from a common browser and see if that helps. – four43 Dec 30 '11 at 20:08
  • Just tried that, same result: http://snippi.com/s/mxgardp :/. Seems to be a tricky one. – Avicinnian Dec 31 '11 at 17:56