I'm fetching a file from a remote website over HTTP 1.0. I figured I'd be nice and request gzip compression when fetching the file, so as to minimize the bandwidth used.

No matter how I formed my headers, I did not get gzipped content in the response, although the same server did send gzipped content when I tested with a browser. My own website also serves gzip when I request it with my code.

I figured this was because their server uses chunked transfer encoding, which is only available in HTTP 1.1.

I switched the protocol to HTTP 1.1; my code is below. My website responds to this request, although it takes several seconds to do what 1.0 does instantly. When I try it on the remote website, it keeps loading forever without answering.

So my question is: why is 1.1 so slow? Am I sending a malformed header or something? Also, why does my site answer while the other does not? Any input? Thanks.

$header = array(
    'http' => array(
        'method'  => 'GET',
        'header'  => 'Accept-Encoding: gzip\r\n' .
                     'User-Agent: test\r\n' .
                     'Accept-Charset: ISO-8859-1,utf-8\r\n' .
                     'Accept-Encoding: gzip, sdch, deflate\r\n' .
                     'Host: www.mysite.test.com\r\n',
        'protocol_version' => '1.1\r\n'
    )
);

$context = stream_context_create($header);
$file_string = file_get_contents('http://www.mysite.test.com/test.txt', false, $context);

Edit: It definitely seems like it's keeping the connection open until the server's keep-alive limit is reached. It took about 1.1 minutes to get an answer from their site. I need to figure out how to close the connection, then. Otherwise it seems to work.

hexacyanide
raecer
  • Although I'm a fan of contexts and streams, in this case I would use cURL – Miloš Đakonović Mar 10 '13 at 13:38
  • @Miloshio Why? What benefit would it offer? How would it help the OP? – Gordon Mar 10 '13 at 13:39
  • Yes, I've seen what cURL can do, but it is supposed to be entirely possible without it as well, and I am doing this as a programming exercise for my own sake. Thanks for the input though! – raecer Mar 10 '13 at 13:41
  • @Gordon, when creating contexts manually, there is always a good chance of getting wrong something that should be clear and straightforward. cURL does the dirty work for you. – Miloš Đakonović Mar 10 '13 at 13:42
  • @raecer see if http://stackoverflow.com/questions/3485843/file-get-contents-with-context-to-use-http-1-1-significantly-slow-download-spe?rq=1 solves your problem. – Gordon Mar 10 '13 at 13:48
  • @Gordon I read that thread already, but specifying Connection: close in my header (if that's where it should go) did nothing at the time. – raecer Mar 10 '13 at 13:57

1 Answer

Well... Seems like the answer was obvious after a while of bashing my head against the wall.

I moved Connection: close to the top and it suddenly worked, but then the gzip setting stopped working. So I tried to figure out why the order seemed to matter. It seems I was quoting the header strings in single quotes (') instead of double quotes ("), so the \r\n sequences were not interpreted as line breaks and were sent as literal characters. At least I think that was the problem. It seems to be working now. Thank you all anyway... I hate it when I make simple mistakes like this...
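For illustration, here is a minimal sketch of the quoting difference and of a header string built with double quotes. It is not the exact code that ended up working: the helper name build_http_context_options is hypothetical, the host is the placeholder from the question, and the User-Agent value is an arbitrary example.

```php
<?php
// PHP only interprets escape sequences such as \r\n inside double quotes.
// In single quotes they stay as the four literal characters \ r \ n.
$single = 'Connection: close\r\n';
$double = "Connection: close\r\n";

var_dump(strlen($single)); // int(21): 17 characters plus 4 literal ones
var_dump(strlen($double)); // int(19): 17 characters plus CR and LF

// Hypothetical helper: builds the options array for stream_context_create().
function build_http_context_options(string $host): array
{
    return array(
        'http' => array(
            'method'           => 'GET',
            'protocol_version' => 1.1, // a number, without any trailing \r\n
            'header'           => "Host: $host\r\n" .
                                  "User-Agent: test\r\n" .
                                  "Accept-Encoding: gzip\r\n" .
                                  // Ask the server to close the connection after
                                  // the response instead of keeping it alive.
                                  "Connection: close\r\n",
        ),
    );
}

$options = build_http_context_options('www.mysite.test.com');
$context = stream_context_create($options);

// The request itself would then look like this (not executed here):
// $file_string = file_get_contents('http://www.mysite.test.com/test.txt', false, $context);
```

With every header line terminated by a real CR LF, the order of the lines should no longer matter, since each one is parsed as a separate header.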

Edit again: I still don't seem to be getting gzips from the site, though it works from mine. I'll try copying the headers from a browser and see what happens.

Edit 2: There we go! It works as intended. Maybe they were somehow filtering on user agents or whatnot.

Edit 3: Now I'm getting really random results when downloading the same file multiple times. Sometimes I get it gzipped, sometimes not. Their server randomly serves me one of two sets of response headers. The only difference is whether Vary: Accept-Encoding and Content-Encoding: gzip are present. I thought it would always send gzip once I told it I could handle it? My own server seems to serve gzip consistently.

Edit 4: For some reason I was served gzipped content sometimes and uncompressed content sometimes when using an earlier MSIE 5.0 version in the user agent. I could understand only handing gzip to user agents capable of handling it, but at least it should be consistent. Anyway, problem solved. Thank you.
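Since Accept-Encoding only states what the client can handle, a server is free to answer uncompressed, so it is safer to check each response before decoding. Below is a minimal sketch of that check, assuming the zlib extension is available for gzdecode(); the helper name decode_body is hypothetical, and it expects header lines in the shape PHP provides in $http_response_header after file_get_contents().

```php
<?php
// Hypothetical helper: returns the body decoded according to the
// Content-Encoding response header, or unchanged if it is not gzipped.
function decode_body(array $headers, string $body): string
{
    foreach ($headers as $line) {
        // Header names are case-insensitive; x-gzip is an older alias of gzip.
        if (preg_match('/^Content-Encoding:\s*(?:x-)?gzip\s*$/i', $line)) {
            $decoded = gzdecode($body);
            // Fall back to the raw body if decoding fails.
            return $decoded === false ? $body : $decoded;
        }
    }
    return $body; // no gzip header present: the body is already plain
}

// Usage with simulated responses (no network access needed):
$gzipped = decode_body(
    array('HTTP/1.1 200 OK', 'Vary: Accept-Encoding', 'Content-Encoding: gzip'),
    gzencode('hello')
);
$plain = decode_body(array('HTTP/1.1 200 OK'), 'hello');

var_dump($gzipped === $plain); // bool(true)
```

In a real request the call would be something like decode_body($http_response_header, $file_string), which handles both of the response variants described above.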

raecer