
I think my question seems pretty casual but bear with me as it gets interesting (at least for me :)).

Consider a PHP page whose purpose is to read a requested file from the filesystem and echo it as the response. The question is how to enable caching for this page. Note that the files can be pretty huge, and the point of caching is to save the client from downloading the same content again and again.

The ideal strategy would be to use the "If-None-Match" request header and the "ETag" response header to implement a reverse proxy cache system. I know this much, but I'm not sure whether it is possible or what I should return as the response to implement this technique!

Mehran

1 Answer


PHP is not really made for serving huge files or large numbers of auxiliary files.

Instead, look at X-accel for nginx, X-Sendfile for Lighttpd or mod_xsendfile for Apache.

The initial request is still handled by PHP, but once the script has determined which file to send, it sets a few headers telling the web server to handle the file transfer, after which the PHP process is free to serve something else.
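As a rough sketch of that hand-off, assuming the relevant module is enabled on the server: the helper below only picks the header line to send. The name `offloadHeader`, the server labels, and the `/protected/...` path are made up for illustration. With nginx the header is `X-Accel-Redirect`, and the path has to map to a location marked `internal` in the nginx config.

```php
<?php
// Hypothetical helper: returns the header line that hands the transfer to the web server.
function offloadHeader(string $server, string $path): string
{
    switch ($server) {
        case 'nginx':
            // $path is a URI inside a location marked "internal" (assumed config)
            return 'X-Accel-Redirect: ' . $path;
        case 'apache':   // assumes mod_xsendfile is loaded and enabled
        case 'lighttpd':
            // $path is a filesystem path the server is allowed to send
            return 'X-Sendfile: ' . $path;
        default:
            throw new InvalidArgumentException("Unsupported server: $server");
    }
}

// Usage inside the download script, after any access checks (paths are made up):
// header('Content-Type: application/octet-stream');
// header(offloadHeader('nginx', '/protected/report.pdf'));
// exit;
```

PHP sends no body here; the web server picks up the header and streams the file itself.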

You can then use the web server to configure the caching for you.

Statically generated content

If your content is generated from PHP and particularly expensive to create, you could write the output to a local file and apply the above method again.
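One way this could look, as a minimal sketch: generate the output once, store it in a file, and reuse the file until it is considered stale. The function name `cachedOutput`, the time-to-live parameter, and the `$generate` callback are assumptions for illustration; the answer does not prescribe any particular invalidation rule.

```php
<?php
// Hypothetical helper: returns the path of a cache file, regenerating it when stale.
function cachedOutput(string $cacheFile, int $ttl, callable $generate): string
{
    clearstatcache(true, $cacheFile); // make sure filemtime() is not a stale value
    $fresh = is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl;
    if (!$fresh) {
        // Write to a temporary file first, then rename, so that concurrent
        // readers never see a half-written cache file.
        $tmp = $cacheFile . '.' . uniqid('', true) . '.tmp';
        file_put_contents($tmp, $generate());
        rename($tmp, $cacheFile);
    }
    return $cacheFile;
}

// Usage (renderReport() is a made-up expensive generator):
// $path = cachedOutput('/tmp/report.html', 300, 'renderReport');
// header('X-Sendfile: ' . $path); // or readfile($path);
```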

If you can't write to a local file or don't want to, you can use HTTP response headers to control caching:

Expires: <absolute date in the future>
Cache-Control: public, max-age=<relative time in seconds since request>

This will cause clients to cache the page contents until they expire or until the user forces a page reload (e.g. by pressing F5).
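A small sketch of how these two headers might be built in PHP. The helper name `expiryHeaders` and its parameters are made up for illustration; the `$now` parameter only exists so the result is predictable.

```php
<?php
// Hypothetical helper: builds the Expires and Cache-Control header lines.
function expiryHeaders(int $maxAge, ?int $now = null): array
{
    $now = $now ?? time();
    return [
        // HTTP dates are always expressed in GMT
        'Expires: ' . gmdate('D, d M Y H:i:s', $now + $maxAge) . ' GMT',
        'Cache-Control: public, max-age=' . $maxAge,
    ];
}

// Usage: let clients keep the response for one hour.
// foreach (expiryHeaders(3600) as $line) {
//     header($line);
// }
```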

Dynamically generated content

For dynamic content you want the browser to ping you every time, but only send the page contents if there's something new. You can accomplish this by setting a few other response headers:

ETag: <hash of the contents>
Last-Modified: <absolute date of last contents change>
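For illustration, these two response headers could be produced along the following lines. `validatorHeaders` is a hypothetical helper, and using `md5` of the body as the hash is just one possible choice.

```php
<?php
// Hypothetical helper: builds the ETag and Last-Modified header lines.
function validatorHeaders(string $body, int $lastChanged): array
{
    return [
        // An entity tag is a quoted string; here it is a hash of the contents
        'ETag: "' . md5($body) . '"',
        'Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastChanged) . ' GMT',
    ];
}

// Usage ($body and the cache file name are placeholders):
// foreach (validatorHeaders($body, filemtime('cache.txt')) as $line) {
//     header($line);
// }
```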

When the browser pings your script again, it will add the following request headers, respectively:

If-None-Match: <hash of the contents that you sent last time>
If-Modified-Since: <absolute date of last contents change>

The ETag mostly saves network traffic rather than server work, because in some cases you first have to generate the contents in order to calculate their hash.

The Last-Modified is the easiest to apply if you have local file caches (files have a modification date). A simple condition makes it work:

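// The header is absent on a first visit; strtotime('') then returns false
$since = strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? '');
if (!file_exists('cache.txt') || $since === false ||
    filemtime('cache.txt') > $since) {
    // update cache file and send back contents as usual (+ cache headers)
} else {
    // the client's copy is still current: send the status only, no body
    header('HTTP/1.0 304 Not Modified');
}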

If you can't use file caches, you can still use the ETag to determine whether the contents have changed in the meantime.
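A minimal sketch of such an ETag check, under a few assumptions: `etagMatches` is a made-up helper, the tag is a quoted hash without commas (so splitting the header on commas is safe), and weak tags are treated as equal to their strong form, which is how If-None-Match comparisons work.

```php
<?php
// Hypothetical helper: true when the If-None-Match value already lists $etag.
function etagMatches(?string $ifNoneMatch, string $etag): bool
{
    if ($ifNoneMatch === null) {
        return false; // first visit: the client has nothing cached
    }
    if (trim($ifNoneMatch) === '*') {
        return true;  // "*" matches any current representation
    }
    foreach (explode(',', $ifNoneMatch) as $candidate) {
        $candidate = trim($candidate);
        if (strpos($candidate, 'W/') === 0) {
            $candidate = substr($candidate, 2); // drop the weak-validator prefix
        }
        if ($candidate === $etag) {
            return true;
        }
    }
    return false;
}

// Usage (buildPage() is a made-up generator):
// $body = buildPage();
// $etag = '"' . md5($body) . '"';
// header('ETag: ' . $etag);
// if (etagMatches($_SERVER['HTTP_IF_NONE_MATCH'] ?? null, $etag)) {
//     http_response_code(304); // unchanged: send no body
//     exit;
// }
// echo $body;
```

Note that the contents are still generated on every request in this sketch; only the transfer is skipped.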

Ja͢ck
  • My bad, I forgot to mention that eliminating PHP is impossible. There could be complicated logic within the PHP script. – Mehran May 15 '12 at 07:43
  • @MehranZiadloo Maybe you didn't get my answer, I've rephrased it. – Ja͢ck May 15 '12 at 07:45
  • Your update clarified your point, but I'm afraid it's still impossible to eliminate PHP, since the content is sometimes generated rather than loaded. Thanks anyway. – Mehran May 15 '12 at 07:52
  • Updated the answer again. You stated "purpose is to read a file from filesystem", so you should be clearer about that :) – Ja͢ck May 15 '12 at 08:03
  • How does the ping work exactly? When the client requests a page, I don't see any HTTP_IF_MODIFIED_SINCE variable, even though I set the ETag when serving the document. Because this wasn't working, I designed another method that 'pings' the server with Ajax and asks for the version number. If the version number differs from the page's, the page is reloaded with window.location.reload(true) to force a request. When all content parts (CSS and JS, for example) are cached, only one request is needed to check for differences instead of a lot of '304 Not Modified' replies. Relying on JS may be a drawback, but it works OK. – Codebeat Dec 04 '13 at 02:48
  • That is what i'm using, but thanks for the comment. - header('If-None-Match: '.$ETag ); – Codebeat Dec 04 '13 at 03:01
  • @Erwinus Ehm, `If-None-Match` is a request header, not a response header lol; you should set `ETag: value` in the response :) – Ja͢ck Dec 04 '13 at 03:05
  • @Jack: Okay thanks dude but keep the comments nice, not everyone is as smart as you are. ;-P It is working now thanks. – Codebeat Dec 04 '13 at 03:09
  • @Erwinus I've made it more explicit in my answer which type of headers they are. Thanks for the feedback. – Ja͢ck Dec 04 '13 at 03:13