35

I have a very simple question: what is the best way to download a file in PHP, but only if the local version was downloaded more than 5 minutes ago?

In my actual case I would like to get data from a remotely hosted csv file, for which I currently use

$file = file_get_contents($url);

without any local copy or caching. What is the simplest way to convert this into a cached version, where the end result doesn't change ($file stays the same), but it uses a local copy if that copy was fetched recently (say, within the last 5 minutes)?

hyperknot

9 Answers

87

Use a local cache file, and just check the existence and modification time on the file before you use it. For example, if $cache_file is a local cache filename:

if (file_exists($cache_file) && (filemtime($cache_file) > (time() - 60 * 5 ))) {
   // Cache file is less than five minutes old. 
   // Don't bother refreshing, just use the file as-is.
   $file = file_get_contents($cache_file);
} else {
   // Our cache is out-of-date, so load the data from our remote server,
   // and also save it over our cache for next time.
   $file = file_get_contents($url);
   file_put_contents($cache_file, $file, LOCK_EX);
}

(Untested, but based on code I use at the moment.)

Either way through this code, $file ends up as the data you need, and it'll either use the cache if it's fresh, or grab the data from the remote server and refresh the cache if not.

EDIT: I understand a bit more about file locking since I wrote the above. It might be worth having a read of this answer if you're concerned about the file locking here.

If you're concerned about locking and concurrent access, I'd say the cleanest solution would be to file_put_contents to a temporary file, then rename() it over $cache_file, which should be an atomic operation, i.e. the $cache_file will either be the old contents or the full new contents, never halfway written.
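
A minimal sketch of that temp-file-then-rename approach. The function name `write_cache_atomically` is hypothetical (it is not from the original answer), and the sketch assumes the cache directory exists and is writable, so that the temporary file ends up on the same filesystem as $cache_file:

```php
<?php

// Hypothetical helper: writes $data to $cache_file so that readers see
// either the old contents or the complete new contents.
function write_cache_atomically(string $cache_file, string $data): bool
{
    // Create the temp file next to the cache file so both are on the
    // same filesystem; a rename() across filesystems is not atomic.
    $tmp = tempnam(dirname($cache_file), 'cache_');
    if ($tmp === false) {
        return false;
    }

    // Write the full payload to the temp file first.
    if (file_put_contents($tmp, $data) !== strlen($data)) {
        @unlink($tmp);
        return false;
    }

    // Swap the temp file into place over the old cache file.
    if (!rename($tmp, $cache_file)) {
        @unlink($tmp);
        return false;
    }

    return true;
}
```

In the else branch above, this would replace the file_put_contents call, for example `if ($file !== false) { write_cache_atomically($cache_file, $file); }`, so a failed download never overwrites a good cache.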

Matt Gibson
  • Thanks for the code Matt! It's super clean, well commented and works without any modification! – hyperknot Mar 10 '11 at 17:02
  • @zsero Cool. But do put some error checking in there :) You might run into problems if the cache directory isn't writeable by the web server user, for example... – Matt Gibson Mar 10 '11 at 17:15
  • Yeah, it might need some error checking, but it's such a small project that no one else will be using or deploying this code. And if it's broken, the else part actually goes into cache-less mode instead of breaking down. Nice. – hyperknot Mar 10 '11 at 17:24
  • Make sure to define your $cache_file at the top (example): `$cache_file = $_SERVER['DOCUMENT_ROOT'] . '/my-cache.php';` – farjam Sep 24 '14 at 17:23
  • Don't forget to do `clearstatcache()` before calling `filemtime()`. http://php.net/manual/en/function.filemtime.php – Volomike Jan 13 '15 at 04:06
  • @Volomike As I understand it, the stat cache is cleared at the start of every script run, so as long as you're not calling this method multiple times within the same script, it should be fine. (I just checked, and in [filestat.c](https://github.com/php/php-src/blob/4b943c9c0dd4114adc78416c5241f11ad5c98a80/ext/standard/filestat.c) you'll see the stat cache being cleared in a `PHP_RINIT_FUNCTION` callback, so it's definitely reset at the start of every request.) – Matt Gibson Jan 13 '15 at 08:28
  • @MattGibson Fully noted. Thanks for the C lookup on that! :) – Volomike Jan 14 '15 at 18:44
  • Then how do you lock it with rename()? Will it be locked automatically? @Matt Gibson – user4271704 Sep 13 '16 at 12:55
  • Great answer, and still valid after 10 years. – Ozgur Dec 05 '20 at 12:35
10

Try phpFastCache. It supports file caching, so you don't need to write your own cache class, and it is easy to use on shared hosting and VPS.

Here is an example:

<?php

// "files" can be changed to memcached, wincache, xcache, apc, or sqlite
$cache = phpFastCache("files");

$content = $cache->get($url);

if($content == null) {
     $content = file_get_contents($url);
     // 300 = 5 minutes 
     $cache->set($url, $content, 300);
}

// use your $content here
echo $content;
Ken Le
4

Here is a simple version which also passes a Windows User-Agent string to the remote host, so you don't look like a troublemaker sending requests without proper headers.

<?php

function getCacheContent($cachefile, $remotepath, $cachetime = 120){

    // Generate the cache version if it doesn't exist or it's too old!
    if( ! file_exists($cachefile) OR (filemtime($cachefile) < (time() - $cachetime))) {

        $options = array(
            'method' => "GET",
            'header' => "Accept-language: en\r\n" .
            "User-Agent: Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)\r\n"
        );

        $context = stream_context_create(array('http' => $options));
        $contents = file_get_contents($remotepath, false, $context);

        // Only overwrite the cache when the download succeeded.
        if ($contents !== false) {
            file_put_contents($cachefile, $contents, LOCK_EX);
        }
        return $contents;

    }

    return file_get_contents($cachefile);
}
Xeoncross
0

If you are using a database system of any type, you could cache this file there. Create a table for cached information, and give it at minimum the following fields:

  • An identifier; something you can use to retrieve the file the next time you need it. Probably something like a file name.
  • A timestamp from the last time you downloaded the file from the URL.
  • Either a path to the file, where it's stored in your local file system, or use a BLOB type field to just store the contents of the file itself in the database. I would recommend just storing the path, personally. If the file was very large, you definitely wouldn't want to put it in the database.

Now, the next time you run the script, first check the database for the identifier and pull the timestamp. If the difference between the current time and the stored timestamp is greater than 5 minutes, pull from the URL and update the database. Otherwise, load the file from the database.

If you don't have a database set up, you could do the same thing using just files, where one file, or a field in a file, contains the timestamp of the last download.
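
The database variant could be sketched as follows, assuming PDO with the SQLite driver is available. The function name `db_cached_fetch`, the table layout, and the `$fetch` callable are all hypothetical, and for brevity the sketch stores the contents in a BLOB column rather than a file path. `$fetch` is assumed to return the downloaded data as a string:

```php
<?php

// Hypothetical helper: returns the cached body for $id if it is at most
// $max_age seconds old, otherwise calls $fetch and stores the result.
function db_cached_fetch(PDO $db, string $id, callable $fetch, int $max_age = 300): string
{
    // Identifier, download timestamp, and the file contents.
    $db->exec('CREATE TABLE IF NOT EXISTS cache (
        id TEXT PRIMARY KEY,
        fetched_at INTEGER NOT NULL,
        body BLOB NOT NULL
    )');

    $stmt = $db->prepare('SELECT fetched_at, body FROM cache WHERE id = ?');
    $stmt->execute([$id]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    // Fresh enough: serve the stored copy.
    if ($row !== false && (time() - (int) $row['fetched_at']) <= $max_age) {
        return $row['body'];
    }

    // Missing or stale: download again and overwrite the row.
    // REPLACE INTO is SQLite/MySQL syntax, not standard SQL.
    $body = $fetch();
    $stmt = $db->prepare('REPLACE INTO cache (id, fetched_at, body) VALUES (?, ?, ?)');
    $stmt->execute([$id, time(), $body]);

    return $body;
}
```

It might be called as `$file = db_cached_fetch($db, $url, function () use ($url) { return file_get_contents($url); });`, where `$db` is an open PDO connection. Passing the download step as a callable keeps the caching logic independent of how the file is fetched.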

user470714
0

First, you might want to look at the lazy loading design pattern.

The implementation should always load the file from the local cache. If the local cache does not exist or is more than 5 minutes old, fetch the file from the server first.

The code looks like the following:

$time = @filemtime($local_cache);  // false if the cache file does not exist
if ($time === false || (time() - $time) > 300) {  // 300 seconds = 5 minutes
    // Cache is missing or stale: refresh it from the server.
    file_put_contents($local_cache, file_get_contents($url), LOCK_EX);
}
$file = file_get_contents($local_cache);
Edward Thomson
Theon Lin
0

A best practice is to derive the cache key from a hash of the file's contents:

$cacheKey = md5_file('file.php');

dılo sürücü
-1

You can save a copy of the file on the first hit, then on subsequent hits use filemtime to check when the local file was last modified.

alfmartinez
-2

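I think you want logic like this, where $cache_file is the path to the local copy:

if (file_exists($cache_file)) {
  if (filemtime($cache_file) < time() - 5 * 60) {
     // Local copy is older than 5 minutes: fetch again and refresh it.
     $file = file_get_contents($url);
     file_put_contents($cache_file, $file);
  } else {
     // Local copy is still fresh.
     $file = file_get_contents($cache_file);
  }
} else {
     // No local copy yet: fetch and store it.
     $file = file_get_contents($url);
     file_put_contents($cache_file, $file);
}

// use $file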
Peter M
  • @zsero .. The extra layer is in there because you can't test the time stamp of a file that doesn't exist. – Peter M Mar 10 '11 at 17:44
-2

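You would wrap it in a cache-like function:

function getFile($url, $cache_file) {
    // code stolen from @Peter M
    if (file_exists($cache_file)) {
      if (filemtime($cache_file) < time() - 5 * 60) {
         // Local copy is older than 5 minutes: fetch again and refresh it.
         $file = file_get_contents($url);
         file_put_contents($cache_file, $file);
      } else {
         $file = file_get_contents($cache_file);
      }
    } else {
         $file = file_get_contents($url);
         file_put_contents($cache_file, $file);
    }
    return $file;
}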
powtac