
I'm writing a PHP-based website, and it requires a JSON table from another website. It's a relatively large file that hardly ever gets updated, so I'd prefer to keep a local copy and use that, rather than constantly requesting the same file from their server.

But I do need the latest version of their JSON file; otherwise a lot of my links to their site will break.

I'm new to web development, so I'm not sure how to approach this. I've read up on things like Redis and Memcached, but I'm not sure if they're what I'm looking for or how to get them working.

How can I keep my local version of their JSON file up to date, or is there a better approach?

Taelia
  • You would still need to request their full file to get the updates, unless they offer a copy with only the changes. Even so, that just means that once every hour/day/week/whatever you fetch a new copy of the file and save it. And Redis/Memcached are not what you are looking for. You would probably just have a script fetch the file every once in a while and overwrite it on disk: check the time the file was last modified and, if it is older than X, get a new copy. – Jonathan Kuhn Oct 09 '14 at 17:36
  • For that, would I use something like 'cron' then? I believe it's an automated way of running scripts; would it work for keeping the file updated? – Taelia Oct 09 '14 at 17:42
  • Yes, cron. Cron is used to run a script/program on a timer, which could be a PHP script that fetches a new copy of the file (a sketch of such a setup follows these comments). Just know that if you run it every 5 minutes (as an example) and the links in the JSON file change, the links on your site will not be valid until a new copy of the file is downloaded. The only way around that is to fetch the file more often, but at what point that stops saving you anything over just fetching it every time is up to you. – Jonathan Kuhn Oct 09 '14 at 17:45
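
A minimal sketch of that cron setup, under assumed names: the script path, cache path, and URL below are placeholders, not anything from the question.

    # Hypothetical crontab entry: run the fetch script at the top of every hour
    0 * * * * /usr/bin/php /var/www/scripts/refresh_json.php

    <?php
    // refresh_json.php: fetch a fresh copy of the remote JSON and overwrite the cache
    $remoteUrl = 'http://example.com/data.json'; // placeholder URL
    $cacheFile = '/var/www/cache/data.json';     // placeholder path

    $fresh = file_get_contents($remoteUrl);
    if ($fresh !== false && json_decode($fresh) !== null) {
        // Only overwrite the cache when the download succeeded and parses as JSON
        file_put_contents($cacheFile, $fresh);
    }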

2 Answers


Normally in situations like these you would request an updated copy of that file every X hours and store it on your server.
In practice: on every page load you check how old that file is (filemtime()); if it's older than X hours, you download it again and replace the contents of the cached file.
X, the number of hours, has to be adjusted according to how frequently they update the original document. For example, if you know the document is updated every 6 hours, you may want to refresh your copy every 2 hours (X=2).
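
A minimal sketch of that page-load check; the variable names, paths, and URL are assumptions for illustration:

    <?php
    $cacheFile = __DIR__ . '/cache/data.json';   // placeholder cache path
    $remoteUrl = 'http://example.com/data.json'; // placeholder remote URL
    $maxAge    = 2 * 3600;                       // X = 2 hours, in seconds

    // Refresh the cache if it is missing or older than X hours
    if (!file_exists($cacheFile) || time() - filemtime($cacheFile) > $maxAge) {
        $fresh = file_get_contents($remoteUrl);
        if ($fresh !== false) {
            file_put_contents($cacheFile, $fresh);
        }
    }

    // From here on, work with the local copy only
    $data = json_decode(file_get_contents($cacheFile), true);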

You should be prepared to have an outdated cache, however: at any moment your cached file can be up to X hours old (e.g. if you download the file and the webmasters replace it exactly one minute later, your cached copy will be outdated for X hours minus one minute).

There are certain ways you can improve this process:

  1. If the file is really big, downloading it while preparing a web page in response to a visitor can be a really bad idea. Visitors hate waiting, and every extra millisecond it takes to generate your page makes the visitor more likely to lose interest and go away. There are studies about that (ironically, that page took a few seconds to load for me :) - and in the meantime I went back to this tab to type).
    Solution: have a background job that periodically refreshes the cached file. You can do that with cron; if you don't have access to cron, there are other ways to emulate it in PHP. For example, see what the guys at TechCrunch came up with.
  2. Instead of downloading the entire file every time, you could (if the remote server supports it) first make a request to see whether the file has changed. That's possible with the If-Modified-Since header; see for example this SO question. Another way is to make a HEAD HTTP request and check the last modified time. Note that not all servers support these tricks (especially if the remote server generates the file dynamically). If they are supported, however, you can decrease the time interval between requests (the X above), since you won't be downloading the whole file every time. A sketch of such a conditional request follows this list.
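
A sketch of a conditional download with cURL, assuming the remote server honours If-Modified-Since; the paths and URL are placeholders:

    <?php
    $cacheFile = __DIR__ . '/cache/data.json';   // placeholder cache path
    $remoteUrl = 'http://example.com/data.json'; // placeholder remote URL

    $ch = curl_init($remoteUrl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    if (file_exists($cacheFile)) {
        // Ask the server to send the body only if it changed since our copy
        curl_setopt($ch, CURLOPT_TIMECONDITION, CURL_TIMECOND_IFMODSINCE);
        curl_setopt($ch, CURLOPT_TIMEVALUE, filemtime($cacheFile));
    }
    $body = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    // 200: a new version arrived, replace the cache; on 304 Not Modified the
    // body is empty and we simply keep the cached copy
    if ($code == 200 && $body !== false && $body !== '') {
        file_put_contents($cacheFile, $body);
    }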
ItalyPaleAle

You can create an event to update the file each time a change occurs.

And then use code like this to save your file:

       // Build a date-stamped directory for this copy of the file
       $path = "your path";
       $dt = date("d.M.Y"); // Include hours/minutes/seconds if you plan on many changes, e.g. date("F j, Y, g:i a"); see http://php.net/manual/en/function.date.php
       $dir = $path . $dt;


       // Create the directory if it doesn't exist yet
       if (!is_dir($dir)) {
            mkdir($dir);
       }
       else { // Otherwise empty it, so the old copy gets replaced
            $h = opendir($dir);
            while (($file = readdir($h)) !== false) {
                 if ($file != "." && $file != "..") {
                      unlink($dir . "/" . $file);
                 }
            }
            closedir($h);
       }


       $myJSONobject = "your json object";
       // json_encode() stores your data as a one-line string
       $myJSONstring = json_encode($myJSONobject);

       // Open a file inside the directory (the file name is your choice) and write the string
       $handle = fopen($dir . "/data.json", "w");
       fwrite($handle, $myJSONstring);
       fclose($handle);
IgorAlves
  • Why are you storing each file in a separate directory? You don't need to create multiple files; just overwrite the existing one and check the "mtime" value with filemtime(). – ItalyPaleAle Oct 09 '14 at 18:58
  • Yes, we can use multiple files in one directory, or one file in different directories. I prefer to create different directories with one file inside (same name). That way I keep a record of when each copy was created. For example, I do that to back up my MySQL database, and I store different models or generated PDF files too. – IgorAlves Oct 09 '14 at 19:05