1

I am trying to figure out a way to clean my temporary storage space using PHP. I know I can set up cron jobs, but is there a way to do it from PHP itself?

I use a temporary storage folder for storing generated PDF files for users to download. I have managed to force an expiry on the files, so a file is no longer publicly accessible after 3 minutes. Now the issue: although the file is not accessible publicly, it is still stored on my server. I have coded something like this to clean up the storage.

/** from the main thread */
if (rand(1, 100) <= 5) {
    Reports::clean();
}

/** the clean function */
public static function clean()
{
    $path = static::getStoragePath();
    if ($dir_handle = opendir($path)) {
        while (($fileName = readdir($dir_handle)) !== false) {
            $file = $path . '/' . $fileName;
            
            if (!is_file($file)) 
                continue;

            // If the file is older than 3 minutes, delete it
            if (time() - filemtime($file) > static::$expires) {
                unlink($file);
            }
        }
        closedir($dir_handle);
    }
}

So this will clean up the storage randomly. Mostly this is fine, but the issue is that when the storage clean-up starts, it slows that request down like a turtle.

So I thought of creating a cron job. But is there a right way to do this from PHP itself?

Note: I am using Slim 4 and don't have much expertise in setting up cron jobs, so any resources will also be helpful.

Mohamed Mufeed
    Yes, a cron job would be the perfect tool for this. Yes, doing it as part of a PHP request slows down the PHP request. – deceze Dec 01 '20 at 10:02
  • @deceze oh I see. I have a doubt – can we do that from PHP? I have just basic knowledge on the subject, but I have seen the ability to do so in frameworks like Magento. If so, how would I approach that? – Mohamed Mufeed Dec 01 '20 at 10:09
  • There are quite a few ideas on [Create temporary file and auto removed](https://stackoverflow.com/questions/1779205/create-temporary-file-and-auto-removed) which may be of help. – Nigel Ren Dec 01 '20 at 10:10
  • Set up a cron job from within PHP? Nah, that's not its job. It's a one-time setup, not something you do on each PHP request. How exactly to do it depends on your system; even with bare UNIX config files it's not that difficult, if you use some hosted service they may even have a GUI for it. You *can* write the cron script itself in any language you want of course, including PHP. – deceze Dec 01 '20 at 10:12
  • Thanks, I would go with cron then – Mohamed Mufeed Dec 01 '20 at 10:17
  • https://odan.github.io/slim4-skeleton/cronjobs.html may help. – Nigel Ren Dec 01 '20 at 10:28
  • Do note the following: it's legitimate to store temp files which need to be removed later. However, this mostly makes sense if an asynchronous process is generating those files. If your flow is: generate PDF file in PHP request, write it to disk, then redirect user to that file, then you're just being overly complicated. The PHP process which generates the file could directly send the data as HTTP response. You don't need to write the file to disk at all, you just need to `echo` the file contents as HTTP response. – deceze Dec 01 '20 at 10:57
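
A minimal Slim 4 sketch of the streaming approach described in the last comment — sending the generated bytes straight back instead of writing them to disk first. The route path and the Reports::generate() call are placeholders for the actual generation step:

$app->get('/report', function ($request, $response) {
    // Build the PDF in memory (placeholder for the real generator)
    $pdfContent = Reports::generate();

    // Send the bytes directly as the HTTP response -- nothing is written to disk
    $response->getBody()->write($pdfContent);

    return $response
        ->withHeader('Content-Type', 'application/pdf')
        ->withHeader('Content-Disposition', 'attachment; filename="report.pdf"');
});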

3 Answers

3

Short answer: no.

PHP has no way of triggering actions by itself; something external (a request, or cron) has to start the script.

Create a script (or a command if it's a framework) and trigger it with a cron job.
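
For instance, the clean-up can live in a small CLI script that cron invokes — a sketch, with the file name, path, and autoloader location all assumed:

#!/usr/bin/env php
<?php
// bin/clean-reports.php -- hypothetical standalone entry point for the clean-up.
// Point the require at wherever your project's autoloader actually lives.
require __DIR__ . '/../vendor/autoload.php';

Reports::clean();

A crontab entry along these lines would then run it every 5 minutes:

*/5 * * * * php /path/to/project/bin/clean-reports.php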

2

In your implementation, whenever the clean-up routine hits the main thread, depending on the volume of files in your PDF directory, it may create a significant lag in the response.

As noted in the other comments/answers, a cron job would indeed be the "standard" way to handle this. However, if you don't have access to cron jobs, or simply prefer a pure-PHP solution, there are a couple of possible approaches, aside from what you're already doing.

  1. Keep a log of the created files along with a creation timestamp. When the clean routine is called, instead of doing a full directory scan with modified-time checks, delete on the basis of your record, and purge the deleted entries from the record. Store your record e.g. as a JSON file, or as CSV/TSV. This should provide a significant speed-up where there's a large volume of files, given the reduction in filesystem access. (See the sketch after this list.)

  2. If you don't want to bother your users with the clean-up lag, move the routine from user-facing files to admin-only files, or do an admin user check, and adjust the clean-up frequency trigger (1:20 in the OP) to match admin usage frequency. This may of course reduce the clean-up frequency (if admins aren't around), but it will take the load off the public.

  3. Finally, obviously, become Mr. Cron and trigger the cleanup manually once in a while, on a daily basis or before your server runs out of space. In general, unless you are very tight on space, it doesn't seem to be necessary to clean up every 20 page calls or so. I have no idea of the volume of PDFs generated, traffic happening, or server resources, so it's difficult to come up with recommended figures on the clean-up frequency.
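
A rough sketch of idea 1, assuming the record is a manifest.json file inside the storage folder and that every generated PDF appends its name and time() to it (the method and file names here are made up for illustration):

public static function cleanFromManifest()
{
    $path = static::getStoragePath();
    $manifestFile = $path . '/manifest.json';

    if (!is_file($manifestFile)) {
        return;
    }

    // Record format: { "report-abc.pdf": 1606816800, ... }
    $manifest = json_decode(file_get_contents($manifestFile), true) ?: [];
    $now = time();
    $keep = [];

    foreach ($manifest as $fileName => $createdAt) {
        if ($now - $createdAt > static::$expires) {
            // Expired: remove from disk and drop from the record
            $file = $path . '/' . $fileName;
            if (is_file($file)) {
                unlink($file);
            }
            continue;
        }
        $keep[$fileName] = $createdAt;
    }

    file_put_contents($manifestFile, json_encode($keep));
}

This only stats files already known to be expired, instead of scanning the whole directory on every run.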

Markus AO
  • Generated files would be a few KBs max. I have not completed the file generation process, so I can't give an exact measurement. Traffic would also be low, around 15-25 page generations per day – Mohamed Mufeed Dec 01 '20 at 10:38
  • If that's the case, then it's puzzling that your clean-up routine should "slow down the request like a turtle". If there are only a couple of dozen files in your folder, the routine you have should add just a fraction to the total processing time. Now, suppose the PDFs generated are on average 10KB, and there are 100 of them generated in a day. This takes up 1MB in a day. Even if you trigger a manual clean-up once in a month (or a year!!), it seems that you'll be just fine as far as real server concerns go. Practically speaking. This is a fine exercise in good house-keeping, though! – Markus AO Dec 01 '20 at 10:46
  • That is a relief to know. Then maybe it's just my system that is slow. The project is still in development, so it's not deployed to an actual server to know the real speed. I had tested this on my local server at the development stage, so there is room for a lot of performance optimisation. Let's see – Mohamed Mufeed Dec 01 '20 at 10:51
  • For a representative test case, using your clean function: on my laptop (i5-8250), I generated 200 dummy files with `touch` timestamps on both sides of the expiry range, then ran your cleanup routine. It takes around 0.05 sec to complete, with on average 50% of the files deleted. I wouldn't worry too much at these volumes. Suffice it to say, you're safe lowering the cleanup frequency to 1:1000. (Where, even if 50% of page calls generated a PDF, you'd still have only 500 x ~10KB of files to wipe.) – Markus AO Dec 01 '20 at 11:02

0

This is a bad concept; use a cron job scheduler for this job. Here is a bash one-liner to delete all files in the folder /tmp/some_path if the folder content exceeds 10 MB (please modify according to your needs):

SIZE=$(du -bs --block-size=1M /tmp/some_path | cut -f1); if [[ $SIZE -gt 10 ]]; then echo "Folder '/tmp/some_path' has size ($SIZE MB)."; rm -r /tmp/some_path/*; fi

You can paste this code into script.sh (include a shebang at the top, e.g. #!/bin/bash for bash, and make the file executable, i.e. chmod +x script.sh). Then append a new line to the cron scheduler (e.g. via crontab -e) to check the folder size every hour:

0 * * * * <path_to_the_script>/script.sh
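
Put together, script.sh would look like this (same example path and threshold as above):

#!/bin/bash
# Wipe /tmp/some_path once its contents exceed 10 MB
SIZE=$(du -bs --block-size=1M /tmp/some_path | cut -f1)
if [[ $SIZE -gt 10 ]]; then
    echo "Folder '/tmp/some_path' has size ($SIZE MB)."
    rm -r /tmp/some_path/*
fi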

You can find more about cron jobs here: https://crontab.guru

TomiL