
I'm the only administrator of the database. I want to run a one-time script that takes around 3,800 scanned images (it will grow to about 10,000) and creates a couple of thumbnails for each image, using the PHP exec() function to run the external program ImageMagick.

I've created the script, launched it, and everything works perfectly! All of this is on my local development server. The script takes around 11 minutes to create thousands of thumbnails. It's a one-time operation that runs every other year, so the consequences are minimal.

So far so good. Here's where I run into problems.

Everything I did on my local development server I repeated on the live server for testing purposes. I have a shared hosting account with HostGator. Running my 11-minute script on the shared host gives me the error 'Maximum execution time of 30 seconds exceeded...'. I did my research and tried many of the solutions found in this post (Increase max execution time for php), only to realize there is nothing I can do to change the maximum execution time of a script on a shared host.

I'm stuck. So, my question is: what is the obvious solution here?

I was thinking of launching the script for 200 images at a time, refreshing the page automatically, and running the script again for the next 200 images, and so on until there are no more images. This way I'm sure the 30-second maximum execution time allowed on my shared host is respected. It looks like a solution right off the top of my head, but I'm not sure if this is a no-no, if I'm going to run into bigger problems, or if there are too many downsides.
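
For illustration, here's a rough sketch of what I mean (batch.php, the offset query parameter, and the create_thumbnails() helper are just placeholder names; 200 is the batch size from above):

// batch.php?offset=0 -- hypothetical entry point for the 200-at-a-time idea
$offset = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;
$batch  = array_slice(glob('scans/*.jpg'), $offset, 200);

foreach ($batch as $source) {
    create_thumbnails($source);   // placeholder for the exec()/ImageMagick call
}

if (count($batch) === 200) {
    // There may be more images left: reload the page for the next batch
    header('Location: batch.php?offset=' . ($offset + 200));
    exit;
}

echo 'All thumbnails created.';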

Is this the obvious solution? Has anyone run into the same problem? What would you suggest?

Thanks

Marco
  • Why do you need to create thumbnails every other year? Shouldn't creating the thumbnail once, when the original image is first saved, be enough, so it can stay there forever (until it's obsolete)? – Charlotte Dunois May 10 '16 at 21:02
  • Did you check whether there is any execution limit when running in CLI mode? Normally for CLI scripts max_execution_time is `0` (no limit) - http://php.net/manual/en/info.configuration.php#ini.max-execution-time – codedge May 10 '16 at 21:02
  • a lot of shared hosts will stop a long running script no matter what you do - so check that first –  May 10 '16 at 21:04
  • While there are not so many images to process, any temporary solution that leads to the desired result is fine IMO. Why temporary? Because creating thumbnails every year is not how it should be done. I'd suggest creating a thumbnail upon uploading the image to the server. – lolbas May 10 '16 at 21:04
  • To directly answer your question, yes you could do that. But I have to reiterate the question by the other commentators and ask why do you have to do it every other year? You could do it every time the image is first uploaded. – Cave Johnson May 10 '16 at 21:06
  • You should talk to the hosting service and check what they can do... whether they can run your script "internally"... otherwise you should do it in batches, and then change the script to run on each image that is uploaded or that meets your parameters upon upload. – DIEGO CARRASCAL May 10 '16 at 21:10
  • You have no restrictions on your local server, so why not copy the images to your local server, run your script, and then FTP the new thumbs back up to your server? Then change the script that uploads images to create the thumbnails as each image is uploaded from now on. Or, if you insist, do this again in 2 years. – RiggsFolly May 10 '16 at 21:21

1 Answer


Assuming you do have a reason to recreate the thumbnails in batch, instead of doing it at each image upload as was suggested, I'd do exactly as you did - use a script that refreshes itself - except that I wouldn't set a fixed number of images.

Rather I would have the script time itself after each image, and stop when it has reached, say, 25 seconds:

$stop = time() + 25;                      // stop well before the 30-second limit
while (time() < $stop) {
    // ...find the next image to process, create its thumbnails...
    if (finished()) {
        die("OK");
    }
}
// Not finished yet: redirect to the next batch of images
header('Location: ....');
die();
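
As for the "find image to process" placeholder above, one simple approach (only a sketch; the scans/ and thumbs/ folder names, the 150-pixel width and the `convert -thumbnail` options are assumptions to adapt to your setup) is to treat a missing thumbnail as "not yet processed":

// Returns [source, thumbnail] for the next scan without a thumbnail, or null when done.
function next_unprocessed_image() {
    foreach (glob('scans/*.jpg') as $source) {
        $thumb = 'thumbs/' . basename($source);
        if (!file_exists($thumb)) {
            return [$source, $thumb];
        }
    }
    return null;
}

function finished() {
    return next_unprocessed_image() === null;
}

function process_image($source, $thumb) {
    // 150px-wide thumbnail via ImageMagick, preserving the aspect ratio
    exec('convert ' . escapeshellarg($source) . ' -thumbnail 150x ' . escapeshellarg($thumb));
}

A nice side effect is that a request that dies mid-batch loses nothing: the next request simply resumes at the first image that still has no thumbnail.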

However, do check with your ISP, because your script might be either seen as an abuse of the service, or it could be mistaken for an attack. Also, enquire whether there's a preferred time of day to run this kind of maintenance.

Another, naughtier way of doing the same thing is to have the script run for a very small number of images (possibly a single one) every time someone hits the home page. This has the effect of having the extra load from the script mimic the real load on the server, avoiding embarrassing spikes or mysterious nonzero base loads. You do need to find a way of never choosing the same image from different instances of the script running in parallel (when I had to do this, I set a flag in a database).
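
To make the flag idea concrete, here is a minimal sketch assuming a MySQL table images(id, path, claimed_by, thumb_done) accessed through PDO (the table, column names and connection details are purely illustrative): each request claims one row atomically with a unique token, so parallel instances can never grab the same image.

// Illustrative schema: images(id, path, claimed_by NULL, thumb_done)
$pdo   = new PDO('mysql:host=localhost;dbname=gallery', 'user', 'pass');
$token = uniqid('', true);

// Atomically claim one unprocessed, unclaimed image.
$pdo->prepare('UPDATE images SET claimed_by = ?
               WHERE thumb_done = 0 AND claimed_by IS NULL
               ORDER BY id LIMIT 1')
    ->execute([$token]);

// Fetch whatever this request managed to claim (nothing, if another instance got there first).
$stmt = $pdo->prepare('SELECT id, path FROM images WHERE claimed_by = ?');
$stmt->execute([$token]);
$row = $stmt->fetch(PDO::FETCH_ASSOC);

if ($row) {
    // ...create the thumbnails for $row['path'], then mark the row as done...
    $pdo->prepare('UPDATE images SET thumb_done = 1 WHERE id = ?')
        ->execute([$row['id']]);
}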

LSerni
  • Just remember to _track the last image processed successfully_ and start from it on the next execution... – DIEGO CARRASCAL May 10 '16 at 21:12
  • Why not just do one at a time? In particular it would simplify everything – ArtisticPhoenix May 10 '16 at 21:13
  • @ArtisticPhoenix that will lead to (in this case) >3800 server requests within a pretty short period of time. It will slow down the server, since establishing a connection for every request takes some time and CPU, and if the server doesn't have enough physical resources, in a very bad case it might go down. – lolbas May 10 '16 at 21:17
  • Expanding on @DIEGOCARRASCAL's comment, you might have to make sure that the list of files is sorted the same way on each iteration so you don't skip any or reprocess the same images. Even better: you could store the file paths in a database table, loop through the files in the table, and keep track of the last processed id. – Cave Johnson May 10 '16 at 21:18
  • @ArtisticPhoenix, if the image selection and setup part is very short compared with the image processing time, you might do just that. On the other hand, if it is not, the overhead from the requests might be appreciable, and a larger number of images per cycle would be more efficient. – LSerni May 10 '16 at 21:19
  • This is all true, but the point was that it's probably easier to do a single image per request than to have to check that the last image didn't fail midway through because of the timeout. As for server network load, if time is not a problem, surely a small delay of 1 second or so between requests would negate any of those issues. I've done this many times on a non-shared host. In my case it wasn't time but memory that was the issue, and mind you my server has 12 dual-core processors and 48 gigs of RAM. But Doctrine can be that way sometimes when you are dealing with millions of rows of data.... – ArtisticPhoenix May 11 '16 at 01:08
  • It's a fair strategy in the short term; for the long term, though, I would look at using a message queue and doing the thumbnails as a detached background process, either initiated by the user or on a nightly basis. Even an ongoing cron job (if on Linux) would do to keep things nice and tidy. Currently I'm in the process of using OCR to process about 60GB worth of PDFs; things of this nature can be quite tedious and time consuming. Good luck! – ArtisticPhoenix May 11 '16 at 01:11
  • @LSerni, I like the note about the "race" condition of hitting the same image. My suggestion would be to create a placeholder file, like the thumbnail but without the data in it, as soon as the script starts; then you don't need the database, and can use the file time to check for processes that have stopped, i.e. the filesize is 0 and the last modified time is 24 hours ago. – ArtisticPhoenix May 11 '16 at 01:19