7

I have a project in which a user uploads an image through a form and the server generates some thumbnails. The thumbnail-making process is very slow, so I thought that doing the image resizing in a non-blocking function could be a good solution. I mean: the server processes the form (which has more fields), gives the "ok" feedback to the user, and then calls the thumbnailing function. How can I do that?

Thanks in advance

LuisClemente
  • Related, not necessarily a duplicate: http://stackoverflow.com/questions/124462/asynchronous-php-calls – Bobby Feb 10 '11 at 09:03
  • Begs the question of why the 'thumbnail making process is very slow'? What sort of delay is it? Such basic image processing should be well under a second for one image, which should not unduly affect a user, and would certainly be preferable to all this programming headache with its multiple levels of synchronisation risk. Worst case is indicating to the user that there will be a delay. People are more tolerant when their expectations have been managed beforehand. – Patanjali Dec 22 '20 at 06:29

7 Answers

7

Your best option would be to implement Gearman. It's a job queue system in which you can run either synchronous or asynchronous jobs. http://gearman.org/
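
As a rough illustration of the asynchronous case, here is a minimal sketch assuming the PECL gearman extension and a running gearmand server. The job name make_thumbnail, the payload fields and the helper function names are hypothetical and not part of Gearman itself.

```php
<?php
// Sketch only: requires the PECL gearman extension and a running gearmand.
// The job name "make_thumbnail" and the payload fields are hypothetical.

/** Encode the work description that is handed to the queue. */
function thumbnailJobPayload(string $imagePath, array $sizes): string
{
    return json_encode(['image' => $imagePath, 'sizes' => array_values($sizes)]);
}

/** Called from the form handler: queue the job and return immediately. */
function queueThumbnailJob(GearmanClient $client, string $imagePath, array $sizes): string
{
    // doBackground() returns a job handle without waiting for the result.
    return $client->doBackground('make_thumbnail', thumbnailJobPayload($imagePath, $sizes));
}

/** Run from a separate long-lived CLI process. */
function runThumbnailWorker(GearmanWorker $worker, callable $makeThumbnail): void
{
    $worker->addFunction('make_thumbnail', function (GearmanJob $job) use ($makeThumbnail) {
        $task = json_decode($job->workload(), true);
        foreach ($task['sizes'] as $size) {
            $makeThumbnail($task['image'], (int) $size);
        }
    });
    while ($worker->work()) {
        // work() blocks by default until a job arrives, so this loop does not busy-wait.
    }
}
```

In the form handler this would be used roughly as: create a GearmanClient, call addServer() on it (which defaults to the local server), then call queueThumbnailJob($client, '/uploads/foobar.jpg', [50, 150]) after the "ok" response has been prepared.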

Miljar
6

Better solution I usually go for: Create the thumbnails dynamically when needed, not upon upload.

You create a script that generates thumbnails on the fly, and all your image tags point to this script:

<img src="/thumbnail.php?image=foobar.jpg&size=150" />

This delays the thumbnail generation until it is needed and works "asynchronously". With some .htaccess rewrite magic you can even make it look like a normal image file and cache images in a way that the Apache server will serve them the next time without invoking the script.


To be a little more detailed, I use this for user profile images:

Image tags:

<img src="/img/users/123456/50.jpg" />

.htaccess:

<IfModule mod_rewrite.c>
    RewriteEngine On

    # Rewrites requests for user images to match directory structure.
    # E.g.: URL /img/users/123456/50.jpg -> /img/users/123/123456/50.jpg
    # Intermediate directory level is introduced to avoid cramming too many directories into the same directory.
    RewriteRule ^img/users/(\d{1,3})(\d*)/(\d+\.\D+)$ img/users/$1/$1$2/$3 [nocase,last]

    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule ^(.*)$ index.php?url=$1 [QSA,L]

</IfModule>

This first of all rewrites image requests to a deeper directory structure. If the image exists, Apache will serve it as usual. If it doesn't, my regular application is invoked. In the app, I route /img/users/... URLs to a module that ends up with two pieces of information: the user id 123456 and the requested size 50. It then generates a thumbnail roughly according to this logic:

  1. Find profile image for user 123456
  2. Generate thumbnail in requested size
  3. Write thumbnail to /img/users/123/123456/50.jpg, where it will be picked up by Apache next time
  4. Output image
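
The four steps above could be sketched roughly as follows. This is not the code from my application, just an illustration that assumes the GD extension and JPEG originals; the function names and the $originalPath and $docRoot parameters are hypothetical.

```php
<?php
// Sketch only: assumes the GD extension and JPEG originals.
// The function names and the $docRoot/$originalPath parameters are hypothetical.

/** Map user id + size to the on-disk path that the first rewrite rule points to. */
function userThumbnailPath(string $userId, int $size): string
{
    $prefix = substr($userId, 0, 3); // first (up to) three digits, e.g. "123"
    return "img/users/{$prefix}/{$userId}/{$size}.jpg";
}

/** Steps 2-4: scale the original, cache it on disk, and send it. */
function generateAndServeThumbnail(string $originalPath, string $docRoot, string $userId, int $size): void
{
    $target = $docRoot . '/' . userThumbnailPath($userId, $size);
    if (!is_dir(dirname($target))) {
        mkdir(dirname($target), 0755, true);
    }
    $source = imagecreatefromjpeg($originalPath);
    $thumb  = imagescale($source, $size); // width = $size, height keeps the aspect ratio

    // Write to a temporary name and rename, so Apache never serves a half-written file.
    $tmp = $target . '.' . uniqid('', true) . '.tmp';
    imagejpeg($thumb, $tmp, 85);
    rename($tmp, $target);

    header('Content-Type: image/jpeg');
    readfile($target);
}
```

Writing to a temporary file first is one simple way to reduce the impact of two requests generating the same thumbnail at the same time; it does not prevent the duplicate work itself.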
deceze
  • 3
    That could really slow down things if lots of images are getting uploaded very frequently... batch is the way to do this, IMPO. – xil3 Feb 10 '11 at 09:05
  • 2
    @xil3 It doesn't really make any difference, the server will be busy either way. Just the point in time changes, not the actual workload. It also makes sure that images are available when needed, unlike the cron job which delays the availability of the images. Proper caching of course is a must, as I wrote. – deceze Feb 10 '11 at 09:10
  • 2
    Yeah, I guess a pairing of the 2 solutions would be best - just in-case the images haven't been processed by the batch yet. – xil3 Feb 10 '11 at 09:11
  • 2
    this could be a good thing if you actually keep the generated thumbnails and don't re-generate them. something like if thumbnail exists return thumbnail, else generate. – cromestant Feb 10 '11 at 09:17
  • @deceze: "just the point in time changes" Yes, that is completely correct - and that was exactly the point in the solution I mention in my answer. When generating this on request, the first user who requests a thumbnail will see a heck of a slow-loading page, not to mention concurrency issues (another user requests a thumbnail while it is generated). – Piskvor left the building Feb 10 '11 at 09:24
  • 1
    Will slightly slow down loading times (badly if a user opens a page where all thumbnails must be regenerated). Creation of thumbnails is multithreaded, making good use of multi-core CPUs. Beware of race conditions if two scripts need to generate the same image simultaneously. Saves thumbnails that are never displayed from the need to be generated. – yankee Feb 10 '11 at 09:28
  • 1
    @Piskvor Valid points, but: At least in my case, after uploading an image, the user is redirected to a page that contains the image he just uploaded, so he'll be the first to see it and the only one experiencing slowness (which he would one way or another anyway). This also largely solves the problem of concurrent requests for non-existing thumbnails, since it simply makes it unlikely. – deceze Feb 10 '11 at 09:33
  • 1
    @yankee It also makes development very flexible since you can change image sizes, or even types or quality, at any time. :) – deceze Feb 10 '11 at 09:35
  • @deceze: Ah, now *that* is clever: you're essentially shifting the asynchronous request to the client. Good point with the variable thumbnail sizes, and the URL rewrite is a nice touch. I should try something like this in some new project. – Piskvor left the building Feb 10 '11 at 09:39
  • 1
    @Piskvor Thanks. :) You may have to be a little cleverer if, for example, you display the newest uploaded images on your front page and you have Facebook-level traffic. Then the concurrency problem is a little worse. For average uses it has served me well so far though. – deceze Feb 10 '11 at 09:44
  • @deceze I was instinctively coding my image bank website this way and was wondering about caching the images. This is great, and the mod_rewrite solution is what I need. However, I'm not sure I get it: if the query string does not change between requests, isn't the browser supposed to cache the images anyway? – vdegenne Oct 14 '11 at 16:08
  • @Oddant Sure, the browser will do some caching as well, but the point is that the server must not/should not recreate the image over and over, since that is very expensive. With this method PHP is only required to care about the image once, following requests for the same image are handled by the web server directly. – deceze Oct 14 '11 at 23:37
  • @deceze ok now it's clear and since I've been using mod_rewrite, it definitely improves the way my webpages are loaded, thanks :) – vdegenne Oct 15 '11 at 12:27
  • You could solve the concurrency problem easily by using an in memory key Value store/cache to implement locking. If you do get to the stage where concurrency is an issue, then you most likely need a cache anyway. E.g. Memcached, Redis – frostymarvelous Jul 23 '15 at 08:22
3

You could have a cronjob that executes the thumbnailing script. You could add the image to be resized to some sort of queue (a MySQL database, perhaps), and the thumbnailing script runs every minute to check whether there is something in the queue and then starts resizing.

gnur
  • 1
    cron jobs work wonders, but sometimes you don't have cron access on the server machine. There are ways to keep PHP processing after the user has been given his response. – cromestant Feb 10 '11 at 09:16
  • Thanks for your answer. The problem is that I can't wait until the cronjob processes the images. I need the thumbnails really soon after the upload, but I don't want the user who uploads them to wait until the thumbs are created. – LuisClemente Feb 10 '11 at 09:37
  • You could rewrite it to be a 'daemon'. Set the max execution time to 1 minute and keep checking if there are new images in a `while(true)` loop. This way the script will be as fast as directly converting and it will be non-blocking. – gnur Feb 10 '11 at 10:27
  • 2
    It will be busy-waiting instead. Although the core of your suggestion is sound, `while(true)` will happily consume all available CPU time asking "are we there yet? are we there yet?" Use something like `inotify`, or at the very least, sleep a few seconds before continuing the loop. – Piskvor left the building Feb 10 '11 at 11:01
2

On one system, I've seen an independent background process making the thumbnails:

  • form is processed normally, without generating any thumbnail at all
  • the image is given a unique name and copied to a special folder.
  • a database entry is created in a thumbnails table, linking the original image and the new unique name, marked as "to be thumbnailed"
  • the form processing script stops caring and continues with whatever else it needs to do.

There's an independent background process (and its watchdog), which continuously watches that special folder (most OSes have various tools that notify you when a folder's contents change); if it finds an image there, it will:

  • make a thumbnail (we were using ImageMagick's CLI for that)
  • save it somewhere else
  • update the database, set the image status as "thumbnailed OK" (or "failed", if it couldn't make one)

When you need the thumbnail, check the thumbnails table - if the image is not "thumbnailed OK", show a placeholder, else get the correct thumbnail name and display this.

That worked great - most thumbnails were created within a few seconds, without slowing down the user-facing scripts. Note that you'll need to start the background script somehow - in this case, there was a watchdog in cron, which restarted the thumbnailing script if it died.
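
To make the flow above more concrete, here is a rough sketch of both sides. It is not the original system (whose background process was not PHP); the table name, column names, status strings and function names are all hypothetical, and ImageMagick's convert binary is assumed to be on the PATH.

```php
<?php
// Sketch only: table name, column names and status strings are hypothetical,
// and ImageMagick's `convert` binary is assumed to be on the PATH.

/** Form handler side: record the upload and return immediately. */
function enqueueThumbnail(PDO $db, string $original, string $uniqueName): void
{
    $stmt = $db->prepare(
        "INSERT INTO thumbnails (original, unique_name, status) VALUES (?, ?, 'pending')"
    );
    $stmt->execute([$original, $uniqueName]);
}

/** Background side: process everything still pending; returns how many rows were handled. */
function processPendingThumbnails(PDO $db, string $inDir, string $outDir, callable $resize): int
{
    $rows = $db->query("SELECT id, unique_name FROM thumbnails WHERE status = 'pending'")
               ->fetchAll(PDO::FETCH_ASSOC);
    $update = $db->prepare('UPDATE thumbnails SET status = ? WHERE id = ?');
    foreach ($rows as $row) {
        $ok = $resize("$inDir/{$row['unique_name']}", "$outDir/{$row['unique_name']}");
        $update->execute([$ok ? 'ok' : 'failed', $row['id']]);
    }
    return count($rows);
}

/** One possible $resize callback: shell out to ImageMagick's CLI. */
function resizeWithImageMagick(string $source, string $target): bool
{
    $cmd = sprintf('convert %s -thumbnail 150x150 %s',
        escapeshellarg($source), escapeshellarg($target));
    exec($cmd, $output, $exitCode);
    return $exitCode === 0;
}
```

The background process would call processPendingThumbnails($db, $inDir, $outDir, 'resizeWithImageMagick') each time it is notified of a change in the folder, or once per cron run. Pages that display images then only need to read the status column to decide between the thumbnail and the placeholder.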


@yankee objects that some elements are uncommon:

  • it is not necessary for the thumbnailer process to run as a background script - if you can live with a minute of latency before you get the thumbnails, you could run it as a cron script, getting rid of the watchdog altogether.
  • ImageMagick was chosen over GD for specific performance reasons; the thumbnailer could use whatever method is available.

Edit: I checked the site, and there is one more mechanism - this one is not necessary and adds a bit of load, but looks cool, especially if you don't expect full page loads very often (e.g. on AJAX-driven sites):

  • where a "not-failed-but-no-thumbnail" placeholder is output, the thumbnail is shown in an img with class="nothumb"
  • a JS function checks for images with this class
  • if any are found, it will periodically check if a thumbnail is available yet
  • the static placeholders are replaced with "loading" placeholders
  • if found, it will replace the placeholder with the thumbnail

This loads the thumbnails as soon as they are ready, at the cost of some resources. For a continuous background process, it's not really needed; but if you want to ensure that users will see the thumbnails as they become available instead of on their next pageload, this is a useful addition.

Piskvor left the building
  • This is an efficient solution, but unfortunately requires lots of privileges on the server not commonly available for the php developers. – yankee Feb 10 '11 at 09:10
  • @yankee: well, it probably won't work on shared hosting, that's for sure - but the only "uncommon" elements I see is 1) ImageMagick and 2) cron script, possibly 3) a long-running background process (which is not strictly necessary - could run as a cron script). Never had a problem with any of that, except on the cheapest webhosts. – Piskvor left the building Feb 10 '11 at 09:13
  • @Piskvor: If you do it as cron it's easier of course, but will cost some delay until the thumbnails are generated. I am wondering about the long-running background process. I thought most hosters did not allow you to override the maximum execution time (by too much) though. Is that untrue? And of course you will probably not be able to get a system hook that tells you when a directory changed which would be really efficient. Though other means of IPC can of course be used. I guess TCP is the most portable IPC solution without delay until thumbnailing proc is notified... Anyway I like your idea – yankee Feb 10 '11 at 09:23
  • @yankee: As I said, a long-lived background process is probably not a real possibility on a shared hosting (and I'm suggesting cron as a not-so-good alternative, not as the primary solution). On a professional hosting, neither is a problem (the background process wasn't PHP anyway - PHP is a very bad choice for anything that has to run for longer than tens of seconds). As for the notification, there's `inotify` on Linux, or you could just check the directory contents (which is much less efficient, but also sort-of-works in a pinch). – Piskvor left the building Feb 10 '11 at 09:30
  • I think that I can do all you said (I have dedicated hosting). Do you think it is faster to write a C program to thumbnail the images? – LuisClemente Feb 10 '11 at 09:46
  • @LuisClemente: Should be - IIRC there are C bindings for ImageMagick. In our case, a simple bash script was sufficient. – Piskvor left the building Feb 10 '11 at 09:49
  • @LuisClemente: I strongly recommend that you use a language that you are skilled in. C will be fast, especially if you use ImageMagick as a library as Piskvor suggested, but if you end up with memory leaks or any other bugs in THAT script then you generate much more trouble compared to the slight speed benefit that C gives you. The "slow" thing (scaling the image) will be done by a (C) library anyway. I disagree about PHP being a bad choice for scripts that run 10 seconds+. I used PHP for coding an IRC bot which runs endlessly and it works very well. – yankee Feb 10 '11 at 10:06
  • @yankee: Could have been the version, I think this was done way back with PHP 5.1. Anyway, I've found PHP's memory management abysmal there. While PHP is somewhat usable in such situations, there are tools that are better suited for the job; but we're getting way off topic here. – Piskvor left the building Feb 10 '11 at 10:17
2

You can use HTTP headers to tell the client that the output has ended after the "OK" message has been transferred, while keeping the script running on the server to process the thumbnails. I once used this code successfully:

header("Connection: close");
@ob_end_clean();
ignore_user_abort(true); // pass true: called without an argument, it only returns the current setting
ob_start();

//generate and print server response here
echo "everything OK";

$size = ob_get_length();
header("Content-Length: ".$size);
ob_end_flush();
flush();

//whatever you do here has no influence on the page loading time, as the client has already closed its connection.
generateThumbnail();
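
If the site runs under PHP-FPM, the same effect can usually be had without the header and buffer juggling, because that SAPI provides fastcgi_finish_request(). The following is only a sketch under that assumption; finishResponse() is a hypothetical helper name, and on other SAPIs it falls back to the flush-based approach from the code above.

```php
<?php
// Sketch only: finishResponse() is a hypothetical helper name.

/** Send everything buffered so far and release the client, if the SAPI allows it. */
function finishResponse(): bool
{
    ignore_user_abort(true); // keep running even if the client disconnects
    if (function_exists('fastcgi_finish_request')) {
        // PHP-FPM only: flushes the response and closes the connection to the client.
        return fastcgi_finish_request();
    }
    // Other SAPIs: best effort only, the client may still keep waiting.
    while (ob_get_level() > 0) {
        ob_end_flush();
    }
    flush();
    return false;
}
```

Usage would be: echo the "everything OK" response, call finishResponse(), then call generateThumbnail().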
Simon
  • Flush may not always return the output - I've had varied experiences with it, based on the load. – xil3 Feb 10 '11 at 09:33
  • Thanks for your answer, it could be a solution but I'm using Symfony and I'm not sure if I can do that. – LuisClemente Feb 10 '11 at 09:38
  • You just have to try it. xil3 is completely right, flush() doesn't work as expected on all systems and setups (although in my case it worked quite reliably). However, this is a fail-safe approach, so if the content is not correctly flushed, everything still works as expected, the user will just have to wait longer for the success message. – Simon Feb 10 '11 at 09:45
  • Flush doesn't work on modern browsers because they wait for either a) more data or b) close-headers. So with this header this shouldn't be a problem. What I am wondering about is calling Connection: close and still outputting data after that. – gnur Feb 10 '11 at 09:49
  • "the client has already closed its connection." That's an unwarranted assumption - as @gnur points out, most browsers use keep-alive and similar techniques to increase throughput. You'd need to also set 'Connection: close', I'm not sure if you can actually break off a connection from within HTTP. I'd say you should start output buffering and then discard the buffer, for data past your last flush(). – Piskvor left the building Feb 10 '11 at 10:02
0

Suggestions:

  1. Close your connection before script terminates as outlined here: http://www.php.net/manual/en/features.connection-handling.php#71172

  2. Generate output in "real time" as explained here: How to echo output in real time, (before script finishes)? (This won't work in all environments, and the user will still see a loading bar until the thumbnails have finished generating.)

  3. Start another process that creates the thumbnails for you. If you have the necessary privileges on the server, use system() for that. If you don't, create another PHP script on your server and call it by URL using sockets. You can then close the connection early and keep that script running to generate the thumbnails. Use ignore_user_abort() in that script so the thumbnail generation is not aborted when you close the TCP connection opened with your socket.
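
For the first variant of suggestion 3, a minimal sketch could look like this. It assumes a Unix-like shell and a php binary on the PATH; the worker script path and the helper names are hypothetical. It uses exec(), but the same redirection trick applies to system(): the output of the started program has to be redirected, otherwise PHP waits for the program to finish.

```php
<?php
// Sketch only: the worker script path and helper names are hypothetical,
// and a Unix-like shell is assumed.

/** Build a shell command that runs a PHP worker detached from the current request. */
function backgroundCommand(string $script, array $args): string
{
    $parts = array_map('escapeshellarg', array_merge([$script], $args));
    // Redirecting the output and appending "&" lets exec() return immediately.
    return 'php ' . implode(' ', $parts) . ' > /dev/null 2>&1 &';
}

/** Start the worker without waiting for it to finish. */
function startThumbnailWorker(string $script, string $imagePath): void
{
    exec(backgroundCommand($script, [$imagePath]));
}
```

The form handler would call startThumbnailWorker() with the path of the worker script and the uploaded image right before sending its response; the worker script reads the image path from $argv.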

Community
yankee
0

There is actually a way: once you have sent back the response headers and the response content, you can keep the server thread running to do your processing without keeping the client waiting. So you can send back the headers with the content type, or a redirect header, and the browser will read the response. That does not mean the server thread stops.

You can use the connection status together with ignore_user_abort() to keep the same thread running on the server; this is explained in the linked page.

Sample from the link provided:

<?php
// Discard any buffer opened earlier (guarded, so no notice is raised if none is active).
if (ob_get_level() > 0) {
    ob_end_clean();
}
header("Connection: close");        // header() does not need a trailing \r\n
header("Content-Encoding: none");
ignore_user_abort(true); // optional
ob_start();
echo 'Text user will see';
$size = ob_get_length();
header("Content-Length: $size");
ob_end_flush();     // Strange behaviour, will not work
flush();            // Unless both are called !
if (ob_get_level() > 0) {
    ob_end_clean();
}

//do processing here
sleep(5);

echo 'Text user will never see';
//do some processing
?>
cromestant