
I'm building a web-based system that will host a huge number of high-res images, which will be available for sale. Of course, I will never display the high-res image; instead, when browsing, people will only see a low-resolution, watermarked image. Currently the workflow is as follows:

A PHP script handles the high-res image upload. When an image is uploaded, it is automatically resized to a low-res image and to a thumbnail, and both files are saved on the server (no watermark is added).

When people are browsing, the page displays the thumbnail of the image; on click, it enlarges and displays the low-res image with the watermark. At the moment I apply the watermark on the fly whenever the low-res image is opened.

My question is, what is the correct approach:

1) Should I save a second, watermarked copy of the low-res image only when it is accessed for the first time? I mean: if somebody accesses the image, I add the watermark on the fly, display the image and store it on the server. The next time the same image is accessed, if a watermarked copy exists, I just display that copy; otherwise I apply the watermark on the fly. (In case watermark.png is changed, I just delete the watermarked images and they will be recreated as they are accessed.)

2) Should I keep applying watermarks on the fly, as I'm doing now?
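Option 1 boils down to a cache-or-render step. A minimal sketch, assuming PHP 7+ and the Imagick extension; `getWatermarked()`, `imagickWatermarker()` and the `$render` callback are hypothetical names, not existing code:

```php
<?php
/**
 * Serve the cached watermarked copy if it exists; otherwise render it once,
 * store it, and return it. $render takes the source path and returns image bytes.
 */
function getWatermarked(string $source, string $cacheFile, callable $render): string
{
    // Cache hit: one stat plus a file read, no image processing.
    if (is_file($cacheFile)) {
        return file_get_contents($cacheFile);
    }

    // Cache miss: do the expensive work once.
    $bytes = $render($source);

    $dir = dirname($cacheFile);
    if (!is_dir($dir)) {
        @mkdir($dir, 0775, true); // @: another request may create it at the same moment
    }
    // Write to a temporary file and rename it into place, so a concurrent request
    // never reads a half-written image (rename is atomic on the same filesystem).
    $tmp = tempnam($dir, 'wm_');
    file_put_contents($tmp, $bytes);
    chmod($tmp, 0644); // tempnam() creates 0600 files; let the web server read it
    rename($tmp, $cacheFile);

    return $bytes;
}

/**
 * An Imagick-based renderer that mirrors the compositing code in the question.
 */
function imagickWatermarker(string $watermarkPng): callable
{
    return function (string $source) use ($watermarkPng): string {
        $image = new Imagick($source);
        $watermark = new Imagick($watermarkPng);
        $image->compositeImage($watermark, Imagick::COMPOSITE_OVER, 0, 0);
        $image->setImageFormat('jpeg');
        return $image->getImageBlob();
    };
}
```

In the page handler you would send a `Content-Type: image/jpeg` header and then `echo getWatermarked($lowresPath, $cachePath, imagickWatermarker($watermarkPath));`, where the three path variables are placeholders. Emptying the cache directory when watermark.png changes makes every image regenerate lazily, exactly as described in option 1.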

My biggest question is how big the difference is between a PHP file_exists() call and adding a watermark to an image with something like:

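$image = new Imagick();
$image->readImage($workfolder . $event . DIRECTORY_SEPARATOR . $cat . DIRECTORY_SEPARATOR . $mit);
$watermark = new Imagick();
$watermark->readImage($workfolder . $event . DIRECTORY_SEPARATOR . "hires" . DIRECTORY_SEPARATOR . "WATERMARK.PNG");
// draw the watermark over the image, anchored at the top-left corner
$image->compositeImage($watermark, Imagick::COMPOSITE_OVER, 0, 0);
// send the result to the browser
header('Content-Type: image/jpeg');
echo $image->getImageBlob();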

All low-res images are 1024x1024 JPEGs with a quality setting of 45% and all unnecessary filters removed, so the file size of a low-res image is about 40-80 KB.

It is somewhat related to this question; only the scale and the scenarios are a bit different.

I'm on a dedicated server (Xeon E3-1245v2 CPU, 32 GB RAM, 2 TB storage). The site does not have much traffic overall, but it has HUGE spikes from time to time: when images are released, we get a few thousand hits per hour, with people browsing through the images, downloading, purchasing, etc. So while I'm sure that generating on the fly is the right approach under normal usage, I'm a bit worried about the spike periods.
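The actual numbers depend on the disk, the CPU and the image, so the most reliable answer is to measure both operations on the server itself. A rough timing helper, assuming PHP 7+ (`averageMicros()` is a hypothetical name); PHP caches stat results, so the example usage calls clearstatcache() to time a real file_exists() lookup rather than a cached one:

```php
<?php
/**
 * Rough benchmark helper: average wall-clock time of one call to $fn, in microseconds.
 */
function averageMicros(callable $fn, int $iterations = 1000): float
{
    $iterations = max(1, $iterations); // avoid dividing by zero
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $iterations * 1e6;
}

// Example usage; the paths are placeholders for one of your real files:
// $lowres = '/path/to/lowres/photo.jpg';
// $wmPng  = '/path/to/WATERMARK.PNG';
// $stat = averageMicros(function () use ($lowres) {
//     clearstatcache();
//     return file_exists($lowres);
// });
// $wm = averageMicros(function () use ($lowres, $wmPng) {
//     $image = new Imagick($lowres);
//     $image->compositeImage(new Imagick($wmPng), Imagick::COMPOSITE_OVER, 0, 0);
//     return $image->getImageBlob();
// }, 50);
// printf("file_exists: %.1f us, watermark: %.1f us\n", $stat, $wm);
```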

Need to mention that I'm using ImageMagick library for image processing, not GD.

Thanks for your input.

UPDATE

None of the answers was a full, complete solution, but that is fine since I never asked for one. It was a hard decision which one to accept and to whom to award the bounty.

@Ambroise-Maupate's solution is good, but it still relies on PHP to do the job.

@Hugo Delsing proposes using the web server to serve cached files, lowering the number of calls to the PHP script, which means fewer resources used; on the other hand, it's not really storage-friendly.

I will use a mixed solution combining the two answers, relying on a cron job to remove the garbage.

Thanks for the directions.

Emil Borconi
  • I think that in the 'spike' period, saving the watermarked image can be a good idea. On the other hand, this will greatly increase the space you use on your hard disk (and in that case, 2 TB can become "small"). I suggest an approach based on "number of views": in your DB, store the number of times each image is viewed, then save only the watermarked versions of the most "popular" ones. Depending on how your site evolves, you'll be able to increase or decrease the number of saved watermarked images. – Peter Jan 24 '15 at 19:19
  • The problem with this is that we store photos of events (muddy events, sports, etc.). When an event is over, people start looking for the images like crazy, creating the "spikes"; after that it cools off. The next wave of "spikes" will be a different event, so it will have no relation to the previous images. By the time my counter hits the target and I start saving the images, the spike has already cooled off... if that makes sense. – Emil Borconi Jan 24 '15 at 19:22
  • You should definitely be caching them instead of generating them each time. You could cache them in a CDN layer if you don't have enough storage space locally. – Jeremiah Winsley Jan 24 '15 at 19:27
  • OK Emil, I understand. In that case, just save the watermarked versions for the latest event only. You can also prepare ahead to save time: when you upload the images of a new event, delete the watermarked ones of the previous event, create all the watermarked versions of the new images, and then open the store with the new event. – Peter Jan 24 '15 at 19:33

4 Answers


I would suggest creating watermarked images on the fly and caching them at the same time, as everybody suggested.

Then you could create a garbage-collector PHP script that is executed every day (using cron). This script browses your cache folder and reads each image's last access time, which can be done with PHP's fileatime() function. When a cached watermarked image has not been accessed within 24 or 48 hours, just delete it.

With this method, you can handle spike periods, since images are cached on the first request, AND you save HDD space, since your garbage-collector script deletes unused images for you.

This method will only work if your server partition has atime updates enabled.

See http://php.net/manual/en/function.fileatime.php
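The daily garbage collector could look like this minimal sketch; `collectGarbage()`, the cache path and the cron schedule are hypothetical. One caveat: with Linux's default `relatime` mount option, atime is only refreshed when it is older than the file's mtime/ctime or more than 24 hours old, so a 48-hour threshold is safer than 24 hours.

```php
<?php
/**
 * Delete cached watermarked images whose last access time (atime) is older than
 * $maxAgeSeconds. Returns the number of deleted files. Only meaningful if the
 * partition records atime (i.e. it is not mounted with noatime).
 */
function collectGarbage(string $cacheDir, int $maxAgeSeconds, ?int $now = null): int
{
    $now = $now ?? time();
    $deleted = 0;
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($cacheDir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if (!$file->isFile()) {
            continue;
        }
        $path = $file->getPathname();
        clearstatcache(true, $path); // make sure fileatime() is not a stale cached value
        $atime = fileatime($path);
        if ($atime !== false && $now - $atime > $maxAgeSeconds && unlink($path)) {
            $deleted++;
        }
    }
    return $deleted;
}

// Example cron entry (script name and paths are placeholders), once a day at 04:00:
//   0 4 * * * php /path/to/gc.php
// where gc.php calls: collectGarbage('/path/to/cache/watermarked', 48 * 3600);
```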

  • I'm curious, if you cache the image, how will you be able to track when it was last accessed? – Hugo Delsing Jan 30 '15 at 08:22
  • If you cache the image, your application must serve this image directly from Apache or Nginx by updating your HTML view. Then each time a file is opened or used in a process, its `atime` must be updated (if your disk partition enables `atime`). If you don’t like this low-level IO process, you can always implement this mechanism in your database. Each time an image is requested, you can update your database’s `last_read` field, and your *garbage-collector* script will read from the database instead of the file’s `atime`. – Ambroise Maupate Jan 30 '15 at 08:59
  • `atime` is not available on Windows, right, only Linux? `atime` would indeed be perfect for handling the garbage. Storing something in the database would be bad in my opinion, because you are still doing more than just serving images, so you still need `PHP` to handle images. – Hugo Delsing Jan 30 '15 at 09:33
  • I don’t know, I’m only working with Debian and macOS systems. Some people use `clearstatcache()` to refresh IO stats. – Ambroise Maupate Jan 30 '15 at 09:44
  • A database is out of the question. That would require even more resources; I think that is a very bad idea. On the other hand, low-level IO is a very nice idea. – Emil Borconi Jan 31 '15 at 10:06

Personally, I would create a static, cookieless subdomain, CDN-style, to handle these kinds of images. The main reasons are:

  1. Images are only created once
  2. Only accessed images are created
  3. Once created, an image is served from cache and is a lot faster.

The first step would be to create a website on a subdomain that points to an empty folder. Use the settings for IIS, Apache or whatever web server you use to disable sessions for this new website. Also set long caching headers on the site, because the content shouldn't change.

The second step would be to create an .htaccess file containing the following.

RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)   /create.php?path=$1  [L]

This makes sure that when somebody accesses an existing image, the web server serves it directly without PHP being involved. Every request for a non-existing file is handled by the create.php script, which is the next thing you should add.

<?php
function NotFound()
{
    if (!headers_sent()) {
        $protocol = (isset($_SERVER['SERVER_PROTOCOL']) ? $_SERVER['SERVER_PROTOCOL'] : 'HTTP/1.0');
        header($protocol . ' 404 Not Found');
        echo '<h1>Not Found</h1>';
    }
    //always stop here, even if headers were already sent
    exit;
}

$p = isset($_GET['path']) ? $_GET['path'] : '';

//has path
if (strlen($p)<=1)
    NotFound();

$clean = explode('?', $p);
$clean = explode('#', $clean[0]);
//drop a leading / if there is one (in .htaccess context the captured path has none)
$params = explode('/', ltrim($clean[0], '/'));

//I use a check for two, because I don't allow images in the root folder
//I also use the path to determine how it should look
//EG: thumb/125/90/imagecode.jpg
if (count($params)<2)
    NotFound();

$type = $params[0];

//I use the type to handle different methods. For this example I only implemented the full-sized image
//You could use the same to handle thumbnails or cropped/watermarked
switch ($type) {
    //case "crop":if (Crop($params)) return; else break;
    //case "thumb":if (Thumb($params)) return; else break;
    case "image":if (Image($params)) return; else break;
}
NotFound();
?>
<?php
/*
Just an example to show how you could create a response.
Since you already know how to create thumbs, I'm not going into details.

Array
(
    [0] => image
    [1] => imagecode.JPG
)
*/
function Image($params) {
    $tmp = explode('.', $params[1]);
    if (count($tmp)!=2)
        return false;

    $code = $tmp[0];


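    //look up the real path with a prepared statement to avoid SQL injection
    //($pdo is assumed to be an existing PDO connection; adapt to your DB layer)
    global $pdo;
    $stmt = $pdo->prepare('SELECT realpath FROM images WHERE Code = ?');
    $stmt->execute(array($code));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($row === false)
        return false;
    $realpath = $row['realpath'];

    $f = @file_get_contents($realpath);

    if ($f === false || strlen($f)<=0)
        return false;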

    //create folder structure
    @mkdir($params[0]);

    //if you had more folders, continue creating the structure
    //@mkdir($params[0].'/'.$params[1]);

    //store the image, so a second request won't access this script
    file_put_contents($params[0].'/'.$params[1], $f);

    //you could directly optimize the image for web to make it even better
    //optimizeImage($params[0].'/'.$params[1]);

    //now serve the file to the browser, because even the first request needs to show the image
    $finfo = finfo_open(FILEINFO_MIME_TYPE);
    header('Content-Type: '.finfo_file($finfo, $params[0].'/'.$params[1]));

    echo $f;

    return true;
}
?>
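The type switch in create.php is where a watermarking variant would plug in. A minimal sketch, assuming the Imagick extension; `Watermark()`, the `wm` path prefix, `WATERMARK_FILE` and the `lookupRealPath()` stub are all hypothetical names, and the stub has to be replaced with the same database lookup that Image() uses:

```php
<?php
//hypothetical location of the watermark overlay
const WATERMARK_FILE = '/path/to/WATERMARK.PNG';

//placeholder: replace with the database lookup from Image(); returns a path or false
function lookupRealPath($code) {
    return false;
}

/*
Handles URLs like wm/imagecode.jpg; register it in the switch with:
case "wm":if (Watermark($params)) return; else break;
*/
function Watermark($params) {
    $tmp = explode('.', $params[1]);
    if (count($tmp)!=2)
        return false;

    $realpath = lookupRealPath($tmp[0]);
    if ($realpath === false || !is_file($realpath))
        return false;

    //composite the watermark over the low-res image
    $image = new Imagick($realpath);
    $image->compositeImage(new Imagick(WATERMARK_FILE), Imagick::COMPOSITE_OVER, 0, 0);
    $image->setImageFormat('jpeg');
    $blob = $image->getImageBlob();

    //store it, so the web server serves the file directly from now on
    @mkdir($params[0]);
    file_put_contents($params[0].'/'.$params[1], $blob);

    header('Content-Type: image/jpeg');
    echo $blob;
    return true;
}
```

Because the cached file lands at the same URL that was requested, the `!-f` rewrite condition stops sending that URL to PHP after the first hit; deleting the wm folder when the watermark changes forces every image to be regenerated on its next request.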
Hugo Delsing
  • While it is an interesting approach, I have some objections to it. The first one is that I do not want to disclose the file names, and your solution... will do that. Second, your answer assumes I'm going to use Apache, which is not the case. I think @Ambroise-Maupate's answer is better suited. However, I have to admit the idea of "skipping" PHP is a very good one indeed. – Emil Borconi Jan 26 '15 at 19:07
  • I didn't assume you use Apache; I even mentioned IIS. It works the same way, and I'm sure other web servers have similar options. Also, the file name issue is very easy to work around: if you send a code instead of params 1 and 2 (or even 3, 4, 5, etc.), you can get the actual file name just as you do now. The main point is that you need to call a `URL` that points to an actual image, and only serve PHP content when it's not available yet. So as long as it ends in `.jpg`, `.png` or another image extension, the file name can be anything. – Hugo Delsing Jan 26 '15 at 19:39
  • @EmilBorconi, just for the sake of argument I altered the code to use a code instead of the real name. – Hugo Delsing Jan 27 '15 at 07:43

For most scenarios, lazily applying the watermark would probably make the most sense (generate the watermarked image on the fly when requested, then cache the result). However, if you have big spikes in demand, you are creating a mechanism to DoS yourself; in that case, create the watermarked version on upload.
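Creating the watermarked versions up front can be done as a batch run right after an event's images are uploaded, which also matches Peter's suggestion in the question's comments. A minimal sketch, assuming one directory of low-res JPEGs per event; `prewarmEvent()` and the directory layout are hypothetical, and `$render` can be any callable that returns the watermarked bytes, for example an Imagick closure built like the code in the question:

```php
<?php
/**
 * Watermark every low-res JPEG in $lowresDir into $cacheDir ahead of time,
 * skipping images that already have a cached copy. Returns how many were rendered.
 */
function prewarmEvent(string $lowresDir, string $cacheDir, callable $render): int
{
    if (!is_dir($cacheDir)) {
        mkdir($cacheDir, 0775, true);
    }
    $rendered = 0;
    $sources = glob($lowresDir . DIRECTORY_SEPARATOR . '*') ?: array();
    foreach ($sources as $source) {
        $ext = strtolower(pathinfo($source, PATHINFO_EXTENSION));
        if (!is_file($source) || !in_array($ext, array('jpg', 'jpeg'), true)) {
            continue; // only low-res JPEGs are watermarked
        }
        $target = $cacheDir . DIRECTORY_SEPARATOR . basename($source);
        if (is_file($target)) {
            continue; // already done on a previous run
        }
        file_put_contents($target, $render($source));
        $rendered++;
    }
    return $rendered;
}
```

Since existing files are skipped, emptying the cache directory and re-running the batch regenerates everything, which is one way to handle a changed watermark image.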

symcbean
  • The problem with that is that the image used for watermarking can be changed anytime. This can be a good starting point, but will need further work. – Emil Borconi Jan 31 '15 at 09:48

Considering your HDD storage capacity and the spikes:

I would only create a watermarked image when it is viewed (so yes, on the fly). That way you don't use too much space on a bunch of files that may or may not be viewed.

I would not watermark thumbnails; I would rather use a filter that fakes a watermark and protects the image from being saved. That filter would apply to all thumbnails without creating a second image.

That way all your thumbnails are watermarked (faked, with another element on top).

Then, if one of these thumbnails is viewed, it generates a watermarked image (only once), since after it is generated you load the new watermarked image.

This would be the most efficient way to deal with your HDD storage and the spikes.

The other option would be to upgrade your hosting service. GoDaddy offers unlimited storage and bandwidth for about $50 a year.

MadeInDreams