
I am currently working on a PHP application which is run from the command line to optimize a folder of images.

The PHP application is more of a wrapper for other image optimizers. It simply iterates over the directory, grabs all the images, and then runs each image through the appropriate program to get the best result.

Below are the programs I will be using and what each will be used for:

imagemagick to determine file type and convert non-animated GIFs to PNG
gifsicle to optimize animated GIF images
jpegtran to optimize JPG images
pngcrush to optimize PNG images
pngquant to optimize PNG images to PNG8 format
pngout to optimize PNG images to PNG8 format

My problem: with 1-10 images everything runs smoothly and fairly fast; however, once I run it on a larger folder it becomes really slow. I do not see a good way around the processing time itself, but one thing that would help is to avoid re-processing images that have already been optimized. Say I have a folder with 100 images, I optimize that folder, add 5 new images, and re-run the optimizer: it then has to optimize all 105 images. My goal is to have it optimize only the 5 newer images, since the previous 100 have already been optimized. That alone would greatly improve performance whenever new images are added to the image folder.

I realize the simple solution would be to copy or move the images to a new folder after processing them. My problem with that is that these images are used on websites, so their paths are generally hard-coded into a site's source code; changing the paths would complicate things and could sometimes break links.

Some ideas I have had: (1) write some kind of text-file database into each image folder listing the images that have already been processed, so that the application only runs on images not already in that file (a rough sketch of this is below); (2) change each file name to include some kind of identifier showing it has been optimized; (3) move each optimized file to a final destination folder once it is optimized. Ideas 2 and 3 are no good, though, because they would break the image path links in the websites' source code.
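
To illustrate idea 1, here is a rough sketch of what that text-file database could look like in PHP. The .optimized manifest name and the optimize() function are made up for the example:

<?php
// Hypothetical manifest: one already-processed filename per line.
$dir      = '/path/to/images';
$manifest = $dir . '/.optimized';
$done     = is_file($manifest)
    ? array_flip(file($manifest, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES))
    : [];

foreach (glob($dir . '/*.{jpg,jpeg,png,gif}', GLOB_BRACE) as $file) {
    $name = basename($file);
    if (isset($done[$name])) {
        continue; // already optimized on a previous run
    }
    optimize($file); // made-up optimizer entry point
    file_put_contents($manifest, $name . "\n", FILE_APPEND);
}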

So, if you can think of a decent/good solution to this problem, please share.

JasonDavis

4 Answers


Metadata
You could put a flag in the meta info of each image after it is optimized. First check for that flag, and only proceed if it's not there. You can use exif_read_data() to read the data; writing the flag takes a bit more work, since PHP's EXIF extension is read-only (one option is sketched below).
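
For instance, a minimal sketch of that check in PHP. It assumes the exiftool command-line tool is installed for the writing side; the optimized-v1 marker string is made up:

<?php
const OPT_FLAG = 'optimized-v1'; // made-up marker value

// Look for the marker among the JPEG comment segments.
function isOptimized(string $path): bool
{
    $exif = @exif_read_data($path);
    if ($exif === false || empty($exif['COMMENT'])) {
        return false;
    }
    return in_array(OPT_FLAG, (array) $exif['COMMENT'], true);
}

// PHP cannot write EXIF natively, so shell out to exiftool (assumed installed).
function markOptimized(string $path): void
{
    exec('exiftool -overwrite_original -Comment=' . escapeshellarg(OPT_FLAG)
        . ' ' . escapeshellarg($path));
}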

The above is for JPGs. Metadata for PNGs is also possible; take a look at this question, and this one.

I'm not sure about GIFs, but you could definitely convert them to PNGs and then add metadata... although I'm pretty sure they have their own meta info too, since metadata extraction tools accept GIFs.

Database Support
Another solution would be to store information about the images in a MySQL database. That way, as you tweak your optimizations, you could keep track of which optimization was tried on which image, and when. You could pick which images to optimize according to any parameters of your choosing, and you could build an admin panel for it. This method allows easy experimentation.
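
A rough sketch of what that tracking table and lookup might look like, using PDO. The schema and all names are invented for the example:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=optimizer', 'user', 'pass');

// Invented schema: one row per optimization run on a file.
$pdo->exec('CREATE TABLE IF NOT EXISTS optimizations (
    id           INT AUTO_INCREMENT PRIMARY KEY,
    path         VARCHAR(512) NOT NULL,
    tool         VARCHAR(64)  NOT NULL,
    bytes_before INT UNSIGNED NOT NULL,
    bytes_after  INT UNSIGNED NOT NULL,
    run_at       DATETIME     NOT NULL
)');

// Skip files that already have a recorded optimization.
$file = '/path/to/image.jpg';
$stmt = $pdo->prepare('SELECT COUNT(*) FROM optimizations WHERE path = ?');
$stmt->execute([$file]);
$alreadyOptimized = $stmt->fetchColumn() > 0;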

You could also combine the above two methods.

Maximum File Size
Since this is about saving space, you could have the program only work on images that are larger than a certain file size. Ideally, after running the compressor once, all the images would be below this size, and after that only newly added images that are too big would be touched. I don't know how practical this is to implement, since it requires that the compressor can get any image below some arbitrary file size. You could make the maximum file size dependent on the image's dimensions.
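
As a sketch, the size filter itself is only a few lines of PHP. The 100 KB threshold is arbitrary and optimize() is a made-up entry point:

<?php
$maxBytes = 100 * 1024; // arbitrary threshold: skip anything already under 100 KB

foreach (glob('/path/to/images/*.{jpg,png,gif}', GLOB_BRACE) as $file) {
    if (filesize($file) <= $maxBytes) {
        continue; // assume it has already been compressed enough
    }
    optimize($file); // made-up optimizer entry point
}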

Peter Ajtai
  • This was actually my first idea but I believe this would only work for JPG images? Most of my web images will be PNG and GIF, or at least converted to those in most cases; also, some of the optimizations I think actually remove the EXIF data to save space when it exists. I am not 100% sure on what I just said, so I could be wrong, but if I am right, that is why I don't think that would work. Also I would like to avoid a database unless it was some sort of single-file DB listing the files contained in the folder of images – JasonDavis Feb 20 '12 at 02:25
  • @jason - PNGs also have metadata. I added two links to the answer. – Peter Ajtai Feb 20 '12 at 02:39
  • Thanks for the follow-up. After more research I am leaning towards having some sort of database/file, preferably a file that sits in the image folder and lists the images that have already been processed; the run would then only touch the non-optimized images, and at the end it would update the list file to include the image files that were just processed. My main reason is that if there are hundreds of files, the tool would have to read parts of all of them to get the EXIF metadata, and I think a file/DB might be a little faster. Any thoughts on this approach? – JasonDavis Feb 20 '12 at 03:39
  • I am also hoping to avoid MySQL, as the end product is something I would like to let other users use without needing a MySQL connection; I should be able to simply supply an image folder path to the command-line tool and have it process the images, so any kind of file or database needs to be available to any kind of user and self-contained in the image folder. I think this might be suitable for a new question, more specific to keeping a running catalog/DB/file of files in a directory, so I will choose your answer as it has the most votes – JasonDavis Feb 20 '12 at 03:42

The easiest way would most likely be to look at the time of the last change for each image. If an image was changed after the last run of your script, you have to run the script on that particular image. The timestamp of when the script was last run can easily be saved in a short text file.
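
A minimal sketch of that in PHP. The .last_run file name and the optimize() function are made up:

<?php
$dir       = '/path/to/images';
$stampFile = $dir . '/.last_run'; // holds the Unix timestamp of the last run
$lastRun   = is_file($stampFile) ? (int) file_get_contents($stampFile) : 0;

foreach (glob($dir . '/*.{jpg,png,gif}', GLOB_BRACE) as $file) {
    if (filemtime($file) > $lastRun) {
        optimize($file); // made-up optimizer entry point
    }
}

// Record this run only after every image has been processed.
file_put_contents($stampFile, (string) time());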

s1lence
  • I like the idea of having a simple text file in the folder with the last run date/time and checking each image against that +1 – JasonDavis Feb 20 '12 at 02:18

Sorry this is late, but there is a way to address this issue without creating any files, storing any data of any kind, or keeping track of anything, so I thought I'd share how I address things like this.

Goal
Set up an idempotent solution that efficiently optimizes images without dependencies that require keeping track of the current status.

Why
This allows for a truly portable solution that can work in a new environment, in an environment that somehow lost its tracker, or in an environment that is sensitive about which files you can actually save there.

Diagnose
Although metadata might be the first place you'd think to check for this information, in some cases it will not be available, and metadata is by nature arbitrary: like comments, it can come and go without affecting the image in any way. We want something more concrete, a definite descriptor of the asset at hand. Ideally, you want to identify whether an image has been optimized by reviewing its actual characteristics.

Strategy
When you optimize an image, you are providing different options of all sorts in order to reach the final state of optimization. These are the very traits you can later check to conclude whether it has in fact been optimized.

Example
Let's say we have a function in our script called optimize(path = ''), and let's assume that part of our optimization does the following:

$ convert /path/to/image.jpg -depth 8 -quality 87 -colors 255 -colorspace sRGB ...

Note that these options are ones you choose to specify; they will be applied to the image and are properties that can be reviewed later...

$ identify -verbose /path/to/image.jpg
Image: /path/to/image.jpg
  Format: JPEG (Joint Photographic Experts Group JFIF format)
  Mime type: image/jpeg
  Geometry: 1250x703+0+0
  Colorspace: sRGB <<<<<<
  Depth: 8-bit <<<<<<
  Channel depth:
    Red: 8-bit
    Green: 8-bit
    Blue: 8-bit
  Channel statistics:
    Pixels: 878750
    Red:
        ...
    Green:
        ...
    Blue:
      ...
  Image statistics:
    Overall:
      ...
  Rendering intent: Perceptual
  Gamma: 0.454545
  Transparent color: none
  Interlace: JPEG
  Compose: Over
  Page geometry: 1250x703+0+0
  Dispose: Undefined
  Iterations: 0
  Compression: JPEG
  Quality: 87 <<<<<<
  Properties:
    ...
  Artifacts:
    ...
  Number pixels: 878750

As you can see here, the output quite literally has everything I would want to know to determine whether or not I should optimize this image, and it costs nothing in terms of a performance hit.

Conclusion
When you are iterating through a list of files in a folder, you can do so as many times as you like without worrying about over-optimizing the images or keeping track of anything. You simply filter out all the extensions you don't want to optimize, then check the stats of the remaining files (e.g. .bmp, .jpg, .png) to see whether they already possess the attributes your function would apply in the first place. If a file has the same values, skip it; if not, optimize it (a sketch of this check follows below).
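
As a sketch, the check could shell out to identify with a -format string instead of parsing the full verbose dump. The target values mirror the convert command above, and optimize() is a made-up entry point:

<?php
// %Q = JPEG quality, %z = depth, %[colorspace] = colorspace.
function needsOptimizing(string $path): bool
{
    $out = shell_exec('identify -format "%Q %z %[colorspace]" ' . escapeshellarg($path));
    [$quality, $depth, $colorspace] = explode(' ', trim((string) $out));

    // If any trait differs from our target state, the image still needs work.
    return $quality !== '87' || $depth !== '8' || $colorspace !== 'sRGB';
}

foreach (glob('/path/to/images/*.jpg') as $file) {
    if (needsOptimizing($file)) {
        optimize($file); // made-up optimizer entry point
    }
}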

Advanced
If you want to get extremely efficient, you would check each attribute of the image that you plan on optimizing, and in your optimization execution apply only the options that have not already been applied.

Note
This technique is obviously meant to show an example of how you can accurately determine whether or not an image needs to be optimized. The actual options listed above are not the complete scope of elements that can be chosen. There are a variety of available options to choose from, and you can apply and check for as many as you want.


A thought that comes to mind is to mix the simple solution with a more complicated one. When you optimize an image, move it to a separate folder. When a request is made into the original image folder, have your .htaccess file capture those links and route them to a script that checks whether the same image exists within the optimized folder; if not, it optimizes, moves, then proceeds.
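
A rough sketch of the mod_rewrite rules involved; the folder layout and the optimize.php handler name are invented for the example:

# If an optimized copy exists, serve it directly.
RewriteEngine On
RewriteCond %{DOCUMENT_ROOT}/optimized/$1 -f
RewriteRule ^images/(.+)$ /optimized/$1 [L]

# Otherwise route the request to a script that optimizes, moves,
# and then streams the image back.
RewriteRule ^images/(.+)$ /optimize.php?img=$1 [L,QSA]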

I know I said simple solution; this is a slightly complicated one, but the nice part is that it provides a scalable approach to your issue.


Edit: One more thing

I like the idea of a MySQL database because you can add a level of security (not all images can be viewed by everyone), if that's a need of course. But it also makes your links problem (the hard-coded one) much less of a problem, since all links point to a single file that retrieves the images from the DB, and the only things that change are the generated GET variables. This way your project becomes significantly more scalable and easier to redesign.
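
A minimal sketch of that single retrieval file; the image.php name, table, and columns are all invented:

<?php
// image.php?id=123 — the one hard-coded URL the site ever links to.
$pdo  = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');
$stmt = $pdo->prepare('SELECT path, mime FROM images WHERE id = ?');
$stmt->execute([(int) ($_GET['id'] ?? 0)]);

if ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    header('Content-Type: ' . $row['mime']);
    readfile($row['path']); // stream the (optimized) file from disk
} else {
    http_response_code(404);
}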

ThePrimeagen
  • Not a horrible idea with the .htaccess redirect, but in the long run wouldn't this possibly hurt server performance? As for using a MySQL DB to store images or even image paths, it doesn't seem like a good idea in my situation. I plan to use this on every website I build, so I couldn't see having a DB just for the images like that; it just doesn't seem like a great idea performance-wise, and the whole purpose of optimizing the images is to get the maximum performance I can – JasonDavis Feb 20 '12 at 02:40
  • If there are no security requirements, then you're just fine. Just have a good privacy policy. – ThePrimeagen Feb 20 '12 at 03:51