
This question is the next step in my project of optimizing a folder/directory of images while making sure not to re-optimize an image that has already been optimized, without moving the optimized image to a new directory. I previously asked a related question here: How to avoid Optimizing images that are already optimized with PHP?

My application is a command line tool written in PHP that wraps around several image optimization tools; it recursively scans a supplied directory and runs ALL images through the appropriate optimizer program.

Right now, if I have a folder containing 200 images and I run my application, it tries to optimize all 200 images, which is very time consuming. If I then add another 5 images and run the application again, it optimizes all 205 images instead of just the 5 new ones. That is the problem I am trying to solve. My idea is to have some sort of file/DB that sits in the image folder; when my application runs, it first reads in all the images listed in this file, and if an image is in the file it has already been optimized and does not need to be re-optimized. So in my example only the 5 new images would be processed, as they would not be in the file. Once completed, these 5 new images would be merged into the file, so on the 3rd run of my application all 205 images would be skipped because they have already been optimized and added to the file/DB.
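To make the idea concrete, here is a minimal sketch of that flow for a single folder, assuming a plain text catalogue file. The file name "optimized.log" and the optimizeImage() wrapper are placeholders, not part of the actual tool:

```php
<?php
// Minimal sketch of the catalogue idea for one folder.
$dir         = '/path/to/images';            // example value
$catalogFile = $dir . '/optimized.log';      // hypothetical catalogue file

// Read the list of already-optimized images (one path per line).
$done = file_exists($catalogFile)
    ? file($catalogFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
    : [];

foreach (glob($dir . '/*.{jpg,jpeg,png,gif}', GLOB_BRACE) as $image) {
    if (in_array($image, $done, true)) {
        continue; // already optimized on a previous run
    }
    optimizeImage($image); // hypothetical wrapper around the optimizer tools
    // Append the newly processed image so the next run skips it.
    file_put_contents($catalogFile, $image . PHP_EOL, FILE_APPEND);
}
```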

I think this would be a pretty good method performance-wise, but I am open to any ideas for a better approach, or improvements to my current one.

One issue to think about: since my application processes a supplied directory recursively, if I have a folder with 20 images and 8 sub-folders, all of which contain images (and some of which have sub-folders of their own), then every image under the supplied root folder is processed, regardless of how deep it is in the hierarchy. So I am not sure whether I would need a separate file/DB under each sub-folder listing the files already processed there, or whether one file/DB under the root folder would be enough, as long as it stored each file path relative to the root folder.
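A single catalogue at the root can work if the stored paths are relative to the supplied root. Here is a sketch of such a recursive scan using PHP's SPL iterators; the findImages() name is just for illustration:

```php
<?php
// Sketch of a single root-level catalogue covering every sub-folder:
// the recursive scan records each path relative to the supplied root,
// so one file/DB at the root is enough.
function findImages(string $root): array
{
    $images   = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if (preg_match('/\.(jpe?g|png|gif)$/i', $file->getFilename())) {
            // Store the path relative to $root so the catalogue stays valid
            // no matter how deep the image sits in the hierarchy.
            $images[] = ltrim(substr($file->getPathname(), strlen($root)), '/\\');
        }
    }
    return $images;
}
```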

Another thought: if an image is deleted, do I need to somehow remove it from this file/DB, or should I just not bother since it will likely not affect anything anyway?

My main question: does this sound like a good solution, a file or database in the image folder listing the files already optimized? If so, how should I go about creating this file? Should it be a simple text file with one image path per line, an XML file, or some other format? PHP will likely have to read the file's contents into memory as an array, and then, while recursively scanning my directories of images, compare each image it finds against that list to check whether it has already been processed, build a list of un-processed files, and process just those.
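Assuming the simple one-path-per-line text format, flipping the list into array keys makes each lookup O(1) instead of a linear in_array() scan per image, which matters once the catalogue grows to thousands of entries. A sketch, reusing the hypothetical findImages() helper from above; the ".optimized" file name is made up:

```php
<?php
// Load the plain-text catalogue into a lookup set and queue only new images.
$root        = '/path/to/images';   // example value
$catalogFile = $root . '/.optimized';

$done = file_exists($catalogFile)
    ? array_flip(file($catalogFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES))
    : [];

$pending = [];
foreach (findImages($root) as $relativePath) {
    if (!isset($done[$relativePath])) {
        $pending[] = $relativePath;  // only new images get queued for optimization
    }
}
```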

Please share your thoughts, suggestions, criticism, etc. I have never done anything like this before.

JasonDavis

1 Answer


It is a good idea, and in general terms it is called cataloguing.

A couple of things I would do:

  • add "Last modified date" to you db table that stores Images so that If for any reason file is overwritten then it can be optimized again
  • second you dont need individual db file in each subfolder I will store possibly the directory or the path hash in th db table itself so that I know if that has been optimized or not
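A rough sketch of that "last modified date" check, assuming a JSON catalogue keyed by relative path; the file name, findImages(), and optimizeImage() are hypothetical placeholders:

```php
<?php
// The catalogue maps each relative path to the mtime recorded when it was
// optimized, so an overwritten file is picked up and optimized again.
$root        = '/path/to/images';            // example value
$catalogFile = $root . '/.optimized.json';   // hypothetical catalogue file

$done = file_exists($catalogFile)
    ? json_decode(file_get_contents($catalogFile), true)
    : [];

foreach (findImages($root) as $relativePath) {   // findImages() as sketched earlier
    $path = $root . '/' . $relativePath;
    if (isset($done[$relativePath]) && $done[$relativePath] === filemtime($path)) {
        continue; // unchanged since it was last optimized
    }
    optimizeImage($path);                 // hypothetical optimizer wrapper
    clearstatcache(true, $path);          // optimizing just changed the mtime
    $done[$relativePath] = filemtime($path);
}
file_put_contents($catalogFile, json_encode($done));
```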

Another thing you could do, which may be overkill but is manageable:

Say you have your images in a folder, call it "incoming" or "databucket". When you finish optimizing your images, you move them to another folder, say "outgoing" or "filesystem", and delete them from incoming.

Jaspreet Chahal
  • Thanks for the answer, I like your ideas. The only part that I don't think would work in my situation is moving the files, only because it would break the hard-linked images in my websites. Which type of file would you use to store this information, a regular text file or something more like an XML file? – JasonDavis Feb 20 '12 at 04:42
  • Well, that's why I said it could be overkill for this simple task. But yes, cataloguing is best done with a DB; put in as much info as you can so it helps you when you need it. – Jaspreet Chahal Feb 20 '12 at 05:05
  • I see. Well, since I am using PHP, perhaps an SQLite DB would be a good choice so I can keep the actual DB file in the image directory. – JasonDavis Feb 20 '12 at 05:31
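For reference, a rough sketch of that SQLite approach using PDO, with a single .sqlite file kept in the image directory. The table, column, and file names are assumptions, and optimizeImage() is a hypothetical wrapper:

```php
<?php
// One SQLite file per image directory, queried through PDO.
$root         = '/path/to/images';        // example value
$relativePath = 'photos/example.jpg';     // example value

$pdo = new PDO('sqlite:' . $root . '/optimized.sqlite');
$pdo->exec('CREATE TABLE IF NOT EXISTS optimized (
    path  TEXT PRIMARY KEY,
    mtime INTEGER NOT NULL
)');

// Has this image already been optimized at its current mtime?
$check = $pdo->prepare('SELECT mtime FROM optimized WHERE path = ?');
$check->execute([$relativePath]);
$recorded = $check->fetchColumn();

$fullPath = $root . '/' . $relativePath;
if ($recorded === false || (int) $recorded !== filemtime($fullPath)) {
    optimizeImage($fullPath); // hypothetical optimizer wrapper
    $upsert = $pdo->prepare('INSERT OR REPLACE INTO optimized (path, mtime) VALUES (?, ?)');
    $upsert->execute([$relativePath, filemtime($fullPath)]);
}
```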