15

I am building a site that is looking at Millions of photos being uploaded easily (with 3 thumbnails each for each image uploaded) and I need to find the best method for storing all these images.

I've searched and found examples of images stored as hashes.... for example...

If I upload, coolparty.jpg, my script would convert it to an Md5 hash resulting in..

dcehwd8y4fcf42wduasdha.jpg

and that's stored in /dc/eh/wd/dcehwd8y4fcf42wduasdha.jpg but for the 3 thumbnails I don't know how to store them

QUESTIONS..

  1. Is this the correct way to store these images?

  2. How would I store thumbnails?

  3. In PHP what is example code for storing these images using the method above?

ROMANIA_engineer
Kenny
  • hope you have a decent spec server with lots of bandwidth. –  Sep 04 '12 at 04:42
  • What will happen if you have two users uploading a file called `coolparty.jpg`? Do you need to store a user's name as part of the filename somewhere? – andrewsi Sep 04 '12 at 04:59
  • 1
    For all those bad-mouthing data bases because they are "slow" -- without numbers to back that up, it's just hot air. Yes, data bases can be slow. File systems can also be slow. (Try putting a million--or even 50,000--images in one directory and watch your file access times skyrocket.) As for data bases, [here's an actual study](http://www.onlineaspect.com/2007/07/10/image-storage-database-or-file-system/) that argues for using data bases. Also, see [this thread](http://webmasters.stackexchange.com/questions/940/serving-images-out-of-sql-server-vs-file-system-vs-s3-etc) on webmasters. – Ted Hopp Sep 04 '12 at 04:59
  • This implementation might be useful: github.com/acrobit/AcroFS – Ghominejad Dec 22 '17 at 13:47

5 Answers

11

Here is how I use the folder structure:

  • I upload the photo and move it as you described:

    // IMAGES_PATH is a constant stored in my global config
    define('IMAGES_PATH', '/path/to/my/images/');

    $image = md5_file($_FILES['image']['tmp_name']);
    // you can add a random number to the file name just to make sure your images will be "unique"
    $image = md5(mt_rand().$image);
    // e.g. coolparty.jpg => f3d40fc20a86e4bf8ab717a6166a02d4 => f/3/d/
    $folder = IMAGES_PATH.$image[0]."/".$image[1]."/".$image[2]."/";

    // make sure you create the folders with mkdir() before you move files into them
    if (!is_dir($folder)) {
        mkdir($folder, 0755, true);
    }
    $path = $folder.$image.'.jpg';
    // thumbnail, I just prepend t_ to the image name
    $thumb = $folder.'t_'.$image.'.jpg';
    move_uploaded_file($_FILES['image']['tmp_name'], $path);
    // then generate the thumbnail from $path and save it to $thumb
    
  • I believe this is the best way. Of course, you can make the folder structure deeper, like you said, with 2 characters per level, if you will have millions of images.
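
The choice between one and two characters per directory level can be made configurable. The following is a minimal sketch, not the answer author's code: the function name `shardDir` and its parameters `$levels` and `$charsPerLevel` are hypothetical, and it assumes the file name is a hexadecimal MD5 hash.

```php
<?php
// Hypothetical helper: build the nested directory prefix for a hex hash.
// $levels and $charsPerLevel are illustrative parameters, not from the answer.
function shardDir(string $hash, int $levels = 3, int $charsPerLevel = 1): string
{
    if ($levels < 1 || $charsPerLevel < 1) {
        throw new InvalidArgumentException('levels and charsPerLevel must be >= 1');
    }
    $need = $levels * $charsPerLevel;
    if (!ctype_xdigit($hash) || strlen($hash) < $need) {
        throw new InvalidArgumentException('expected a hex hash of sufficient length');
    }
    // Split the leading characters into one chunk per directory level.
    $parts = str_split(substr($hash, 0, $need), $charsPerLevel);
    return implode('/', $parts) . '/';
}

$hash = 'f3d40fc20a86e4bf8ab717a6166a02d4';
echo shardDir($hash), "\n";        // f/3/d/
echo shardDir($hash, 3, 2), "\n";  // f3/d4/0f/
```

With one character per level there are 16 directories at each level; with two characters there are 256, which spreads the same number of files over far more directories.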

Mihai Iorga
  • Remember to check that the directory exists (is_dir), possibly check permissions (is_writable), use mkdir and chmod to create the missing directories, etc. I also suggest you chmod the uploaded files to make sure you can manipulate them using FTP (if Apache/IIS/Tomcat and FTP are running as different users, as they usually are). There's a lot missing from this answer that you need to worry about! – Robbie Sep 04 '12 at 05:09
  • @mihai you say, "yes, this is the best way unless you are storing them in a cloud service." I am on a cloud service, Softlayer, why wouldn't this be the best way if you are on a cloud service? – Kenny Sep 04 '12 at 05:16
  • im on softlayer.com 's servers so no im not using my own server – Kenny Sep 04 '12 at 05:19
  • well im storing all my data and hosting everything on softlayer but i dont know if thats the answer you are looking for. they say that they do cloud hosting so yeah... i just signed up w/ them so i dont know too much – Kenny Sep 04 '12 at 05:25
  • You could use a UUID from the beginning, instead of hashing the file and adding a random number. I cannot see the advantage of a hash here, because the probability of a collision is the same for a UUID or a hash of the same length. Or do you want to find duplicates of images (?), then you shouldn't add a random number. – martinstoeckli Sep 04 '12 at 07:15
7

The reason you would use a method like that is simply to reduce the total number of files per directory (inodes).

Using the method you have described (3 levels deep), you are very unlikely to reach even hundreds of images per directory, since you will have a maximum of almost 17 million leaf directories (16**6 = 16,777,216).
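
As a rough worked example of that arithmetic, the sketch below assumes hexadecimal hashes that spread files evenly, and an illustrative workload of 10 million photos with 3 thumbnails each. Those workload numbers are assumptions, not figures from the question.

```php
<?php
// Leaf directories for 3 levels of 2 hex characters each.
$leafDirs = 16 ** 6; // 16,777,216

// Assumed workload (illustrative only): 10 million photos,
// each stored as one original plus 3 thumbnails.
$photos = 10000000;
$filesPerPhoto = 4;

// Average files per leaf directory, assuming an even hash distribution.
$avgFilesPerDir = ($photos * $filesPerPhoto) / $leafDirs;

printf("%d leaf directories\n", $leafDirs);
printf("%.2f files per directory on average\n", $avgFilesPerDir);
```

Under these assumptions the average stays at only a few files per directory, which is why a shallower structure may also be enough in practice.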

As for your questions:

  1. Yeah, that is a fine way to store them.
  2. The way I would do it would be

    /aa/bb/cc/aabbccdddddddddddddd_thumb.jpg
    /aa/bb/cc/aabbccdddddddddddddd_large.jpg
    /aa/bb/cc/aabbccdddddddddddddd_full.jpg

    or similar

  3. There are plenty of examples on the net as far as how to actually store images. Do you have a more specific question?
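
The naming scheme in point 2 could be generated along these lines. This is only a sketch: the function name `variantPaths` is made up, the variant names are taken from the example paths above, and the `.jpg` extension is assumed for every file.

```php
<?php
// Hypothetical helper: returns the relative path for each size variant.
// Variant names follow the example paths; the function name is illustrative.
function variantPaths(string $hash, array $variants = ['thumb', 'large', 'full']): array
{
    // Three directory levels of two characters each, taken from the hash.
    $dir = '/' . substr($hash, 0, 2) . '/' . substr($hash, 2, 2) . '/' . substr($hash, 4, 2) . '/';
    $paths = [];
    foreach ($variants as $variant) {
        $paths[$variant] = $dir . $hash . '_' . $variant . '.jpg';
    }
    return $paths;
}

print_r(variantPaths('aabbccdddddddddddddd'));
```

Keeping all variants of one image in the same directory means the stored hash alone is enough to locate every size.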
sberry
2

If you're talking millions of photos, I would suggest you farm these out to a third party such as Amazon Web Services, more specifically Amazon S3. There is no limit on the number of files and, assuming you don't need to actually list the files, there is no need to separate them into directories at all (and if you do need to list, you can use different delimiters and prefixes - http://docs.amazonwebservices.com/AmazonS3/latest/dev/ListingKeysHierarchy.html). Your hosting/retrieval costs will probably be lower than doing it yourself - and the files get backed up.

To answer more specifically: yes, split by subdirectories. Using your structure, you can drop the first 6 characters of the filename, as you already have them in the directory name.

As for thumbs, as suggested by aquinas, just append _thumb1 etc. to the filename. Or store them in separate folders of their own.
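
Both suggestions can be sketched together. The code below assumes the three two-character directory levels from the question, which together cover the leading six characters of the hash. The function names `shortPath` and `hashFromPath`, and the size folder name `thumb1`, are hypothetical.

```php
<?php
// Hypothetical helper: path with the directory-encoded prefix dropped from
// the file name, optionally under a separate top-level folder per size.
function shortPath(string $hash, string $sizeFolder = ''): string
{
    // Three levels of two characters: f3d40f... becomes f3/d4/0f/
    $dir = implode('/', str_split(substr($hash, 0, 6), 2)) . '/';
    // The directory already encodes the leading six characters.
    $rest = substr($hash, 6);
    $prefix = $sizeFolder === '' ? '' : $sizeFolder . '/';
    return $prefix . $dir . $rest . '.jpg';
}

// Rebuild the full hash from a path produced by shortPath() without a size folder.
function hashFromPath(string $path): string
{
    // Strip the ".jpg" extension, then remove the directory separators.
    return str_replace('/', '', substr($path, 0, -4));
}

$hash = 'f3d40fc20a86e4bf8ab717a6166a02d4';
echo shortPath($hash), "\n";
echo shortPath($hash, 'thumb1'), "\n";
echo hashFromPath(shortPath($hash)), "\n";
```

The shorter file names save a few bytes per entry, at the cost of a file name that is no longer meaningful outside its directory.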

Robbie
  • ok, about the S3 service, as of right now im not gonna do that, but if i stick to this current method im contemplating about, would it be easy to transfer all the files over there if i do decide to move everything to S3? – Kenny Sep 04 '12 at 04:49
  • It would be pretty easy to write a script to move all of your files from your filesystem to s3. You could even maintain your directory structure. – sberry Sep 04 '12 at 04:51
  • +1 for mentioning an alternative. s3 is a really good service. – sberry Sep 04 '12 at 04:52
  • Yes - you could move them all over once the project is up and running. But the most complex (and annoying) thing about setting up Amazon S3 is signing up and giving credit card details :). The actual upload is a few lines of code - really, really easy. You'll spend longer worrying about mkdir and permissions and all that jazz for handling locally. Do some costings and if the project is genuinely going to be that large, start this way and save the hassles. – Robbie Sep 04 '12 at 05:05
-3

1) That's something only you can answer. Generally, I prefer to store the images in the database so you can have ONE consistent backup, but YMMV.

2) How? How about /dc/eh/wd/dcehwd8y4fcf42wduasdha_thumb1.jpg, /dc/eh/wd/dcehwd8y4fcf42wduasdha_thumb2.jpg and /dc/eh/wd/dcehwd8y4fcf42wduasdha_thumb3.jpg

3) ??? Are you asking how to write a file to the file system or...?

aquinas
  • Pulling images from a database and displaying them is slower than using file access. – chhameed Sep 04 '12 at 04:42
  • Storing these in a traditional sql database is not a good approach considering how many images there are. A no-sql db like Cassandra, Redis, Riak, etc. is more acceptable, but the filesystem should be the first consideration. – sberry Sep 04 '12 at 04:44
  • @Hameed - I don't think that a blanket statement like that can be justified. There are too many variables involved. (Data base caching; file system contention; distributed file systems; etc.) If any kind of locking is required, data bases are likely to be much more robust and flexible than a home-brew, file-based, locking system. – Ted Hopp Sep 04 '12 at 04:48
  • @Hameed, as I said, your mileage may vary. I didn't notice that the question was tagged MySQL. In SQL Server, there is a FileStream datatype: http://blogs.msdn.com/b/manisblog/archive/2007/10/21/filestream-data-type-sql-server-2008.aspx that is for this very purpose. – aquinas Sep 04 '12 at 04:56
  • Also, in MySQL, can you create a trigger that allows you to delete a file from the filesystem (to mimic cascading delete)? It might be possible, I just don't know how off the top of my head. – aquinas Sep 04 '12 at 05:06
-3

Improve Answer.

For millions of images, yes, it is correct that using a database will slow down the process.

The best option is either to use the server file system to store the images and use .htaccess to add security,

or to use a web service. Many providers offer an image API for uploading and displaying images, for example Amazon. You can go with that option as well.

Shail