I have a PHP application that uploads and stores files (think Imgur). Right now all files go into one main /storage directory. That works fine until you want to do something inside that directory: a simple ls usually crashes my terminal. So far the only real issue has been rsync taking a while to build its file list, but I want to plan for the future.

Would it be smarter to store uploads in a /year/month/file.ext layout, or will a single directory scale going forward? One problem with changing the layout is that there are already millions of links out there pointing directly to site.com/storage/file.ext, which would need to be redirected to the new location. What is the proper way of doing this without hammering MySQL every time an image is requested?

Josh Mountain

2 Answers

A common way to handle a large number of files is to break them up into subdirectories, as you've surmised. But rather than dividing them up by date, which would require knowing some metadata about the file, break them up by filename.

For example, if the filename is abcdefg.jpg, store it as the path /storage/a/b/c/abcdefg.jpg. The exact number of subdirectories depends on how much you want to scale this.
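As a rough illustration of that mapping, here is a minimal Python sketch. The function name `sharded_path`, the `root` and `depth` parameters, and the choice to reject names that do not start with three lowercase letters are all assumptions made for the example, not part of the original suggestion.

```python
from pathlib import PurePosixPath


def sharded_path(filename: str, root: str = "/storage", depth: int = 3) -> str:
    """Build root/a/b/c/abcdefg.jpg from 'abcdefg.jpg' (hypothetical helper)."""
    prefix = filename[:depth]
    # Assumption: only names starting with `depth` lowercase letters are sharded;
    # anything else is rejected so the caller can decide how to handle it.
    if len(prefix) < depth or not all("a" <= ch <= "z" for ch in prefix):
        raise ValueError(f"cannot shard {filename!r}")
    # Each leading character becomes one directory level.
    return str(PurePosixPath(root, *prefix, filename))


print(sharded_path("abcdefg.jpg"))
```

A larger `depth` spreads files over more directories; with lowercase letters only, three levels give 26 × 26 × 26 = 17,576 leaf directories.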

At the top level, create 26 subdirectories a-z. Below that, 26 more in each subdirectory. And below that, 26 in each sub-subdirectory. You can have a script do this for you.
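A script for creating that tree could look roughly like the Python sketch below. The function name `create_shard_dirs` and its `alphabet` and `depth` parameters are hypothetical, chosen only for illustration.

```python
import itertools
import os
import string


def create_shard_dirs(root, alphabet=string.ascii_lowercase, depth=3):
    """Create every root/x/y/z directory and return how many leaves exist."""
    count = 0
    # itertools.product yields every combination: ('a','a','a'), ('a','a','b'), ...
    for combo in itertools.product(alphabet, repeat=depth):
        # makedirs also creates the intermediate levels; exist_ok makes reruns safe.
        os.makedirs(os.path.join(root, *combo), exist_ok=True)
        count += 1
    return count
```

With the defaults this creates 17,576 leaf directories. Creating them on demand during the migration is a reasonable alternative if many prefixes never occur.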

Then move each image into the appropriate sub-sub-sub directory. Extract the first three characters of each filename and build the complete path out of them. Again, a program can do this for you. If you want to keep your site live while you do this, use hard links (assuming a Unix-like system) and delete the original files once the migration is done.
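The hard-link step might be sketched in Python as follows. This assumes a Unix-like filesystem where the subdirectories live on the same volume as /storage (hard links cannot cross filesystems); the function name `link_into_shards` and the decision to skip names that do not start with three lowercase letters are assumptions for the example.

```python
import os


def link_into_shards(storage_root, depth=3):
    """Hard-link each file in storage_root into its a/b/c subdirectory."""
    # Collect the names first so the directory is not modified while being scanned.
    with os.scandir(storage_root) as entries:
        names = [e.name for e in entries if e.is_file(follow_symlinks=False)]

    linked = []
    for name in sorted(names):
        prefix = name[:depth]
        # Assumption: skip names that do not start with `depth` lowercase letters.
        if len(prefix) < depth or not all("a" <= ch <= "z" for ch in prefix):
            continue
        target_dir = os.path.join(storage_root, *prefix)
        os.makedirs(target_dir, exist_ok=True)
        target = os.path.join(target_dir, name)
        if not os.path.exists(target):
            # The old path keeps working because both names point at the same data.
            os.link(os.path.join(storage_root, name), target)
            linked.append(name)
    return linked
```

Once the rewrite rule is in place and verified, a second pass can remove the original top-level names; the data stays reachable through the new links.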

Finally, to map the old links to the new locations, use Apache's mod_rewrite (assuming that is the server you are using). In fact, you never have to expose the actual file paths in your links; just let mod_rewrite do the work for you.

RewriteRule ^/storage/(([a-z])([a-z])([a-z]).*)$ /storage/$2/$3/$4/$1
Barry Brown

Adopting a more specific organizational pattern sounds wise. At the very least it makes your massive collection of image files easier to manage, and it may open some doors for batch backups and other operational scripts.

Could you write a migration script that programmatically finds references to your files and replaces them with the new path? Or are there currently a large number of external references (outside of your control) to your files?

You might also be able to programmatically generate a redirect map for your web server that sends requests for the old paths to the new path.
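One way such a map could be generated is sketched below in Python. It assumes the new layout is the /storage/a/b/c/ scheme from the other answer and emits plain "old-path new-path" lines, a whitespace-separated format that, for example, Apache's text-file RewriteMap can read. The function name `redirect_map_lines` is hypothetical, and filenames containing spaces would need extra handling.

```python
def redirect_map_lines(filenames, old_prefix="/storage", depth=3):
    """Return 'old new' lines mapping flat paths to sharded paths."""
    lines = []
    for name in sorted(filenames):
        prefix = name[:depth]
        # Assumption: only names starting with `depth` lowercase letters were moved.
        if len(prefix) == depth and all("a" <= ch <= "z" for ch in prefix):
            new_path = "/".join([old_prefix, *prefix, name])
            lines.append(f"{old_prefix}/{name} {new_path}")
    return lines


for line in redirect_map_lines(["abcdefg.jpg", "x.png"]):
    print(line)
```

Because the new path here is derived purely from the filename, a single pattern-based rewrite rule can replace the map entirely; an explicit map is mainly useful when the new layout cannot be computed from the old name.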

Patrick Coffey