I have a Linux-based application (Drupal: PHP + Apache + MySQL) that is quickly scaling to hold thousands of pictures in the same folder. I am probably close to 2,000 right now. Are there any drawbacks to having so many images in a single folder? Could it affect performance?

I am not planning to browse files in that folder, and the server simply serves the images when their URLs are requested, but I wonder whether I will have problems in the future (the application is growing in number of pictures and could reach 20,000 or 30,000 images). Maybe I should plan a strategy for splitting this monstrous images folder into subfolders, for example using usernames for subfolders, or the year-month the picture was uploaded.

In a nutshell, the questions are: is it bad to have thousands of images in the same folder on a Linux server (I am not sure which distribution my app runs on; it is in a shared hosting environment)? Should I avoid this approach and split those items into subfolders? Are there any recommendations for the number of files per folder, or a maximum number of files per folder?

Thank you for sharing your thoughts about this.

Marcos Buarque

1 Answer

It strongly depends on the file system (and also the mount options).

Recent file systems (ext4, btrfs, ...) can deal with huge directories containing many files (so they could probably handle a directory with half a million files).

However, you won't be happy in the rare cases you need to fsck a multi-terabyte file system!

The shell (and globbing functions; read glob(7) and glob(3)) may also be unhappy with a directory of many thousand entries: consider that autocompletion in an interactive shell may need to scan the directory. And a human user (e.g. some sysadmin) might be mad if ls needs a minute to print many thousand lines.
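If a script ever has to walk such a directory, iterating lazily avoids building the whole listing in memory the way a glob would. A minimal Python sketch, assuming Python 3.6 or later; the function name count_entries is hypothetical:

```python
import os


def count_entries(directory):
    """Count directory entries one at a time, without building a full list."""
    # os.scandir yields entries lazily; the with-block closes the handle.
    with os.scandir(directory) as entries:
        return sum(1 for _ in entries)
```

This does not make the directory itself any smaller; it only keeps the script from holding every name in memory at once.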

I would recommend having no more than a few thousand files (including subdirectories) per directory. Consider organizing your images like a0/001.png ... a0/999.png, a1/001.png ... a1/999.png, ... b9/, etc.
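One way to get a layout like this without tracking counters is to derive the subdirectory from a hash of the file name. This is a variation on the sequential a0/001.png scheme above, not the same thing. A minimal Python sketch; sharded_path and its levels and width parameters are hypothetical names, and the root "images" is only an example:

```python
import hashlib
from pathlib import Path


def sharded_path(root, filename, levels=1, width=2):
    """Return root/<hash prefix>/filename for a hash-based directory layout.

    With width=2 hex characters there are 256 possible subdirectories per
    level, so 30,000 images average roughly 117 files per subdirectory.
    """
    # The hash is used only to spread names evenly, not for security.
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root).joinpath(*parts, filename)


def store_location(root, filename):
    """Create the subdirectory if needed and return the full target path."""
    target = sharded_path(root, filename)
    target.parent.mkdir(parents=True, exist_ok=True)
    return target
```

The same file name always maps to the same subdirectory, so the path can be recomputed at serving time. Usernames or upload year-month, as suggested in the question, would work as the subdirectory key too, but they may fill unevenly.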

And if you really have a huge number of files, you might want to use different file systems for them (e.g. a0/ ... a9/ on one disk and b0/ ... b9/ on another), or use LVM, or perhaps OpenStack Swift object storage or Cinder block storage, etc.

Look also into NoSQL databases.

BTW, if you aim to scale to petabyte storage, things will become less easy....

You probably want to store at least the names of the files in some database.
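As an illustration of keeping the file names in a database, here is a minimal sketch using Python's built-in sqlite3 module. The table name images, its columns, and the helper functions are all assumptions made for the example; in a Drupal setup the existing MySQL database would be the more natural place for such a table.

```python
import sqlite3


def open_index(db_path=":memory:"):
    """Open (or create) a small index of image names and their locations."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " name TEXT PRIMARY KEY,"
        " rel_path TEXT NOT NULL,"
        " uploaded_at TEXT NOT NULL)"
    )
    return conn


def register_image(conn, name, rel_path, uploaded_at):
    """Record where an image is stored, replacing any earlier entry."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO images (name, rel_path, uploaded_at)"
            " VALUES (?, ?, ?)",
            (name, rel_path, uploaded_at),
        )


def lookup_path(conn, name):
    """Return the stored relative path for a name, or None if unknown."""
    row = conn.execute(
        "SELECT rel_path FROM images WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

With such an index, the application can find or list images through a query instead of scanning the directory tree.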

Basile Starynkevitch
Thank you, I don't think I will get to the petabytes. If I do, by then I will have moved to something more robust. But thank you for all the hints, it is important to plan for the future. – Marcos Buarque Aug 27 '13 at 02:10