31

I'm writing an app that needs to store a large number of files, up to approximately 10 million.

They are currently named with a UUID and will be around 4 MB each, always the same size. Reading from and writing to these files will always be sequential.

The two main questions I'm seeking answers to:

1) Which filesystem would be best for this: XFS or ext4?
2) Would it be necessary to store the files beneath subdirectories in order to reduce the number of files within a single directory?

For question 2, I note that people have attempted to discover the limit on the number of files XFS can store in a single directory and haven't found one, even with several million files, and they noted no performance problems. What about under ext4?

Googling around for people doing similar things, I found suggestions to store the inode number as the link to the file instead of the filename, for performance (in a database index, which I'm also using). However, I don't see a usable API for opening a file by inode number. That seemed to be more of a suggestion for improving performance under ext3, which I don't intend to use anyway.

What are the ext4 and XFS limits? What performance benefits does one offer over the other, and could you see a reason to use ext4 over XFS in my case?

hookenz
  • 36,432
  • 45
  • 177
  • 286

2 Answers

21

You should definitely store the files in subdirectories.

EXT4 and XFS both use efficient lookup methods for file names, but if you ever need to run tools over the directories, such as `ls` or `find`, you will be very glad to have the files in manageable chunks of 1,000 to 10,000 files.

The inode number thing is to improve the sequential access performance of the EXT filesystems. The metadata is stored in inodes and if you access these inodes out of order then the metadata accesses are randomized. By reading your files in inode order you make the metadata access sequential too.
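
For what it's worth, here is a minimal sketch of that ordering trick (assuming Python and a flat directory of the UUID-named files; `os.scandir` exposes each entry's inode number straight from the readdir pass, so sorting costs no extra `stat` calls, and the helper name is made up):

```python
import os

def iter_files_in_inode_order(directory):
    """Yield (name, contents) for every regular file in `directory`,
    visiting the files in inode order so metadata reads stay mostly sequential."""
    entries = [e for e in os.scandir(directory) if e.is_file(follow_symlinks=False)]
    # DirEntry.inode() comes from the directory scan itself; no per-file stat() needed.
    entries.sort(key=lambda e: e.inode())
    for entry in entries:
        with open(entry.path, "rb") as f:
            yield entry.name, f.read()
```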

Zan Lynx
  • 53,022
  • 10
  • 79
  • 131
  • 1
    With the inode number thing, how would I open a file by inode? I could then avoid using an expensive stat operation, right? – hookenz Feb 17 '11 at 02:41
  • 5
    @Matt There is no way to open a file by inode (it would bypass part of the Unix access control scheme). But `readdir` tells you the inode numbers, so you sort your list of file names by inode number and open them in that order. BTW, "`stat` is expensive" is an oversimplification; the more accurate statement is that `stat(f);open(f)` is somewhat more expensive than `h=open(f);fstat(h)`. (The expensive operation that you avoid doing twice in the latter case is *pathname processing*, not disk access. The differential used to be 2x but should be much less with modern systems.) A sketch of both call patterns follows this comment thread. – zwol Feb 17 '11 at 02:58
  • @Zack - Thanks for the very useful insight comparing stat/open vs open/fstat – hookenz Feb 17 '11 at 06:04
  • 1
    so XFS or EXT4 for 100 million? – Toolkit Apr 24 '18 at 11:51
  • @Toolkit 100 million is impossible for EXT4, which I have found directly. Both tmpfs on /dev/shm and EXT4 begin, at around 10 million +/- 500,000 files, to run so slowly that applications start to time out because they think the disk is broken, even though it isn't. It behaves exactly the same on spinning disk, SSD, and RAM-backed /dev/shm using tmpfs, which leads me to expect XFS on Linux to behave the same as the other three, i.e. failure around 10.5 million. Very high inode counts showed zero improvement. – Geoffrey Anderson Dec 18 '18 at 16:03
  • @GeoffreyAnderson we are at 122 M currently with XFS – Toolkit Dec 19 '18 at 14:32
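
A minimal sketch of the two call patterns compared in the comments above, using Python's `os` module as a stand-in for the underlying C calls (the throwaway temp file is only there so the snippet runs):

```python
import os
import tempfile

# Create a throwaway file so the snippet actually runs.
fd, path = tempfile.mkstemp()
os.close(fd)

# Pattern 1: stat(f); open(f) -- the kernel resolves the pathname twice.
st = os.stat(path)
fd = os.open(path, os.O_RDONLY)
os.close(fd)

# Pattern 2: h = open(f); fstat(h) -- the pathname is resolved only once;
# fstat() operates on the already-open descriptor.
fd = os.open(path, os.O_RDONLY)
st = os.fstat(fd)
os.close(fd)

os.remove(path)
```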
12

Modern filesystems will let you store 10 million files all in the same directory if you like. But tools (ls and its friends) will not work well.

I'd recommend a single level of subdirectories, a fixed number of them, perhaps 1,000, and putting the files in there (10,000 files per directory is tolerable to the shell and to "ls").
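
As a rough sketch of that layout (assuming Python, a `blobs` root directory, and 1,024 shards as a power-of-two stand-in for the 1,000 suggested above; all names here are made up):

```python
import os
import uuid

FANOUT = 1024  # fixed number of first-level subdirectories (~10,000 files each at 10M files)

def make_shards(root):
    """Pre-create every shard directory once, e.g. at application startup."""
    for shard in range(FANOUT):
        os.makedirs(os.path.join(root, f"{shard:04d}"), exist_ok=True)

def shard_path(root, file_uuid):
    """Map a UUID to root/<shard>/<uuid>, using its first three hex digits (12 bits) to pick a shard."""
    shard = int(file_uuid.hex[:3], 16) % FANOUT
    return os.path.join(root, f"{shard:04d}", file_uuid.hex)

# Usage: pre-create the directories, then write a ~4 MB blob under its shard.
root = "blobs"
make_shards(root)
name = uuid.uuid4()
with open(shard_path(root, name), "wb") as f:
    f.write(b"\0" * (4 * 1024 * 1024))
```

Pre-creating the shard directories once at startup is cheap, and it means the write path never has to call mkdir or check whether the directory exists.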

I've seen systems that create many levels of directories; this is truly unnecessary, increases inode consumption, and makes traversal slower.

10M files should not really be a problem either, unless you need to do bulk operations on them.

I expect you will need to prune old files, but something like "tmpwatch" will probably work just fine with 10M files.
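
If "tmpwatch" isn't available or you want the pruning inside the app, a rough sketch of the same idea (assuming Python and an mtime-based cutoff; the function name is made up):

```python
import os
import time

def prune_older_than(root, max_age_days):
    """Delete regular files under `root` whose modification time is older than the cutoff."""
    cutoff = time.time() - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_mtime < cutoff:
                os.remove(path)
```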

MarkR
  • 62,604
  • 14
  • 116
  • 151
  • Thanks, is mkdir a slow operation? Should I pre-make the directories at startup and from then on assume they exist? – hookenz Feb 17 '11 at 02:15
  • Once you get into the millions of files in the same directory, `ext4` starts to struggle and gets index hash collisions. – steve Aug 11 '15 at 18:01
  • > "Modern filesystems will let you store 10 million files all in the same directory if you like. But tools (ls and its friends) will not work well." Actually it's worse than that. The system itself, not just ls and the command line, begins to break down from extreme latency at 10.5 million files, and this is true regardless of the kind of storage (tmpfs, SSD, spinning disk), and despite sufficiently high inode counts. – Geoffrey Anderson Dec 18 '18 at 16:07
  • @GeoffreyAnderson Interesting, what do you mean by extreme latency? I did some benchmarks and actually found that a flat directory structure performs better: https://medium.com/@hartator/benchmark-deep-directory-structure-vs-flat-directory-structure-to-store-millions-of-files-on-ext4-cac1000ca28 – Hartator Dec 22 '18 at 03:39
  • @Hartator - but isn't the optimal *neither* flat nor deep; rather it's "just deep enough"? That is, if you're getting into the millions, add **one** level of subdirs - not 0 or 2 levels. – ToolmakerSteve Mar 29 '19 at 06:40