204

How does Windows with NTFS perform with large volumes of files and directories?

Is there any guidance around limits of files or directories you can place in a single directory before you run into performance problems or other issues?

E.g. is having a folder with 100,000 folders inside of it an OK thing to do?

mikemaccana
James Newton-King
  • 4
    [How do you deal with lots of small files?](http://stackoverflow.com/q/115882/365102) – Mateen Ulhaq May 28 '11 at 21:50
  • The answers at the related question are inferior to the accepted answer here. – Eric J. Oct 29 '14 at 22:05
  • This implementation might be useful: [github.com/acrobit/AcroFS](http://github.com/acrobit/AcroFS) – Ghominejad Dec 22 '17 at 13:53
  • 1
    Related: [does ReFS handle large amounts of files, large deletes faster than NTFS?](https://stackoverflow.com/questions/50600934/does-refs-handle-small-files-and-large-deletes-faster-than-ntfs) – mikemaccana May 30 '18 at 10:40

8 Answers

297

Here's some advice from someone with an environment where we have folders containing tens of millions of files.

  1. A folder stores its index information (links to child files & child folders) in an index file. This file gets very large when you have a lot of children. Note that it doesn't distinguish between a child that's a folder and a child that's a file; the only real difference is that the child's content is either the child's folder index or the child's file data. Note: I am simplifying this somewhat, but this gets the point across.
  2. The index file will get fragmented. When it gets too fragmented, you will be unable to add files to that folder, because there is a limit on the # of fragments that's allowed. It's by design. I've confirmed it with Microsoft in a support incident call. So although the theoretical limit to the number of files that you can have in a folder is several billion, good luck once you start hitting tens of millions of files, as you will hit the fragmentation limitation first.
  3. It's not all bad, however. You can use the tool contig.exe to defragment this index (see the example invocations after this list). It will not reduce the size of the index (which can reach up to several gigs for tens of millions of files), but you can reduce the # of fragments. Note: the Disk Defragmenter tool will NOT defrag the folder's index; it will defrag file data. Only the contig.exe tool will defrag the index. FYI: you can also use it to defrag an individual file's data.
  4. If you DO defrag, don't wait until you hit the max # of fragments limit. I have a folder that I cannot defrag because I waited until it was too late. My next test is to try to move some files out of that folder into another folder to see if I can defrag it then. If this fails, then what I would have to do is 1) create a new folder, 2) move a batch of files to the new folder, 3) defrag the new folder, repeat #2 & #3 until this is done, and then 4) remove the old folder and rename the new folder to match the old.
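
A minimal example of what that looks like with contig.exe (the Sysinternals tool; C:\BigFolder is a placeholder path). Per the comments below, you point it at the directory itself rather than at the files inside it:

rem Analyze: report how many fragments the folder's index is in
contig -a C:\BigFolder

rem Defragment the folder's index itself (run from an elevated prompt)
contig C:\BigFolder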

To answer your question more directly: If you're looking at 100K entries, no worries. Go knock yourself out. If you're looking at tens of millions of entries, then either:

a) Make plans to sub-divide them into sub-folders (e.g., let's say you have 100M files; it's better to store them in 1000 folders so that you only have 100,000 files per folder than to store them in 1 big folder). This will create 1000 folder indices instead of a single big one that's more likely to hit the max # of fragments limit (there's a sketch of this layout below), or

b) Make plans to run contig.exe on a regular basis to keep your big folder's index defragmented.
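
If you go with option (a), one simple scheme is to derive the sub-folder from a hash of the file name so the files spread evenly. A minimal C# sketch, assuming a 1000-bucket layout (the class and method names here are just illustration, not anything NTFS-specific):

using System.IO;

static class FolderPartitioner
{
    // string.GetHashCode() is not stable across processes on newer .NET
    // runtimes, so use a trivial deterministic hash for a repeatable mapping.
    static int StableHash(string s)
    {
        unchecked
        {
            int h = 23;
            foreach (char c in s) h = h * 31 + c;
            return h & 0x7FFFFFFF;
        }
    }

    // Map a file name to one of 1000 sub-folders (000..999) under a root,
    // so no single folder's index grows to tens of millions of entries.
    public static string GetBucketPath(string root, string fileName)
    {
        string subDir = Path.Combine(root, (StableHash(fileName) % 1000).ToString("D3"));
        Directory.CreateDirectory(subDir);   // no-op if it already exists
        return Path.Combine(subDir, fileName);
    }
}

// Usage: File.WriteAllBytes(FolderPartitioner.GetBucketPath(@"D:\Store", "photo123.jpg"), data);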

Read below only if you're bored.

The actual limit isn't on the # of fragments, but on the number of records in the data segment that stores the pointers to the fragments.

So what you have is a data segment that stores pointers to the fragments of the directory data. The directory data stores information about the sub-directories & sub-files that the directory supposedly stores. Actually, a directory doesn't "store" anything. It's just a tracking and presentation feature that presents the illusion of hierarchy to the user, since the storage medium itself is linear.

nulltoken
MrB
  • 7
    Where can I find more information about `contig.exe`, it isn't on my server. A Google search returned [this technet page](http://technet.microsoft.com/en-us/sysinternals/bb897428.aspx) which has no mention of subdirectories or folder index defragmentation. – Evan Carroll Jun 25 '10 at 17:25
  • 41
    I found out about contig & folder index fragmentation from a tech call with a Microsoft engineer. It was a huge pain in the butt going through their useless level 1-3 layers of tech support. (Uh...have you tried running chkdsk? Can you try opening the folder in Windows Explorer? Can you check the folder permissions?) FOOL! I'm not going to sit here for 7 days waiting for your damn chkdsk to scan a drive with tens of millions of files!! – MrB Jun 26 '10 at 04:07
  • The contig tool doesn't mention any command line switches for defragmenting the indexes, only the files. Do I need to defragment every file in the directory to also defragment the indexes? – ss2k Mar 25 '11 at 19:16
  • 7
    @ss2k - Just point `contig.exe` to a directory, I *think* that will do the job: `contig -a .` gives: `C:\temp\viele-Dateien is in 411 fragments Summary: Number of files processed : 1 Average fragmentation : 411 frags/file` – Lumi Aug 25 '11 at 10:37
  • Also, if you find that you need to run contig against folders where the drive is a mount point (as it won't work with one), you can simply tack on an additional drive letter in Diskmgmt for that disk, then run contig per Lumi's comment above. – Quantum Elf Mar 02 '14 at 01:30
  • Afaik, since Vista there are some mechanisms that should avoid some of the worst fragmentation (though not all). – Marco van de Voort May 07 '15 at 09:56
  • 2
    Is this still an issue with SSD disks? I'll have to make a folder with a huge number of shortcuts inside (around 6 mils). I tried contig.exe on another smaller folder and I do see it very fragmented (1075 fragments) but contig won't defrag it. – GPhilo Jun 26 '17 at 08:21
  • 6
    @GPhilo I can confirm performance still degrades on an SSD when using millions of files. I as well tried to defrag the folder, but contig didn't do anything to it. It acted as if it completed but showed the same fragmentation before and after running it. – Bram Vanroy Sep 06 '17 at 14:19
  • @mrb 'If you DO defrag, don't wait until you hit the max # of fragment limit.' is confusing. The current wording implies defrag is optional, and the consideration is for after you have decided to defrag - which I'm fairly sure is wrong. Would it be better to read 'if you think you may need to defrag, don't wait until you hit the maximum number of fragments'? – mikemaccana May 30 '18 at 11:21
  • 1
    In terms of running Contig to defrag the index, should I run contig on `c:\my\big\directory`, or `c:\my\big\directory\*`, or on `$mft` ? (or something else?) – Stephen R Jun 27 '18 at 19:55
  • (Regarding @Lumi 's sorta-answer above, when I point it at a directory it appears to scan each individual file in the directory. So the answer remains unclear) – Stephen R Jun 27 '18 at 20:09
  • Does defragmenting NTFS metadata with `contig` impact the live system, and how long does it usually run? Talking about ~8 million files taking 8TB of space. – Janis Veinbergs Apr 17 '20 at 07:21
53

Short file name (8.3) creation can also slow things down. Microsoft recommends turning off short file name creation if you have more than 300k files in a folder [1]. The less unique the first 6 characters are, the bigger the problem.

[1] How NTFS Works from http://technet.microsoft.com, search for "300,000"
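
For reference, the switch lives in fsutil; the exact flags can vary by Windows version, so check the fsutil 8dot3name help on your box first. Disabling generation doesn't remove short names that already exist, so on an existing volume you also want to strip them, as comments elsewhere on this page point out (D:\BigFolder below is a placeholder):

rem Disable 8.3 short-name generation system-wide (requires an elevated prompt)
fsutil.exe behavior set disable8dot3 1

rem Strip the short names that were already generated under a folder
fsutil.exe 8dot3name strip /s /v D:\BigFolder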

Tony Lee
  • 7
    I'd add a quote here *`If you use large numbers of files in an NTFS folder (300,000 or more), disable short-file name generation for better performance, and especially if the first six characters of the long file names are similar.`* -- spares the search for "300,000" hint. BTW: typing in "300" will be sufficient (= no need for clipboarding here) – Wolf Apr 19 '17 at 10:55
41

I am building a file structure to host up to 2 billion (2^32) files and ran the following tests, which show a sharp drop in navigate + read performance at about 250 files or 120 directories per NTFS directory on a solid state drive (SSD):

  • File performance drops by 50% between 250 and 1000 files.
  • Directory performance drops by 60% between 120 and 1000 directories.
  • Values for counts > 1000 remain relatively stable.

Interestingly, the number of directories and the number of files do NOT significantly interfere with each other.

So the lessons are:

  • File counts above 250 cost a factor of 2
  • Directory counts above 120 cost a factor of 2.5
  • File Explorer in Windows 7 can handle large numbers of files or directories, but usability is still bad
  • Introducing sub-directories is not expensive

This is the data (two measurement runs for each file and directory count):

(FOPS = File Operations per Second)
(DOPS = Directory Operations per Second)

#Files  lg(#)   FOPS    FOPS2   DOPS    DOPS2
   10   1.00    16692   16692   16421   16312
  100   2.00    16425   15943   15738   16031
  120   2.08    15716   16024   15878   16122
  130   2.11    15883   16124   14328   14347
  160   2.20    15978   16184   11325   11128
  200   2.30    16364   16052   9866    9678
  210   2.32    16143   15977   9348    9547
  220   2.34    16290   15909   9094    9038
  230   2.36    16048   15930   9010    9094
  240   2.38    15096   15725   8654    9143
  250   2.40    15453   15548   8872    8472
  260   2.41    14454   15053   8577    8720
  300   2.48    12565   13245   8368    8361
  400   2.60    11159   11462   7671    7574
  500   2.70    10536   10560   7149    7331
 1000   3.00    9092    9509    6569    6693
 2000   3.30    8797    8810    6375    6292
10000   4.00    8084    8228    6210    6194
20000   4.30    8049    8343    5536    6100
50000   4.70    7468    7607    5364    5365

And this is the test code:

// NUnit test (goes inside a test fixture class): creates numFilesInDir files
// (or directories, when testDirs is true) in a fresh temp folder, then
// measures random reads for 5 seconds.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using NUnit.Framework;

[TestCase(50000, false, Result = 50000)]
[TestCase(50000, true, Result = 50000)]
public static int TestDirPerformance(int numFilesInDir, bool testDirs) {
    var files = new List<string>();
    var dir = Path.GetTempPath() + "\\Sub\\" + Guid.NewGuid() + "\\";
    Directory.CreateDirectory(dir);
    Console.WriteLine("prepare...");
    const string FILE_NAME = "\\file.txt";
    for (int i = 0; i < numFilesInDir; i++) {
        string filename = dir + Guid.NewGuid();
        if (testDirs) {
            var dirName = filename + "D";
            Directory.CreateDirectory(dirName);
            using (File.Create(dirName + FILE_NAME)) { }
        } else {
            using (File.Create(filename)) { }
        }
        files.Add(filename);
    }
    //Adding 1000 Directories didn't change File Performance
    /*for (int i = 0; i < 1000; i++) {
        string filename = dir + Guid.NewGuid();
        Directory.CreateDirectory(filename + "D");
    }*/
    Console.WriteLine("measure...");
    var r = new Random();
    var sw = new Stopwatch();
    sw.Start();
    int len = 0;
    int count = 0;
    while (sw.ElapsedMilliseconds < 5000) {
        string filename = files[r.Next(files.Count)];
        string text = File.ReadAllText(testDirs ? filename + "D" + FILE_NAME : filename);
        len += text.Length;
        count++;
    }
    Console.WriteLine("{0} File Ops/sec ", count / 5);
    return numFilesInDir; 
}
phuclv
Spoc
  • 7
    You see performance loss after 2^8 files because you need to disable short name generation (8 character name generation). See https://technet.microsoft.com/en-us/library/cc781134(v=ws.10).aspx – Kyle Falconer Jun 15 '15 at 18:26
  • 1
    Hi, I tried that using this command line: `fsutil.exe behavior set disable8dot3 1`. After a reboot the results were largely the same for fewer than 10000 files/dirs. The article says it is important only for higher numbers. What I saw, though, was a general perf. degradation, possibly due to the higher load factor on my SSD (it is 80% full now instead of 45%) – Spoc Oct 25 '15 at 08:32
  • Very useful, thanks. The estimates of millions given by other users are far from these numbers. – Adrian Maire Jan 10 '17 at 15:49
  • 2
    Even after disabling 8.3 name generation, you still need to **strip** the existing 8.3 names, or there will be little improvement to the enumeration of existing files. – Stephen R Jun 27 '18 at 19:26
  • 3
    more details: https://blogs.technet.microsoft.com/josebda/2012/11/13/windows-server-2012-file-server-tip-disable-8-3-naming-and-strip-those-short-names-too/ – Stephen R Jun 27 '18 at 19:53
  • 3
    NTFS stores directories as B-trees. Those points where you see sharp changes in performance are simply when the B-tree gets one level deeper due to growth. These points can vary depending on file name length (because NTFS tries to fit as many entries in each 4K B-tree node as space will allow, and file name length determines the size of each entry), and also if short names are enabled (because then NTFS may have to add two entries per file instead of just one). – Craig Barkhouse Apr 29 '20 at 02:15
16

100,000 should be fine.

I have (anecdotally) seen people having problems with many millions of files, and I have had problems myself with Explorer just not having a clue how to count past 60-something thousand files, but NTFS should be good for the volumes you're talking about.

In case you're wondering, the technical (and I hope theoretical) maximum number of files is 4,294,967,295 (2^32 - 1).

Oli
8

For local access, large numbers of directories/files don't seem to be an issue. However, if you're accessing them across a network, there's a noticeable performance hit after a few hundred, especially when accessed from Vista machines (XP to Windows Server with NTFS seemed to run much faster in that regard).

Brian Knoblauch
2

When you create a folder with N entries, you create a list of N items at file-system level. This list is a system-wide shared data structure. If you then start modifying this list continuously by adding/removing entries, I expect at least some lock contention over shared data. This contention - theoretically - can negatively affect performance.

For read-only scenarios I can't imagine any reason for performance degradation of directories with large number of entries.
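
If you want to check whether that contention matters for your workload, a rough way is to time concurrent create/delete churn against a single folder and compare it with the same churn spread over several folders. This is only a sketch (the folder path, file count, and degree of parallelism below are arbitrary):

using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

static class ChurnTest
{
    // Create and immediately delete 'count' files under 'root', split across 'threads' workers.
    public static TimeSpan Run(string root, int count, int threads)
    {
        Directory.CreateDirectory(root);
        var sw = Stopwatch.StartNew();
        Parallel.For(0, threads, t =>
        {
            for (int i = 0; i < count / threads; i++)
            {
                string path = Path.Combine(root, "t" + t + "_" + i + ".tmp");
                File.WriteAllText(path, "x");   // add a directory entry
                File.Delete(path);              // remove it again
            }
        });
        return sw.Elapsed;
    }
}

// Compare e.g. ChurnTest.Run(@"C:\Temp\one-folder", 100000, 8)
// against the same total churn spread over eight separate folders.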

Constantin
2

I have real experience with about 100,000 files (each several MB) on NTFS in a single directory, from copying one online library.

It takes about 15 minutes to open the directory with Explorer or 7-zip.

Writing the site copy with WinHTTrack would always get stuck after some time. It also dealt with a directory containing about 1,000,000 files. I think the worst thing is that the MFT can only be traversed sequentially.

Opening the same under ext2fsd on ext3 gave almost the same timing. Probably moving to reiserfs (not reiser4fs) can help.

Trying to avoid this situation is probably the best.

For your own programs, using blobs without any file system could be beneficial. That's the way Facebook stores photos.
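
The "blobs without a file system" approach boils down to appending each small object to one big container file and remembering its offset and length (roughly what Facebook described in its Haystack paper). A minimal C# sketch, with an in-memory index that a real store would have to persist:

using System.Collections.Generic;
using System.IO;

class BlobStore
{
    private readonly string _path;
    private readonly Dictionary<string, long[]> _index = new Dictionary<string, long[]>();

    public BlobStore(string containerPath) { _path = containerPath; }

    // Append one blob to the single container file and record { offset, length }.
    public void Put(string key, byte[] data)
    {
        using (var fs = new FileStream(_path, FileMode.Append, FileAccess.Write))
        {
            _index[key] = new[] { fs.Position, (long)data.Length };
            fs.Write(data, 0, data.Length);
        }
    }

    // Read a blob back by seeking straight to its recorded offset.
    public byte[] Get(string key)
    {
        long[] entry = _index[key];
        using (var fs = new FileStream(_path, FileMode.Open, FileAccess.Read))
        {
            fs.Seek(entry[0], SeekOrigin.Begin);
            var buffer = new byte[entry[1]];
            int read = 0;
            while (read < buffer.Length)
                read += fs.Read(buffer, read, buffer.Length - read);
            return buffer;
        }
    }
}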

phuclv
ximik
  • I'm not sure where you got that "the MFT can only be traversed sequentially"? The MFT contains a B-tree and is traversed like a B-tree – phuclv Aug 15 '18 at 15:07
0

Beyond NTFS, the server hosting the file system and the client using the file system [remotely] can also make a difference to how NTFS behaves and performs. Clients usually use the SMB protocol to access network shares. Each version of Windows Server and Client can behave differently.

Beyond that, SMB itself can be tuned. As a starting point, refer to

[Performance tuning for file servers | Microsoft Learn](https://learn.microsoft.com/en-us/windows-server/administration/performance-tuning/role/file-server/)