
I have NTFS folders that may grow to hold 100,000 to 1,000,000 files, the upper limit discussed in this answer on NTFS performance.

My files have the following characteristics:

1) They have long file names (typically 64 to 100 characters).

2) For many of the files, the first 20 to 40 characters of the file name are identical.

Do long file names affect NTFS folder index performance, whether in looking up a file's record by name, in fragmentation of the index, or in growth of the index?

NTFS folder indexes are (reportedly) B-trees. I've tested my software to 50,000 files, but I'm running a 'happy path' test with little file system churn. Testing to 1,000,000 files will take weeks of running my software non-stop.

I've considered writing a simulator, but before I do that, does anyone have real-world experience with this?
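Roughly, the kind of simulator I have in mind would create files with long, mostly-identical leading prefixes and sample lookup and enumeration times as the folder grows. The directory path, counts, and name layout below are only placeholders, and a real run would also have to mix in deletes and renames to exercise index churn:

```python
import os
import time
import uuid

# Placeholder values -- adjust to the real workload.
TEST_DIR = r"D:\ntfs_index_test"   # hypothetical test directory
SHARED_PREFIX = "A" * 30           # stands in for the 20-40 identical leading characters
FILE_COUNT = 1_000_000
SAMPLE_EVERY = 100_000             # measure after each batch of creations

os.makedirs(TEST_DIR, exist_ok=True)

names = []
start = time.perf_counter()
for i in range(FILE_COUNT):
    # 30-char shared prefix + 32-char hex + counter -> roughly 74-character names
    name = f"{SHARED_PREFIX}{uuid.uuid4().hex}_{i:07d}.dat"
    names.append(name)
    with open(os.path.join(TEST_DIR, name), "wb") as f:
        f.write(b"x")              # tiny payload; only the folder index matters here
    if (i + 1) % SAMPLE_EVERY == 0:
        # Time a point lookup by name and a full enumeration at this folder size.
        t0 = time.perf_counter()
        os.stat(os.path.join(TEST_DIR, names[i // 2]))
        lookup = time.perf_counter() - t0
        t0 = time.perf_counter()
        count = sum(1 for _ in os.scandir(TEST_DIR))
        enum_s = time.perf_counter() - t0
        print(f"{i + 1:>9} files: lookup {lookup * 1e6:.1f} us, "
              f"enumerate {enum_s:.2f} s (saw {count})")
print(f"total create time: {time.perf_counter() - start:.1f} s")
```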

Pete Magsig
  • Why do you think testing to a million would take weeks? You can create a synthetic test that performs the expected operations on a million files, and it will take longer to write the test than to run it. – Eugene Mayevski 'Callback Feb 19 '12 at 13:24
  • I would not think of 64 to 100 character file names as long filenames, anyway. – dmeister Feb 19 '12 at 13:30
  • From this article http://support.microsoft.com/kb/130694 it looks like performance may be hindered if you have legacy 8.3 file name support enabled. – NothingMore Feb 19 '12 at 13:32
  • @EugeneMayevski'EldoSCorp - My goal in asking this question was to avoid writing a simulator. I need to test for fragmentation as well as bloat, and the complexity of the system is such that it's not that easy to synthesize long-term systemic behavior. – Pete Magsig Feb 19 '12 at 13:43

2 Answers


NTFS typically updates a file's Last Access Time attribute on disk only if the current value in memory differs by more than an hour from the value stored on disk, or when all in-memory references to that file are gone, whichever is more recent. So disabling Last Access Time updates improves the speed of folder and file access.

When you save a file with a long file name to an NTFS drive, NTFS creates, by default, a second file directory entry with a short file name conforming to the 8.3 convention. When NTFS enumerates files in a directory, it has to look up the 8.3 names associated with the long file names. Because an NTFS directory is maintained in a sorted state, corresponding long file names and 8.3 names are generally not next to one another in the directory listing. So, NTFS uses a linear search of the directory for every file present. As a result, the amount of time required to perform a directory listing increases with the square of the number of files in the directory. Disabling the 8.3 file creation will also improve performance.

Two registry values under HKLM\SYSTEM\CurrentControlSet\Control\FileSystem need to be changed: NtfsDisable8dot3NameCreation and NtfsDisableLastAccessUpdate. Set both to 1.
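If you prefer to script the change, a minimal sketch using Python's winreg module is below. Writing these values requires an elevated (administrator) process, and a reboot is typically required before NTFS honours the new settings:

```python
import winreg

# Both values live under this key in HKEY_LOCAL_MACHINE.
FILESYSTEM_KEY = r"SYSTEM\CurrentControlSet\Control\FileSystem"

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, FILESYSTEM_KEY, 0,
                    winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "NtfsDisable8dot3NameCreation", 0,
                      winreg.REG_DWORD, 1)   # stop creating 8.3 short names
    winreg.SetValueEx(key, "NtfsDisableLastAccessUpdate", 0,
                      winreg.REG_DWORD, 1)   # stop updating Last Access Time
```

Note that disabling 8.3 name creation only affects files created afterwards; short names that already exist in the folder are not removed.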

And, if you can afford it, use a solid-state drive (SSD) instead of a traditional hard drive; the performance is an order of magnitude better, see http://en.wikipedia.org/wiki/Solid-state_drive#Comparison_of_SSD_with_hard_disk_drives.

Bud Damyanov

NTFS directories are B-trees with data in both the interior and leaf nodes. Since there is no "key prefix compression", the full text of the file name is stored in the nodes as well.

Searching this with file names that share a long identical prefix simply wastes time, since scanning each "page" of the directory repeats the same comparisons before reaching the distinguishing characters. If you can make the leftmost characters of the name the most variable, that would be a huge help.
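A minimal sketch of that idea (the name components here are invented for illustration, not taken from the question): move the high-entropy part of the name to the front and keep the shared descriptive part as a suffix.

```python
import uuid

def distinctive_name(shared_part: str, unique_part: str, ext: str = ".dat") -> str:
    # Put the high-entropy portion first so directory-index comparisons
    # diverge after the first few characters; keep the descriptive text
    # at the end for human readers.
    return f"{unique_part}_{shared_part}{ext}"

# Before: ~27 identical leading characters, unique part buried at the end.
original = f"customer_export_batch_2012_{uuid.uuid4().hex}.dat"
# After: the unique part leads, the shared text trails.
better = distinctive_name("customer_export_batch_2012", uuid.uuid4().hex)
print(original)
print(better)
```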

But, in the end, no filesystem is a good database and no database is a good filesystem. You need to consider the sizes of your files and expected usage characteristics.

MJZ