
There's a Windows program, "Everything Search" (http://www.voidtools.com/), that reads the file names of an NTFS volume faster than I assume is possible by recursive descent (it reads the file names of almost 2 billion files on a 4TB HDD in less than 10 seconds).

I know that it probably reads the NTFS folder structure directly off the volume in bulk and makes sense of it without calling OS filesystem functions.

How exactly can it be done? What system functions should I call to get that information about an NTFS volume that fast, and how can I parse it into file and directory names? Are there any libraries in any language that help with that?

If you are not sure what I am asking, there are more details in my previous question (I was asked to rephrase it): Can I read whole NTFS directory tree into RAM at once?

Kamil Szot
  • Voting to close as too broad: the answer will be different for Windows and Linux. – Roger Lipscombe Sep 13 '15 at 17:21
  • @RogerLipscombe I added linux as an option. I'll remove it. Can you withdraw your vote? – Kamil Szot Sep 13 '15 at 17:23
  • Well, `CreateFile("$MFT", OPEN_ALWAYS, FILE_SHARE_READ, 0, 0, 0);`... not really special. It's just that you need admin privileges, and of course you must parse the binary structures therein. – Damon Sep 13 '15 at 17:28
  • @Damon Do you know any open source library that would help with parsing information read from $MFT? – Kamil Szot Sep 13 '15 at 18:01
  • The programs that I mentioned in your other question (like Swift Search or NTFS search) do that, and they are open source. You can get the complete sources at SF.net. I'd recommend you just look at how they do it. – Damon Sep 13 '15 at 18:07
  • @Damon Thank you. Exactly what I was looking for. Why do you avoid answers and respond in comments instead? – Kamil Szot Sep 13 '15 at 18:15
  • Well, I don't think "Google for 20 seconds and download the source code from SF", which is basically what I've told you in both questions, are particularly good answers :-) – Damon Sep 13 '15 at 18:38
  • @Damon Regardless, you've helped me in the best way I could imagine (apart from pasting a full implementation), especially considering that no other user contributed any information useful to me. Thank you. Surely there are questions asked here that don't even require that much googling, but I see you have plenty of rep, so you can be picky about how you earn it. :-) – Kamil Szot Sep 13 '15 at 20:24

1 Answer


An NTFS volume relies on a low-visibility structure called the master file table (MFT). There are APIs for querying this table directly, but they require elevated privileges to invoke, because you have to get a handle to the volume. The main function for querying the master file table is DeviceIoControl, and the control code is FSCTL_ENUM_USN_DATA.

The control code appears to be a USN-related code, which is a touch misleading in this particular case, but it gives the basic flavor of the call and its related structures. You get back an enumeration of records that look like USN records, but they're thin wrappers around master file table entries.
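
As a rough illustration of what decoding that output might look like, here is a minimal Python sketch. It assumes version 2 records (the USN_RECORD_V2 layout: a 60-byte fixed header followed by a UTF-16 name) and that the first 8 bytes of each output buffer hold the file reference number to resume the enumeration from. The names `UsnEntry` and `parse_enum_buffer` are made up for this example, and actually obtaining the buffer via DeviceIoControl is not shown.

```python
import struct
from typing import List, NamedTuple, Tuple

# Fixed part of USN_RECORD_V2: RecordLength, MajorVersion, MinorVersion,
# FileReferenceNumber, ParentFileReferenceNumber, Usn, TimeStamp, Reason,
# SourceInfo, SecurityId, FileAttributes, FileNameLength, FileNameOffset.
_HEADER = struct.Struct("<IHHQQqqIIIIHH")  # 60 bytes, little-endian


class UsnEntry(NamedTuple):
    """Hypothetical container for the fields the answer mentions."""
    frn: int         # FileReferenceNumber (the record's ID)
    parent_frn: int  # ParentFileReferenceNumber (the parent's ID)
    attributes: int  # FileAttributes bit mask
    name: str        # "local" file or folder name


def parse_enum_buffer(buf: bytes) -> Tuple[int, List[UsnEntry]]:
    """Decode one output buffer into (next_start_frn, entries)."""
    if len(buf) < 8:
        raise ValueError("buffer too short to hold the resume value")
    (next_frn,) = struct.unpack_from("<Q", buf, 0)
    entries = []
    offset = 8
    while offset + _HEADER.size <= len(buf):
        (length, major, _minor, frn, parent, _usn, _stamp, _reason,
         _source, _security, attrs, name_len, name_off) = _HEADER.unpack_from(buf, offset)
        if length < _HEADER.size or offset + length > len(buf):
            break  # truncated or corrupt record: stop rather than guess
        if major == 2:  # other record versions have a different layout
            raw = buf[offset + name_off:offset + name_off + name_len]
            entries.append(
                UsnEntry(frn, parent, attrs, raw.decode("utf-16-le", errors="replace"))
            )
        offset += length  # RecordLength includes any alignment padding
    return next_frn, entries
```

The returned `next_frn` would be fed back as the starting reference number of the next call, repeating until the enumeration reports the end of the table.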

Each record has a FileName, an ID and a parent ID. The FileName is the "local" name of the file or folder; to get the full path, you have to traverse the table structure by following the parent IDs up to the root.
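
The traversal can be sketched roughly as follows, in Python. This is only an illustration under some assumptions: the records have already been reduced to `(id, parent_id, name)` tuples, the ID of the root directory is known and passed in by the caller, and `build_paths` is a name invented for this example. Records whose parent chain never reaches the root are simply skipped.

```python
from typing import Dict, Iterable, Tuple


def build_paths(entries: Iterable[Tuple[int, int, str]], root_id: int) -> Dict[int, str]:
    """Map each record ID to its full path relative to the volume root."""
    nodes = {rid: (parent, name) for rid, parent, name in entries}
    paths = {root_id: ""}  # the root contributes an empty prefix

    for rid in nodes:
        chain = []
        seen = set()
        cur = rid
        # Walk upwards until we reach an ancestor whose path is already known.
        while cur not in paths:
            if cur not in nodes or cur in seen:
                chain = None  # orphaned record or a cycle: give up on it
                break
            seen.add(cur)
            chain.append(cur)
            cur = nodes[cur][0]
        if chain is None:
            continue
        # Unwind from the topmost unresolved ancestor down to the record,
        # caching every intermediate path so each node is resolved once.
        for node in reversed(chain):
            parent, name = nodes[node]
            paths[node] = paths[parent] + "\\" + name

    del paths[root_id]
    return paths


if __name__ == "__main__":
    demo = [(10, 5, "docs"), (11, 10, "a.txt")]
    for rid, path in build_paths(demo, root_id=5).items():
        print(rid, path)
```

Caching resolved ancestors keeps the work roughly linear in the number of records, which matters when the table holds millions of entries.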

It is lightning fast - way faster than recursing through the file system. You'll get back (and will have to filter out) things that aren't exposed in any of the normal file APIs - things you definitely don't want to expose to users, for example.

Clay
  • See http://stackoverflow.com/questions/21661798/how-do-we-access-mft-through-c-sharp. It's C#, but easy to "untranslate" :-) – Clay Sep 18 '15 at 16:59