Since this question yielded no useful answers and only some comments, please see my other question, which got a spot-on answer: How exactly "Everything Search" can give me immediately searchable list of 2bln files on my 4TB HDD in less than 10 seconds?

The only way I know of reading directories is to recursively descend into each directory, but that's too slow if I want to find a file anywhere on the whole disk quickly.

There's a Windows program, "Everything Search" (http://www.voidtools.com/), that does this faster than I assumed was possible by recursive descent (it reads the filenames of almost 2bln files on a 4TB HDD in less than 10 seconds).

I know I could build an index ahead of time, but can it be done just by reading the whole directory tree of a disk into RAM in one operation and parsing it there?

EDIT

Since my question proved to be confusing, here's what I want to do:

https://msdn.microsoft.com/en-us/library/07wt70x2(v=vs.110).aspx?cs-save-lang=1&cs-lang=cpp#code-snippet-1

// For Directory::GetFiles and Directory::GetDirectories 
// For File::Exists, Directory::Exists 
using namespace System;
using namespace System::IO;
using namespace System::Collections;

// Insert logic for processing found files here. 
void ProcessFile( String^ path )
{
   Console::WriteLine( "Processed file '{0}'.", path );
}


// Process all files in the directory passed in, recurse on any directories  
// that are found, and process the files they contain. 
void ProcessDirectory( String^ targetDirectory )
{

   // Process the list of files found in the directory. 
   array<String^>^fileEntries = Directory::GetFiles( targetDirectory );
   IEnumerator^ files = fileEntries->GetEnumerator();
   while ( files->MoveNext() )
   {
      String^ fileName = safe_cast<String^>(files->Current);
      ProcessFile( fileName );
   }


   // Recurse into subdirectories of this directory. 
   array<String^>^subdirectoryEntries = Directory::GetDirectories( targetDirectory );
   IEnumerator^ dirs = subdirectoryEntries->GetEnumerator();
   while ( dirs->MoveNext() )
   {
      String^ subdirectory = safe_cast<String^>(dirs->Current);
      ProcessDirectory( subdirectory );
   }
}

int main( int argc, char *argv[] )
{
   for ( int i = 1; i < argc; i++ )
   {
      String^ path = gcnew String(argv[ i ]);
      if ( File::Exists( path ) )
      {

         // This path is a file
         ProcessFile( path );
      }
      else 
      if ( Directory::Exists( path ) )
      {

         // This path is a directory
         ProcessDirectory( path );
      }
      else
      {
         Console::WriteLine( "{0} is not a valid file or directory.", path );
      }

   }
}

I want to get the same information, but without calling Directory::GetDirectories multiple times. The solution I'm looking for doesn't look anything like this code. This code just illustrates what information I want to read off the disk (the names of all files in all directories), not how I'd like to do it (I don't want recursion, or as many system calls as there are directories).

EDIT 2 (For the people that consider this question too broad):

I'm asking how to do that on either Windows or Linux. I'll accept an answer in any language, because I'm most interested in which system calls I need to make (and how to parse the results of those calls) to get the NTFS directory tree of a whole drive into RAM with fewer system calls than one per directory on the drive.

I'll also accept an answer that points me to a Windows or Linux library that does exactly that.

Kamil Szot
  • 4
    You should add some more language and OS tags. How about Cobol, BASIC, AmigaOS, iOS and DOS? – too honest for this site Sep 13 '15 at 15:19
  • Operating systems routinely cache directory/filename information. There has to be a reasonable limit to the cache size or retention time. NAME_CACHE is how network drivers handle this problem: https://msdn.microsoft.com/en-us/library/windows/hardware/ff550866%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396 – jim mcnamara Sep 13 '15 at 15:43
  • 1
    If you have enough RAM what's to stop you? I'm not sure I follow the question. – Byte Lab Sep 13 '15 at 15:45
  • @Decave My question is, can I read it with one system call, not one call per each directory on the drive? – Kamil Szot Sep 13 '15 at 16:15
  • How would you know when to reload the data you have in RAM without reading what's on disk? – Andrew Henle Sep 13 '15 at 16:16
  • 1
    @KamilSzot A system call can't magically change the fact that the file system directory tree has to be traversed in order to be brought into memory. – Ross Ridge Sep 13 '15 at 16:17
  • @Olaf I'm limited to 5 tags. On a more serious note, I don't care about other OSes (not even Mac OS). As for other languages, sure, if you know how to do it in Rust I'll gladly accept your answer. I added the c++ and c tags because I have a feeling that if it's possible, then it's something hacky or just low level, like reading some blocks directly from the disk and parsing them without bothering the OS and its filesystem too much. I feel that C and C++ people could have some knowledge about such stuff. – Kamil Szot Sep 13 '15 at 16:19
  • @AndrewHenle I don't want to reload data in RAM. I want to read information about the contents of directories (just file names) once, in bulk. My only condition is that I shouldn't have to ask the OS about each directory separately. – Kamil Szot Sep 13 '15 at 16:22
  • 1
    @KamilSzot: Reading the directory structure will require accessing a platform-specific API (if the OS lets you). Be aware that the directory structure may be a linked structure such as a B-tree, so reading the entire tree may not be possible with one disk access. – Thomas Matthews Sep 13 '15 at 16:25
  • @ThomasMatthews Thank you for the first relevant comment I got. I'm aware that this might be a problem. I'm hoping to find someone here who knows how to do it in fewer calls than there are directories. – Kamil Szot Sep 13 '15 at 16:32
  • @RossRidge I'm not asking for magic. I'm asking how Everything Search can give me immediately searchable list of 2bln files on my 4TB HDD in less than 10 seconds. – Kamil Szot Sep 13 '15 at 16:59
  • 1
    If that's your question then you should ask that question. Instead, your question assumes that it uses a magic system call, and asks what that system call is. – Ross Ridge Sep 13 '15 at 17:03
  • 1
    Everything directly reads an NTFS index structure, which is much faster than traversing the tree. That's also the reason why it requires admin privileges to index NTFS volumes. Unfortunately I don't know more details, like the name of that index or the API functions/IOCTL codes/... by which it's accessed. – Paul Groke Sep 13 '15 at 17:03
  • 1
    Similar programs that do the same thing (read and interpret the MFT) exist and are open source. NTFS Search, Swift Search, you name them. A 20 second search on Google would have found them. – Damon Sep 13 '15 at 17:17
  • @RossRidge Didn't go so well. Two downvotes already. http://stackoverflow.com/questions/32552353/how-exactly-everything-search-can-give-me-immediately-searchable-list-of-2bln – Kamil Szot Sep 13 '15 at 17:19
  • @Damon If you turn your comment into an answer, I'll accept it. Even better if it comes with links pointing to the source code. http://sourceforge.net/p/swiftsearch/code/ci/master/tree/ http://sourceforge.net/projects/ntfs-search/files/src/ – Kamil Szot Sep 13 '15 at 17:32
  • @Damon You could also mention the function https://msdn.microsoft.com/en-us/library/windows/desktop/aa363216(v=vs.85).aspx that's used for talking to the NTFS driver directly. If this question gets reopened.... – Kamil Szot Sep 13 '15 at 17:50
  • 1
    [This might help.](http://stackoverflow.com/a/7459109/886887) – Harry Johnston Sep 13 '15 at 21:50
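
The comments above point at reading the volume's flat file records (the MFT) instead of walking directories one by one. Once such records are in RAM, the remaining work is rebuilding paths from parent links. The following is only a minimal sketch of that in-memory step, not how Everything or the linked projects actually do it. `FileRecord` and `buildPaths` are hypothetical names, and the three-field record layout is a simplifying assumption. On Windows, records of roughly this shape (a file ID, a parent ID and a name) can, as far as I know, be obtained through `DeviceIoControl` with the `FSCTL_ENUM_USN_DATA` control code, but that part is deliberately left out here.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for one on-disk file record.
// The field layout is an assumption made for illustration only.
struct FileRecord {
    std::uint64_t id;        // unique ID of this file or directory
    std::uint64_t parentId;  // ID of the directory that contains it
    std::string name;        // bare name, without any path component
};

// Rebuilds the full path of every record from parent links, entirely in
// memory and without asking the OS about individual directories.
// Records that cannot be traced back to rootId are skipped.
std::vector<std::string> buildPaths(const std::vector<FileRecord>& records,
                                    std::uint64_t rootId,
                                    const std::string& rootPath)
{
    // Index the records by ID so each parent lookup is O(1) on average.
    std::unordered_map<std::uint64_t, const FileRecord*> byId;
    byId.reserve(records.size());
    for (const FileRecord& r : records) byId[r.id] = &r;

    std::vector<std::string> paths;
    paths.reserve(records.size());
    for (const FileRecord& r : records) {
        if (r.id == rootId) continue;  // the root itself has no name to add

        // Walk up the parent chain, collecting names from leaf to root.
        std::vector<const std::string*> parts;
        const FileRecord* cur = &r;
        bool reachedRoot = false;
        // The step limit guards against cycles in corrupt input.
        for (std::size_t steps = 0; steps <= records.size(); ++steps) {
            parts.push_back(&cur->name);
            if (cur->parentId == rootId) { reachedRoot = true; break; }
            auto it = byId.find(cur->parentId);
            if (it == byId.end()) break;  // orphan: parent record missing
            cur = it->second;
        }
        if (!reachedRoot) continue;

        // Join the collected names in root-to-leaf order.
        std::string path = rootPath;
        for (auto it = parts.rbegin(); it != parts.rend(); ++it) {
            path += '\\';
            path += **it;
        }
        paths.push_back(std::move(path));
    }
    return paths;
}
```

Calling `buildPaths(records, rootId, "C:")` returns one backslash-separated path per reachable record, in input order. Walking the parent chain separately for every record keeps the sketch short; caching already-resolved directory paths would avoid the repeated walks on a volume with millions of files.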

1 Answer

If you have a sufficient amount of RAM, then go ahead and do it. But I recommend loading the disk in parts instead of the whole disk at once, which will also improve your time efficiency.

incompetent
  • My question is how to do it. How do I read the whole directory tree of a 4TB drive that probably has millions of files in a few seconds (as "Everything Search" http://www.voidtools.com/ does)? – Kamil Szot Sep 13 '15 at 16:28
  • Yes, I read your comments above, which clarify your problem. My answer does not help in that regard: even if you divide the load, your question still needs an answer. – incompetent Sep 13 '15 at 16:33
  • In the code framework you gave, there is no way to get rid of that recursive call. – incompetent Sep 13 '15 at 16:55
  • Please ignore the framework. I gave the code as an example of what I want to achieve (get all filenames into RAM), not how to achieve it (multiple system calls, recursion). – Kamil Szot Sep 13 '15 at 17:02
  • I don't know the solution, but unless you get rid of the tree traversal it cannot be fast enough. – incompetent Sep 13 '15 at 17:11