
I am stuck on one point: I do not want to iterate over all files every time to check whether I have already indexed them. I have the following candidate solutions, with the 1st one implemented:

  1. Iterate all files the first time and store the current date/time somewhere. On the next run, check only files with a creation date later than the stored date.
  2. After indexing a file, create a marker file at the same path with the extension '.done'. On the next run, index only those files whose '.done' file is missing (see the sketch below).
  3. Move indexed files to an Archived directory after indexing.

The 3rd solution is not an option, as I cannot change the directory structure; it is used by many other people.
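
A minimal sketch of option 2, assuming the indexer is allowed to create '.done' marker files next to the originals (the root path and the indexing step are placeholders):

using System.IO;
using System.Linq;

// Hypothetical root directory being indexed
var root = @"c:\files-to-index";

// Only files that do not yet have a matching '.done' marker
var pending = Directory.EnumerateFiles(root)
                       .Where(f => !f.EndsWith(".done") && !File.Exists(f + ".done"));

foreach (var file in pending)
{
    // index the file here, then mark it as done
    File.Create(file + ".done").Dispose();
}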

Is there any better solution for getting only those files that were not indexed/visited in the last iteration?

Behzad Qureshi
    https://msdn.microsoft.com/en-us/library/system.io.filesystemwatcher(v=vs.110).aspx – MichaelThePotato Aug 09 '16 at 12:02
  • This is already provided by the NTFS Journal feature. That's how e.g. antivirus or backup programs know that some files have changed. Unfortunately, .NET doesn't expose the API, but [this SO question](http://stackoverflow.com/questions/10544433/how-to-get-the-nextusn-journal-entry-for-a-vss-snapshot) shows a workaround to read and parse a Journal file. The newly released .NET 4.6.2 can read volume paths, so *maybe* you don't need AlphaFS – Panagiotis Kanavos Aug 09 '16 at 12:51
  • Is your problem solved with the FileSystemWatcher as suggested in the comment by MichaelThePotato? – Roland Aug 09 '16 at 12:59
  • @Roland an FSW won't detect changes made while it was down. It doesn't notify when a file is *closed* either, so that processing can begin – Panagiotis Kanavos Aug 09 '16 at 13:02
  • True, that is why I thought it might be worthwhile to suggest a full directory scan. But why doesn't OP clarify his current needs, if any? – Roland Aug 09 '16 at 13:05
  • @Roland the OP explained exactly what he wants to do. If you need to process a lot of files, you *don't* want to process unchanged files. You don't want to lose changes made while the application was down either. And NTFS already addresses this issue through the Change Journal. – Panagiotis Kanavos Aug 09 '16 at 13:07
  • I think @MichaelThePotato's solution will work. I will implement that and notify here once done (see the sketch below). Thanks everyone. – Behzad Qureshi Aug 09 '16 at 13:17
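
For reference, a minimal FileSystemWatcher sketch of the approach suggested in the comments (the watched path and the indexing step are placeholders); as noted above, it will miss changes made while the process is down:

using System;
using System.IO;

class Watcher
{
    static void Main()
    {
        // Hypothetical directory to watch for new files
        var watcher = new FileSystemWatcher(@"c:\files-to-index");
        watcher.NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite;

        // Raised for newly created files; Changed/Renamed have their own events
        watcher.Created += (s, e) => Console.WriteLine($"Index candidate: {e.FullPath}");
        watcher.EnableRaisingEvents = true;

        Console.WriteLine("Watching... press Enter to stop.");
        Console.ReadLine();
    }
}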

1 Answer


Process the newest files first, until you reach files that are already done, i.e. files whose creation time is earlier than the time of the previous directory scan:

using System;
using System.IO;
using System.Linq;

DateTime lastScanTime = DateTime.MinValue; // replace with the persisted time of the previous scan

DirectoryInfo dirInfo = new DirectoryInfo(@"c:\");
foreach (FileInfo fi in dirInfo.GetFiles("*.*") // search pattern as needed
                               .OrderByDescending(p => p.CreationTime)
                               .Where(p => p.CreationTime > lastScanTime)
                               .ToList())
{
    // index the file here
}
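
The answer leaves open how the time of the last scan survives between runs; a minimal sketch, assuming a small text file and illustrative helper names:

using System;
using System.Globalization;
using System.IO;

static DateTime LoadLastScanTime(string path) =>
    File.Exists(path)
        ? DateTime.Parse(File.ReadAllText(path), CultureInfo.InvariantCulture, DateTimeStyles.RoundtripKind)
        : DateTime.MinValue;

static void SaveLastScanTime(string path, DateTime time) =>
    File.WriteAllText(path, time.ToString("o")); // "o" = round-trip format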
Roland
  • As the OP said, he doesn't want to iterate all files. – Panagiotis Kanavos Aug 09 '16 at 12:52
  • I just try to suggest a solution to his problem. By the way, his option 2 also needs to read the names of ALL files. – Roland Aug 09 '16 at 12:58
  • Instead of `GetFiles` you could use `EnumerateFiles` to avoid reading everything before returning. Ordering isn't necessary. If the result needs to be ordered, the call should come *after* `Where`, otherwise it will read and cache all entries. – Panagiotis Kanavos Aug 09 '16 at 13:05
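
A variant along the lines of the last comment, reusing `dirInfo` and `lastScanTime` from the answer above: `EnumerateFiles` streams lazily, the filter runs first, and ordering applies only to the files that pass it. This is a sketch, not part of the original answer:

// Enumerate lazily, filter first, then order only the matching files
foreach (FileInfo fi in dirInfo.EnumerateFiles("*.*")
                               .Where(p => p.CreationTime > lastScanTime)
                               .OrderByDescending(p => p.CreationTime))
{
    // index the file here
}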