
I am in the middle of writing a tool that finds the lost files of an iTunes library, for both Mac and Windows. On the Mac, I can quickly find files by name using the wonderful "CatalogSearch" function.

On Windows, however, there seems to be no OS API for searching by file name (or is there?).

After some googling, I learned that there are tools (like TFind, Everything) that read the NTFS Master File Table (MFT) directly and scan it to find files by name.

I would like to do the same, but without having to start from scratch (although I've written quite a few disk tools in the past, I've never had the energy to dig into NTFS).

I wonder if there are ready-made libraries around, possibly as a .dll, that would give me this search feature: pass in a file name, get back its path.

Alternatively, what about the Windows indexing service? At least when I tried this on a recently installed XP Home system, the Search operation under the Start menu would actually scan all directories, which suggests that it has no complete database. As I'm not a Windows user at all, I wonder why this isn't working.

In the end, the complete solution I need is this: I have a list of file names to find, and I need code that searches the entire disk (or uses a database for it) to get me all results in one go. In particular, the search should not start a new full scan for every file I'm looking up. That's why I think the MFT approach would be optimal, as it could quickly iterate over all names, comparing each to my list.
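The "one scan, compare every name against the list" idea does not depend on how the names are obtained. Below is a minimal portable sketch in Python that uses `os.walk` as a stand-in for the directory scan (i.e. the slow FindFirstFile-style fallback, not an MFT reader); `find_files` is a hypothetical name. The wanted names go into a dictionary once, so each directory entry costs a single hash lookup, and names are compared case-insensitively because Windows file names are case-insensitive.

```python
import os
from typing import Dict, Iterable, List


def find_files(root: str, wanted_names: Iterable[str]) -> Dict[str, List[str]]:
    """Scan the tree under `root` once and collect the paths of all wanted names.

    Returns a dict mapping each wanted name (as given) to the list of full
    paths where it was found; names that were not found map to an empty list.
    """
    wanted = list(wanted_names)
    # Build the lookup table once: casefolded name -> name as given by the caller.
    by_key = {name.casefold(): name for name in wanted}
    results: Dict[str, List[str]] = {name: [] for name in wanted}

    # A single traversal of the whole tree; os.walk skips unreadable
    # directories by default, so one bad folder does not abort the scan.
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            original = by_key.get(filename.casefold())
            if original is not None:
                results[original].append(os.path.join(dirpath, filename))
    return results
```

For example, `find_files("C:\\", ["Song A.mp3", "Song B.m4a"])` (hypothetical names) walks the drive once, regardless of how many names are in the list.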

Roman R.
Thomas Tempelmann
  • Windows Search is quick only if you're searching indexed locations. – MSalters Nov 22 '10 at 09:48
  • I guess you mean this: http://msdn.microsoft.com/en-us/library/bb266517(v=VS.85).aspx?ppud=4 -- looks complicated. I'll give it a closer look, thanks. – Thomas Tempelmann Nov 22 '10 at 12:18
  • Do not do this, please please please. Listen to the guy who tells you to use the USN Journal – Ana Betts Nov 22 '10 at 18:10
  • Alright. You persuaded me. Now, you'd convince me even more if you told me why Windows Search is not such a good idea. Maybe because it won't find everything? (Mind you, I'm the author of "Find Any File" for OS X, in case you ever need to find _everything_ on a Mac :) – Thomas Tempelmann Nov 22 '10 at 23:03

1 Answer


The best way to solve your problem seems to be by using the Windows Change Journal.

Problem: If the Change Journal is not enabled for a volume, or the volume is not NTFS, you need a fallback (on NTFS, you can also enable the Change Journal yourself). You also need administrator rights to access the Change Journal.

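You get the files by calling DeviceIoControl with the FSCTL_ENUM_USN_DATA control code and LowUsn=0. This reads the MFT directly and fills the supplied buffer with one record per file, each containing the file name; you repeat the call until all records have been returned. Because it reads the MFT sequentially, it is faster than walking the directories with the FindFirstFile API.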
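Decoding what FSCTL_ENUM_USN_DATA writes into the buffer needs no Windows-specific code, so here is a minimal sketch in Python. It assumes that the buffer starts with the 8-byte file reference number to pass as StartFileReferenceNumber on the next call, followed by version 2 records (USN_RECORD_V2) whose 60-byte header is laid out as in the comment below. `parse_enum_usn_buffer` and `UsnEntry` are hypothetical names, and the DeviceIoControl call that fills the buffer is not shown.

```python
import struct
from typing import List, NamedTuple, Tuple


class UsnEntry(NamedTuple):
    frn: int          # FileReferenceNumber of this file or directory
    parent_frn: int   # ParentFileReferenceNumber (the containing directory)
    attributes: int   # FileAttributes (e.g. 0x10 = FILE_ATTRIBUTE_DIRECTORY)
    name: str         # the entry's own name only, not its full path


# Assumed USN_RECORD_V2 header (little-endian, 60 bytes): RecordLength,
# MajorVersion, MinorVersion, FileReferenceNumber, ParentFileReferenceNumber,
# Usn, TimeStamp, Reason, SourceInfo, SecurityId, FileAttributes,
# FileNameLength (bytes), FileNameOffset (from the start of the record)
_HEADER = struct.Struct("<IHHQQqqIIIIHH")


def parse_enum_usn_buffer(buf: bytes) -> Tuple[int, List[UsnEntry]]:
    """Decode one FSCTL_ENUM_USN_DATA output buffer (hypothetical helper).

    Returns the file reference number to resume from on the next call,
    plus the records contained in this buffer.
    """
    (next_start,) = struct.unpack_from("<Q", buf, 0)
    entries: List[UsnEntry] = []
    offset = 8
    while offset + _HEADER.size <= len(buf):
        (rec_len, major, _minor, frn, parent, _usn, _ts, _reason, _src,
         _sec, attrs, name_len, name_off) = _HEADER.unpack_from(buf, offset)
        if rec_len == 0:
            break  # defensive: a zero length would otherwise loop forever
        if major == 2:  # this sketch only decodes V2 records
            start = offset + name_off
            name = buf[start:start + name_len].decode("utf-16-le")
            entries.append(UsnEntry(frn, parent, attrs, name))
        offset += rec_len  # RecordLength already includes alignment padding
    return next_start, entries
```

In a real program the loop would feed `next_start` back into the next DeviceIoControl call until it reports that no more records are available (ERROR_HANDLE_EOF).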

UrOni
  • Yes, I am aware of this option (it's also available on OS X by default since 10.5). But that's too complicated to handle, I fear. – Thomas Tempelmann Nov 22 '10 at 12:12
  • And the change journal only gives me the recent changes, right? So if I do not keep a process running recording every change, I will still have to do a full scan first. Correct? Then I'm back to my original question: How do I do a fast full scan? – Thomas Tempelmann Nov 22 '10 at 12:15
  • If you set StartUSN to zero as described, this gives you all files on the volume quickly (and it really is fast). If you want changes, you have to set StartUSN to a higher number; then you get the files changed since that USN. – UrOni Nov 22 '10 at 17:56
  • Sorry. It is FSCTL_ENUM_USN_DATA and not FSCTL_QUERY_USN_JOURNAL - my bad. – UrOni Nov 22 '10 at 18:06
  • Ah, then the "journal" actually does more than just journalling it seems (contrary to OS X's function which only tells you of changes while listening). Thanks for clarifying. I'll look into this then. I'm all about using the options available, while resorting to slow processes otherwise. – Thomas Tempelmann Nov 22 '10 at 23:01
  • I don't think you need the Change Journal enabled to use `FSCTL_ENUM_USN_DATA`. There's a separate ioctl for change tracking, `FSCTL_READ_USN_JOURNAL`, which is probably more similar to the OSX journal you've used before, although the NTFS one is more like a closed-caption security tape: your process doesn't have to be running when the change occurs as long as you query the journal before it wraps around and gets overwritten. – Ben Voigt Dec 26 '10 at 05:02
  • Ben: That's a "closed-circuit" security tape. Closed captioning is optional subtitles on TV shows. Anyway, I believe that `FSCTL_ENUM_USN_DATA` walks the MFT, returning USN records of matching files, while `FSCTL_READ_USN_JOURNAL` walks the USN journal, returning matching files (and possibly waiting until new records show up). – Gabe Dec 26 '10 at 05:50
  • @Gabe: Yes, I meant "closed-circuit". That sometimes happens when I post comments too late at night. – Ben Voigt Dec 30 '10 at 06:33
  • +1 for this great information. And here is the link to the MSDN documentation on FSCTL_ENUM_USN_DATA: http://msdn.microsoft.com/en-us/library/aa364563%28VS.85%29.aspx – Helge Klein Jan 09 '11 at 20:29
  • Have a look at these links: http://www.microsoft.com/msj/0999/journal/journal.aspx and http://technet.microsoft.com/en-us/library/bb742450.aspx. This two-part series, "Keeping an Eye on Your NTFS Drives: the Windows 2000 Change Journal Explained", helped me keep my sanity when implementing change journal functionality. – Hannes de Jager Feb 21 '11 at 15:40
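Each record returned by FSCTL_ENUM_USN_DATA carries only the entry's own name and its parent directory's file reference number, so answering the original "pass in a file name, get back its path" request means chaining records up to the volume root. The sketch below assumes the records have already been collected into `(frn, parent_frn, name)` tuples (e.g. from decoded USN_RECORD entries) and that `root_frn` is the reference number of the root directory (obtaining it, for instance from GetFileInformationByHandle on the root, is not shown). `build_index`, `full_path` and `find_paths` are hypothetical names, and the drive prefix is an assumed parameter.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Record = Tuple[int, int, str]  # (frn, parent_frn, name)


def build_index(records: Iterable[Record]) -> Dict[int, Tuple[int, str]]:
    """Map each file reference number to (parent reference number, name)."""
    return {frn: (parent, name) for frn, parent, name in records}


def full_path(frn: int, index: Dict[int, Tuple[int, str]], root_frn: int,
              drive: str = "C:") -> Optional[str]:
    """Rebuild the path of `frn` by following parent links up to the root.

    Returns None if a parent is missing from the index or a cycle is found.
    """
    parts: List[str] = []
    seen = set()
    current = frn
    while current != root_frn:
        if current in seen or current not in index:
            return None
        seen.add(current)
        parent, name = index[current]
        parts.append(name)
        current = parent
    return drive + "\\" + "\\".join(reversed(parts))


def find_paths(records: List[Record], wanted: Iterable[str], root_frn: int,
               drive: str = "C:") -> Dict[str, List[str]]:
    """One pass over all records: match names case-insensitively, then resolve paths."""
    wanted = list(wanted)
    index = build_index(records)
    keys = {w.casefold(): w for w in wanted}
    result: Dict[str, List[str]] = {w: [] for w in wanted}
    for frn, _parent, name in records:
        w = keys.get(name.casefold())
        if w is not None:
            path = full_path(frn, index, root_frn, drive)
            if path is not None:
                result[w].append(path)
    return result
```

Only the matching records have their paths resolved, so the cost of walking parent links is paid per hit rather than per file on the volume.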