13

I want to find duplicates of a file by comparing hashes. For performance reasons, I want to know whether NTFS or FAT stores a hash/checksum for each file. If so, I wouldn't have to compute them all myself to search for my file.

If there is, how can I access it using .NET?

If it helps, it will be JPEG files. Do they have a checksum?

Glenn Slayden
Jader Dias

3 Answers

10

There is no such thing.

nobody
  • 2
    Windows allows random writes to a file. Could you imagine the overhead if each write required recomputing the file's checksum? – Mark Ransom Sep 29 '09 at 03:14
  • I imagine that at least EXE files have a checksum, as other types may have. – Jader Dias Sep 29 '09 at 03:18
  • 4
    @MarkRansom You could imagine it to be computed only when it's requested, and cached somewhere, with the only thing happening each time the file is written being cache invalidation - far less costly than recomputing it each time – Evren Kuzucuoglu Jan 14 '13 at 11:23
  • 3
    Also, a hash for error detection doesn't need to be cryptographically secure: it's OK to use some kind of cyclical pattern like plain addition or XOR; something where a few changed blocks can be compensated for without recomputing the complete hash. – Eamon Nerbonne Nov 01 '13 at 23:44
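The compute-on-request idea from the comments can be approximated in application code, since the file system itself does not provide it. Below is a minimal sketch in Python, assuming that a change in file size or modification time is a good enough invalidation signal (it can miss edits that preserve both). The `HashCache` class and its `computations` counter are hypothetical names used only for illustration; in .NET the same steps would map to `FileInfo.Length`, `FileInfo.LastWriteTimeUtc`, and `SHA256`.

```python
import hashlib
import os


class HashCache:
    """Caches file digests and recomputes them only when a file looks changed."""

    def __init__(self):
        self._entries = {}      # path -> ((size, mtime_ns), hex digest)
        self.computations = 0   # how many times a file was actually hashed

    def get(self, path):
        info = os.stat(path)
        # Assumption: size plus modification time identifies a file version.
        stamp = (info.st_size, info.st_mtime_ns)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == stamp:
            return cached[1]    # cache hit, no file read needed

        # Cache miss or stale entry: hash the file in chunks.
        digest = hashlib.sha256()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        self.computations += 1
        self._entries[path] = (stamp, digest.hexdigest())
        return self._entries[path][1]
```

Repeated calls to `get` for an unchanged file read it only once; a write that changes the size or timestamp causes the next call to rehash.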
8

Windows does not store a hash for each file. As Jader Dias suggests, there are checksums for EXEs and DLLs, but these are not the droids you are looking for.

Note that even if you had such a hash, it still does not guarantee uniqueness. If you found two files with the same hash (and size) you would still have to then compare contents to determine if the files were truly the same.
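As a rough illustration of that point, here is a minimal sketch in Python of a duplicate search that filters by size first, then by hash, and only then confirms with a byte-by-byte comparison. The function names `file_sha256` and `find_duplicates` are made up for this example, and SHA-256 is an arbitrary choice of hash; the same steps could be written in .NET with `FileInfo.Length` and `SHA256`.

```python
import filecmp
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(target, candidates):
    """Return the candidates whose contents are identical to target."""
    target = Path(target)
    target_size = target.stat().st_size

    # Step 1: size is free metadata, so use it to discard most files.
    same_size = [
        Path(c) for c in candidates
        if Path(c) != target and Path(c).stat().st_size == target_size
    ]
    if not same_size:
        return []

    # Step 2: hash only the files that survived the size filter.
    target_hash = file_sha256(target)
    same_hash = [c for c in same_size if file_sha256(c) == target_hash]

    # Step 3: equal hashes do not prove equality, so compare the bytes.
    return [c for c in same_hash if filecmp.cmp(target, c, shallow=False)]
```

The size check costs no file reads, so in practice it removes most candidates before any hashing happens.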

JPEG files may have some checksums or hashes, but you probably cannot count on them either.

Foredecker
  • 1
    +1 for "Note that even if you had such a hash, it still does not guarantee uniqueness." ... although it's true that very small changes *almost always* result in a unique hash, users have a way of producing those magical edge-case conditions. – overslacked Sep 29 '09 at 06:48
2

Windows does have a search service now, though, and if I recall correctly you can write your own plugins for it (in other words, index files in a custom way). Presumably, you could write a plugin for JPGs and then simply make search API calls to find files (after Windows does the indexing).

Vitali
  • 1
    I think Windows indexes text (as filenames), not images. – Jader Dias Oct 02 '09 at 02:16
  • 2
    From MSDN: The content indexed is based on the file and data types supported through add-ins... filters included in Windows Search support over 200 common types of data including ... plain-text files, HTML, and many more. Sure, while it only natively supports certain files, as it says, you can index anything with a custom plugin. Certainly search can index MP3s - JPGs would be no different. – Vitali Oct 05 '09 at 00:39