Since I prefer not to use software already on the market when teaching myself new techniques, I'm developing a tool that looks for duplicate files based on their hashes.

Reading the file entries from a path is not the problem, but hashing the files takes a considerable amount of time.

Does NTFS natively support a per-file checksum that I can use?

Because of my lack of knowledge of NTFS internals, I don't know which search terms to use. Searching for ntfs+checksum+file is largely useless.

codekandis
  • Please see ["Should questions include “tags” in their titles?"](http://meta.stackexchange.com/questions/19190/should-questions-include-tags-in-their-titles), where the consensus is "no, they should not"! –  Dec 21 '15 at 13:19
  • First Google hit for "ntfs file checksum" pointed me to [Getting a file checksum directly from the filesystem instead of calculating it explicitly](http://stackoverflow.com/questions/7812258/getting-a-file-checksum-directly-from-the-filesystem-instead-of-calculating-it-e), which in turn points to the duplicate [There is in Windows file systems a pre computed hash for each file?](http://stackoverflow.com/questions/1490384/there-is-in-windows-file-systems-a-pre-computed-hash-for-each-file). – CodeCaster Dec 21 '15 at 13:22
  • Never underestimate the Google search bubble. I don't get this hit at all. So thank you. – codekandis Dec 21 '15 at 15:53

1 Answer

No, NTFS does not store hashes. File writes would become very slow if any change to, e.g., a 10 MB file required recalculating the hash.
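To make the write-cost argument concrete, here is a minimal Python sketch. It assumes a deliberately weak additive checksum (sum of bytes modulo 2**32), chosen only for illustration; it is not something NTFS does, and the function names are hypothetical. A cryptographic hash has to be recomputed over the whole content after a small in-place edit, whereas the additive checksum can be patched from the changed bytes alone.

```python
import hashlib

MOD = 2**32  # assumption: 32-bit additive checksum, for illustration only


def additive_checksum(data) -> int:
    """Sum of all byte values modulo 2**32 (a deliberately weak checksum)."""
    return sum(data) % MOD


def patch_checksum(old_sum: int, old_bytes: bytes, new_bytes: bytes) -> int:
    """Update the checksum after old_bytes were overwritten by new_bytes."""
    return (old_sum - sum(old_bytes) + sum(new_bytes)) % MOD


# A 1 MB buffer stands in for the file content.
data = bytearray(b"x" * 1_000_000)
checksum = additive_checksum(data)

# Overwrite four bytes in the middle of the buffer.
offset, new = 500_000, b"ABCD"
old = bytes(data[offset:offset + len(new)])
data[offset:offset + len(new)] = new

# The checksum is patched using only the bytes that changed ...
checksum = patch_checksum(checksum, old, new)
assert checksum == additive_checksum(data)

# ... whereas the SHA-256 digest must be recomputed over the whole buffer.
digest = hashlib.sha256(data).hexdigest()
```

The trade-off is that such a checksum collides easily: any reordering of the same bytes yields the same value.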

i486
  • That would be true for a hash like md5 or sha1, but not for a checksum. You can edit a few bytes in the middle of a file and calculate a new checksum for it without having to read any of the other data in a file. – Kef Schecter Dec 11 '20 at 02:04
  • @KefSchecter Yes, but such a checksum would not be reliable enough for "a tool looking for duplicates of files". It would give too many false duplicates. – i486 Dec 11 '20 at 15:17
  • Oh, whoops. I missed that part. But even a plain checksum would be useful for finding file duplicates if it's only used as a first step; first compare checksums and only compare hashes if the checksums match. – Kef Schecter Dec 12 '20 at 03:32
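
The staged comparison suggested in the last comment could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function names are hypothetical, CRC-32 (`zlib.crc32`) stands in for the "plain checksum", SHA-256 for the "hash", and a file-size comparison is added as a free first filter. Because NTFS does not supply a checksum, the sketch has to compute it by reading each candidate file, so in practice the size comparison does most of the cheap filtering.

```python
import hashlib
import os
import zlib
from collections import defaultdict

CHUNK = 1 << 20  # read files in 1 MiB pieces to keep memory use flat


def crc32_of(path):
    """Cheap checksum of a file's content."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            crc = zlib.crc32(chunk, crc)
    return crc


def sha256_of(path):
    """Strong hash of a file's content."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            h.update(chunk)
    return h.hexdigest()


def group_by(paths, key):
    """Group paths by key(path) and keep only groups with 2+ members."""
    groups = defaultdict(list)
    for p in paths:
        groups[key(p)].append(p)
    return [g for g in groups.values() if len(g) > 1]


def find_duplicates(root):
    """Return lists of paths under root whose size, CRC-32 and SHA-256 match."""
    paths = [os.path.join(d, name)
             for d, _, names in os.walk(root) for name in names]
    duplicates = []
    for same_size in group_by(paths, os.path.getsize):
        for same_crc in group_by(same_size, crc32_of):
            # Only files that survived both cheap stages get the full hash.
            duplicates.extend(group_by(same_crc, sha256_of))
    return duplicates
```

Calling `find_duplicates` on a directory returns a list of groups, each group being a list of paths with identical size, checksum and hash. Error handling for unreadable files and symbolic links is left out of the sketch.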