5

I'm guessing that a typical filesystem tends to keep some kind of checksum/CRC/hash of every file it manages, so it can detect file corruption.

Is that guess correct? And if yes, is there a way to access it?

I'm primarily interested in Windows and NTFS, but comments on other platforms would be welcome as well... Language is unimportant at this point, but I'd like to avoid assembler if possible.

Nakilon
Branko Dimitrijevic
    No. CRC checking is the job of the disk drive. – Hans Passant Oct 18 '11 at 19:03
  • @HansPassant At the block level, sure. But what about file level? – Branko Dimitrijevic Oct 18 '11 at 19:06
  • depending on the OS and filesystem that can be true... for example for ZFS (available for Sun, Linux and OS X)... anyway IF that is calculated/stored by the filesystem it is usually not accessible via a documented API... to get at it you usually need to dig deep and use several undocumented things, which in some cases need specific permissions (Administrator, root or even a kernel module/driver)... that is usually much more trouble than just calculating your own checksum... what exactly is your goal? – Yahia Oct 18 '11 at 19:06
  • @Yahia Yup, that's what I was thinking, but I needed confirmation. The goal is performance: avoiding I/O on file content if the filesystem has already "accessed" that content and calculated a checksum. – Branko Dimitrijevic Oct 18 '11 at 19:14
  • @BrankoDimitrijevic, that performance hit is one good reason why file systems don't try to second-guess the hardware. – Mark Ransom Oct 18 '11 at 19:27
  • @MarkRansom I'm really not an expert on the subject, so forgive me if I'm completely on the wrong path here... I think there is a difference between block-level and (supposed) file-level checksums. All the blocks in the file may have correct checksums, yet the file as a whole may be corrupt if some block is misplaced (e.g. the data structure that holds the list of blocks did not update correctly due to a power failure). So while the filesystem may not necessarily scan the contents of blocks in software, I'm guessing it would still be useful to "aggregate" block-level checksums into a file-level checksum. – Branko Dimitrijevic Oct 18 '11 at 19:41
  • Think of the logistics - if you changed a single byte in the middle of a file, how would the file system recalculate the file checksum? At what point would the file system try to use the checksum to validate file integrity? – Mark Ransom Oct 18 '11 at 19:45
  • @MarkRansom It would subtract (from the file checksum) the old block checksum and add the new one. And during "check disk" it would use it to compare the filesystem data structures that "point" to the file with the file itself. I'm pulling this from my behind of course and might be on the wrong track completely... ;) (a sketch of this additive idea, and why it falls short, appears after these comments) – Branko Dimitrijevic Oct 18 '11 at 20:05
  • You're assuming a simple additive checksum. But that additive checksum would not detect blocks out of order, which contradicts your previous comment! At any rate, NTFS does not maintain per-file checksums. It uses journalling to ensure that it doesn't lose blocks. – Raymond Chen Oct 19 '11 at 13:12
  • Possible duplicate of [There is in Windows file systems a pre computed hash for each file?](http://stackoverflow.com/questions/1490384/there-is-in-windows-file-systems-a-pre-computed-hash-for-each-file) – user Nov 18 '15 at 01:57
  • See ZFS - it has checksums. – i486 Dec 11 '20 at 15:21
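To make the additive-checksum discussion above concrete, here is a minimal sketch of the idea and why it falls short, as Raymond Chen points out. The 4-byte block size and the use of CRC32 are arbitrary choices for the demo, not anything a real filesystem does:

```python
import zlib

BLOCK_SIZE = 4

def block_checksums(data: bytes) -> list[int]:
    """CRC32 of each fixed-size block of the data."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]

def additive_file_checksum(data: bytes) -> int:
    """'Aggregate' file checksum: just the sum of the block checksums."""
    return sum(block_checksums(data)) & 0xFFFFFFFF

original  = b"AAAABBBBCCCC"   # three blocks: AAAA, BBBB, CCCC
reordered = b"BBBBAAAACCCC"   # same blocks, first two swapped

# The additive checksum is identical even though the content differs...
print(additive_file_checksum(original) == additive_file_checksum(reordered))  # True
# ...while a checksum over the whole content catches the reordering.
print(zlib.crc32(original) == zlib.crc32(reordered))                          # False
```

Because addition is order-independent, any per-block aggregate built this way misses exactly the "misplaced block" corruption the question worries about.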

2 Answers

3

OK, it appears that what I'm asking for does not exist: NTFS does not maintain per-file checksums, so there is nothing to access.

BTW, this was also discussed here: In Windows file systems, is there a pre-computed hash for each file?
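Since the filesystem offers nothing, the fallback suggested in the comments is to compute your own file hash. A minimal sketch using only the standard library; the choice of SHA-256 and the 1 MiB chunk size are just reasonable defaults, not anything NTFS-specific:

```python
import hashlib

def file_digest(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Example (path is illustrative):
# print(file_digest(r"C:\Windows\notepad.exe"))
```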

Glenn Slayden
Branko Dimitrijevic
1

Most filesystems, and the storage hardware beneath them, keep checksums of allocation units (blocks or clusters), not of whole files.

The hardware-level checksums are generally not accessible at all, and the per-cluster checksums a filesystem keeps would not be very useful for most purposes, and would be difficult to get at even where they exist.

Thymine
  • It's unfortunate that there isn't a hash of some sort, as Microsoft could then optimize by not replacing identical files (same timestamp and hash) – a sketch of that idea follows below. – PRMan May 15 '18 at 23:49
  • There are filesystems that achieve that, although not working exactly how you're describing. ZFS is the one I know the most about, but in general this strategy is called `copy-on-write` – Thymine Jun 15 '18 at 18:35
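A rough sketch of the optimization described in the comment above, under the assumption that you compute and compare the hashes yourself (the metadata shortcut, SHA-256, and the function names are all illustrative; no OS facility is involved):

```python
import hashlib
import os
import shutil

def _sha256(path: str, chunk_size: int = 1 << 20) -> bytes:
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()

def copy_if_changed(src: str, dst: str) -> bool:
    """Copy src over dst only when dst is missing or its content differs."""
    if os.path.exists(dst):
        s, d = os.stat(src), os.stat(dst)
        same_meta = s.st_size == d.st_size and int(s.st_mtime) == int(d.st_mtime)
        if same_meta and _sha256(src) == _sha256(dst):
            return False                 # identical: skip the copy entirely
    shutil.copy2(src, dst)               # copy2 also preserves timestamps
    return True
```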