Can the ext4 filesystem detect data corruption of file contents? If yes, is it enabled by default and how can I check for corrupted data?

I have read that ext4 maintains checksums for file metadata and its journal, but I was unable to find any information on checksums for the actual file contents.

For clarity: I want to know if a file has changed since the last write operation.

Lukas Boersma

2 Answers

No, ext4 doesn't and can't detect file content corruption.

Well-known file systems that implement silent data corruption detection, and can therefore correct it when enough redundancy is available, are ZFS and btrfs.

They do it by computing and storing a CRC for every data block written and by checking the CRC of each data block read. Should the CRC not match the data, the data is not handed to the caller: either RAID provides an alternate copy of the block to use instead, or an I/O error is reported.

The reading process will never receive corrupted data: either the data is correct or the read fails.
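The mechanism described above can be sketched as a toy in-memory block store. This is only an illustration, not how ZFS or btrfs are implemented: the class `ChecksummedStore` and its methods are hypothetical names, `zlib.crc32` stands in for whatever checksum a real file system uses, and there is no redundant copy to fall back on, so a mismatch always ends in an I/O error.

```python
import zlib


class ChecksummedStore:
    """Toy block store that keeps a CRC32 next to every data block.

    Hypothetical illustration only; real file systems store checksums
    on disk and may repair a bad block from a redundant copy.
    """

    def __init__(self):
        self.blocks = {}     # block number -> bytearray holding the data
        self.checksums = {}  # block number -> CRC32 recorded at write time

    def write_block(self, n, data):
        # Store the data and the checksum computed from it.
        self.blocks[n] = bytearray(data)
        self.checksums[n] = zlib.crc32(data)

    def read_block(self, n):
        data = bytes(self.blocks[n])
        # Verify on every read; never hand back data that fails the check.
        if zlib.crc32(data) != self.checksums[n]:
            raise IOError(f"checksum mismatch on block {n}")
        return data


store = ChecksummedStore()
store.write_block(0, b"hello world")
assert store.read_block(0) == b"hello world"

# Simulate a silent bit flip on the medium, bypassing write_block().
store.blocks[0][0] ^= 0x01

try:
    store.read_block(0)
except IOError as err:
    print(err)  # prints "checksum mismatch on block 0"
```

The point of the sketch is the read path: the caller either gets data that matches the checksum recorded at write time or gets an error, never silently altered bytes.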

jlliagre
  • Raid-2 did this, but is now obsolete, ZFS is implemented in Sunmicro Solaris and BTRFS is a choice in Linux (see https://en.wikipedia.org/wiki/Btrfs) – jobeard Jul 16 '15 at 15:14
  • @jobeard Solaris ZFS is developed by Oracle now. There is also a ZFS fork from which are built ZFS implementations for OpenSolaris (illumos) derivatives, FreeBSD, Linux, and OS X. – jlliagre Jul 17 '15 at 10:13
  • Metadata CRC checking is the only thing implemented now. – Konrad Gajewski Dec 07 '17 at 00:52
  • @KonradGajewski Yes, and that doesn't prevent file content corruption. By writing "now", do you mean ext4 data CRC checking is planned? – jlliagre Dec 07 '17 at 01:26

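"Can the ext4 filesystem detect data corruption of file contents?" Not in the sense you are expecting. It performs journaling, keeping a {before vs after} record of pending changes so that an interrupted I/O operation is either completed or discarded. Journaling does not verify file contents afterwards.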

A CRC or checksum is a test for modification from a known state. A checksum that no longer matches the original does not by itself imply that the file is "corrupt" (that is, invalid); it only says that the content has changed. Strictly speaking, one form of "corruption" would be to alter the 'magic number' at the beginning of a file, for example changing %PDF to %xYz, which would make the content unusable to any program.

"... to know if a file has changed since the last write operation". Systems that track mtime do so uniformly: every write through the filesystem modifies the file's mtime, which makes your request impossible.

The only way mtime would not reflect the last write I/O would be media degradation.
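To illustrate the point that a checksum tests for modification from a known state, here is a minimal user-space sketch, assuming the digest is recorded right after the last write and kept somewhere trusted. The helper names `file_digest` and `changed_since` are hypothetical, and SHA-256 from Python's `hashlib` is just one possible choice. A mismatch only says the content changed; it cannot tell corruption from a legitimate edit, and it cannot repair anything.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_since(path, recorded_digest):
    """True if the file content no longer matches the recorded digest."""
    return file_digest(path) != recorded_digest


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.dat")
    with open(path, "wb") as f:
        f.write(b"important data")

    # Known state: digest and timestamps taken right after the write.
    recorded = file_digest(path)
    before = os.stat(path)

    # Simulate an alteration that leaves mtime looking untouched:
    # overwrite the first byte, then restore the original timestamps.
    with open(path, "r+b") as f:
        f.write(b"X")
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))

    print("mtime unchanged:", os.stat(path).st_mtime_ns == before.st_mtime_ns)
    print("content changed:", changed_since(path, recorded))
```

The demo restores the timestamps by hand to stand in for a change that mtime does not reveal; the digest comparison still reports the difference.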

jobeard
  • Thanks. To make sure I understood you: Let's say I write a file, then shut down my computer, and for some reason the file contents change (because of media degradation, environmental effects, or something like that). After booting again, there is no way to detect the change? – Lukas Boersma Jul 10 '15 at 17:31
  • @jobeard ext4 neither computes nor stores CRCs on data blocks. The OP is asking about silent data corruption; you seem to be describing voluntary data corruption (e.g. %xYz). This is a different story. Media degradation is a common cause of silent data corruption but not the only one that can occur. The data portion of a file can be accidentally or intentionally overwritten either without the file's mtime being affected (raw access) or with the mtime being reset afterwards. – jlliagre Jul 11 '15 at 22:56
  • I did not say ext4 did CRC, but made the point that a CRC is a means to detect alteration, and that alteration is not the same as 'corruption'. I also made the point that 'corruption' comes in different flavors and that altering the magic number would be an example of corruption making the file 'invalid'. As to 'intentionally overwritten', that's entirely outside the question IMO, although true. – jobeard Jul 13 '15 at 14:49
  • Question is about file contents, not metadata, and corruption by the data being altered by something other than the OS or anything using syscalls on the mounted FS. – binki Nov 26 '16 at 03:44