How to determine whether files have been changed in a directory tree without traversing the entire tree?

Question

Imagine a directory tree (on Linux):

user@computer:~/demo> find .
.
./test1
./test1/test1_a
./test1/test1_a/somefile_1a
./test1/test1_b
./test1/test1_b/somefile_1b
./test0
./test0/test0_a
./test0/test0_a/somefile_0a
./test0/test0_b
./test0/test0_b/somefile_0b

Scenario: I collect all available metadata for every directory and file in that tree (mtime, ctime, inode, size, checksums of file contents, ...), including the top-level directory, demo, and store it. Later, one or more files or directories are modified, created, or deleted. Using the stored information, I now want to figure out what has changed.

My solution so far: I traverse the entire tree, look for changed metadata, and process it. Above a certain tree size, traversing the tree and looking at every directory and file becomes quite time-consuming, even when looking at metadata only (i.e. ctime, mtime etc., NOT file content checksums). Such a traversal can be optimized only to a certain degree (e.g. by reading the metadata of each file and folder once per traversal instead of multiple times); at the end of the day, I/O speed becomes the bottleneck.
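For reference, the full-traversal approach described above might look roughly like the following. This is only a minimal sketch, not the asker's actual code: the function names `snapshot` and `compare` are hypothetical, and it records metadata only (no content checksums). It uses `os.scandir` so each entry is examined once per traversal.

```python
import os


def snapshot(root):
    """Map every path under root (and root itself) to a metadata tuple.

    Hypothetical helper: stores (inode, size, mtime_ns, ctime_ns) only.
    """
    st = os.lstat(root)
    meta = {root: (st.st_ino, st.st_size, st.st_mtime_ns, st.st_ctime_ns)}
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # Do not follow symlinks, so links are recorded as links.
                st = entry.stat(follow_symlinks=False)
                meta[entry.path] = (
                    st.st_ino, st.st_size, st.st_mtime_ns, st.st_ctime_ns
                )
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
    return meta


def compare(old, new):
    """Return sorted lists of added, removed, and changed paths."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, changed
```

Calling `snapshot("demo")` before and after a change and passing both results to `compare` yields the three lists. The cost is still one `stat` per entry, which is exactly the I/O bottleneck described above.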

Question: What options do I have (on Unix / Linux file systems) to look for changes in my tree without traversing all of it? I.e. is there any information stored for demo which tells me / indicates in some way that something below it (e.g. somefile_1b) has been changed? Are there any specific filesystems (EXT*, XFS, ZFS, ...) offering features of this kind?
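As background to the question about information stored for demo itself: on typical POSIX filesystems, a directory's mtime is updated only when entries are added to or removed from that directory directly, so changes deeper in the tree do not propagate upward. The sketch below is a small experiment illustrating this, not a definitive statement about every filesystem. The function name is hypothetical, and the directory timestamps are backdated with `os.utime` so the comparison does not depend on timestamp granularity.

```python
import os


def nested_change_effects(root):
    """Return (root_mtime_unchanged, parent_mtime_changed) after a nested change.

    Hypothetical experiment: builds root/test1/test1_b/somefile_1b, then
    modifies that file and creates a sibling next to it.
    """
    nested_dir = os.path.join(root, "test1", "test1_b")
    os.makedirs(nested_dir)
    nested_file = os.path.join(nested_dir, "somefile_1b")
    with open(nested_file, "w") as fh:
        fh.write("old")

    # Backdate both directories so a later update is unmistakable.
    old = 10**18  # nanoseconds since the epoch (September 2001)
    os.utime(root, ns=(old, old))
    os.utime(nested_dir, ns=(old, old))
    root_before = os.stat(root).st_mtime_ns
    nested_before = os.stat(nested_dir).st_mtime_ns

    # Change things two levels below root.
    with open(nested_file, "w") as fh:
        fh.write("new content")
    open(os.path.join(nested_dir, "another_file"), "w").close()

    root_unchanged = os.stat(root).st_mtime_ns == root_before
    parent_changed = os.stat(nested_dir).st_mtime_ns != nested_before
    return root_unchanged, parent_changed
```

Run against an empty temporary directory, only the immediate parent of the new file should report a changed mtime, which is why plain timestamps on demo cannot replace a traversal.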

Note: I am aware of the option of running a background process for monitoring changes to the filesystem. It would eliminate the need for a full traversal of my tree, though I am more interested in options which do NOT require a background monitoring process (if an option of this kind exists at all).

Possible duplicate of [how to monitor a complete directory tree for changes in Linux?](https://stackoverflow.com/questions/8699293/how-to-monitor-a-complete-directory-tree-for-changes-in-linux) — Vasan, Nov 19 '17 at 18:43
@Vasan in part, yes, though there might be file systems actually offering this as a feature ... (thanks to some behavior, through the backdoor, maybe). — s-m-e, Nov 19 '17 at 20:23
Note that using any sort of monitoring process will not detect changes made while your monitoring process is not running. So it's inherently unreliable. — Andrew Henle, Nov 19 '17 at 20:30

score 1 · Accepted Answer · answered Nov 19 '17 at 20:28

ZFS provides the capability via zfs diff ... Per the Oracle Solaris 11.2 documentation:

Identifying ZFS Snapshot Differences (zfs diff)

You can determine ZFS snapshot differences by using the zfs diff command.

For example, assume that the following two snapshots are created:
$ ls /tank/home/tim
fileA
$ zfs snapshot tank/home/tim@snap1
$ ls /tank/home/tim
fileA  fileB
$ zfs snapshot tank/home/tim@snap2
For example, to identify the differences between two snapshots, use syntax similar to the following:
$ zfs diff tank/home/tim@snap1 tank/home/tim@snap2
M       /tank/home/tim/
+       /tank/home/tim/fileB
In the output, the M indicates that the directory has been modified. The + indicates that fileB exists in the later snapshot.

The R in the following output indicates that a file in a snapshot has been renamed.
$ mv /tank/cindy/fileB /tank/cindy/fileC
$ zfs snapshot tank/cindy@snap2
$ zfs diff tank/cindy@snap1 tank/cindy@snap2
M       /tank/cindy/
R       /tank/cindy/fileB -> /tank/cindy/fileC

This only compares two snapshots against each other, so you need to be able to create ZFS snapshots to use it effectively.
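To feed such output into a script, it could be parsed along these lines. This is a minimal sketch that assumes the output has exactly the shape shown in the documentation excerpt above (a change code, whitespace, a path, and ` -> ` between old and new path for renames). The function name `parse_zfs_diff` is hypothetical, and paths that themselves contain ` -> ` are not handled.

```python
def parse_zfs_diff(text):
    """Parse `zfs diff`-style output into (code, path, new_path) tuples.

    new_path is None unless the line describes a rename (code "R").
    """
    changes = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # First token is the change code; the remainder is the path part.
        code, rest = line.split(None, 1)
        rest = rest.rstrip()
        if code == "R" and " -> " in rest:
            old_path, new_path = rest.split(" -> ", 1)
            changes.append((code, old_path, new_path))
        else:
            changes.append((code, rest, None))
    return changes
```

The text passed in would typically be the captured standard output of `zfs diff` for two snapshots, for example obtained through `subprocess.run` with `capture_output=True` and `text=True`.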

That's interesting, thanks a lot. I did not know that one could actually diff ZFS snapshots. It suggests that btrfs might actually have a similar diff feature, though at a first glance, I can not find it. (In case of ZFS, I'll benchmark it ... interested to see how fast / slow it is.) — s-m-e, Nov 19 '17 at 20:35
@s-m-e The `diff` capability of ZFS isn't listed [on the ZFS Wiki page](https://en.wikipedia.org/wiki/ZFS#Detailed_release_history), so I'm not sure when it was introduced, nor if it's available in OpenZFS. The OpenZFS wiki itself seems pretty out-of-date. It may not be listed because it's always been available. — Andrew Henle, Nov 19 '17 at 20:55
