2

What is the best way to create an "atomic" snapshot of file contents in Linux? Emphasis is not on performance, but on getting contents as a whole.

I could use sendfile(2) (since 2.6.33) or splice(2), but neither gives any indication of atomicity. Both run entirely in kernel space, but sendfile(2) at least implies that it uses mmap(2), and mmap gives no guarantee that writes made by other processes to the same region mapped as MAP_SHARED will stay invisible, even under MAP_PRIVATE (they probably will be visible, because those are the same pages).

Given that these functions are written with performance in mind, and that sendfile(2) is optimized for use with DMA, I can only assume that they copy memory in some background kernel thread, so other operations may well affect the data being copied.

So the only possible solution I see is to place a read lease with fcntl(2) (F_SETLEASE) and copy the file as normal, but if someone opens it for writing, either try to "rush" it (very reliable, I know) and beat the timer, or just give up and try later. Is that correct?

Andrian Nord
  • You may want to consider [btrfs](https://btrfs.wiki.kernel.org/index.php/Main_Page) which is capable of doing **subvolume** snapshots through a copy-on-write mechanism. Unfortunately, I don't think they support single-file snapshots. – Ze Blob Sep 11 '15 at 23:53
  • It's not exactly a programming solution – Andrian Nord Sep 11 '15 at 23:55
  • Am I correct in understanding you are trying to get this "atomic" snapshot without any synchronization between other processes that may also be dealing with the file, or do you effectively have control over all file access between processes? – dho Sep 12 '15 at 00:32
  • No, I don't have control over all the involved processes. Processes should be considered non-cooperative. – Andrian Nord Sep 12 '15 at 00:34
  • Can you clarify what a "programming solution" would be? No calls to `system()`? Can the writer processes be affected in any way, e.g. by the copy process running in real time or using mandatory file locking? Can you use a filesystem type that supports snapshots, such as btrfs, zfs, or lvm, or must the solution work on any filesystem type? – Mark Plotnick Sep 12 '15 at 01:03
  • Depending on how large the file is, why not simply lock the file and copy the current contents to memory in a single operation? Take a look at [**How to lock files using fopen?**](http://stackoverflow.com/questions/7573282/how-to-lock-files-using-fopen) – David C. Rankin Sep 12 '15 at 01:03
  • @MarkPlotnick: a programming solution is one that does not require a special filesystem configuration, and especially one that does not involve creating a whole filesystem snapshot just for one file. That's kinda obvious. – Andrian Nord Sep 12 '15 at 01:07
  • @DavidC.Rankin 1. Linux has (almost) no mandatory locking. In other words, if you are not calling flock you can safely ignore its existence. That is what "non-cooperative" means: a process that does not know it should make a flock call. Almost no Linux applications take locks before doing something. 2. Even if there were mandatory locking I would not recommend using it, as it may break other applications, throw cryptic messages in the user's face, and so on. That's the Windows way. – Andrian Nord Sep 12 '15 at 01:12
  • Sorry. When you said the emphasis was not on performance, I thought the common practice used by backup software - create a temporary snapshot, copy the chosen files to another medium, destroy the snapshot - might be an option. If you can use btrfs, there's an even more efficient way - just do `ioctl (dest_fd, BTRFS_IOC_CLONE, src_fd);` – Mark Plotnick Sep 12 '15 at 01:38
  • @MarkPlotnick that's what I call "non-programming" solution. You think about a tool. I think about a function. Performance is not an issue, but to a reasonable extent. Also, user resources are not unlimited either. Plus, btrfs is still experimental and not very common. Plus, if there is no btrfs you are screwed, because almost no other ordinary fs supports snapshots. Plus interface differs even between FS that do support this. In other words - there are too many things outside of c-library and kernel APIs to care about and only feasible if you are doing a very specific task. – Andrian Nord Sep 12 '15 at 01:51

2 Answers

1

So the only possible [filesystem-independent] solution I see is to place a read lease with fcntl(2) (F_SETLEASE) and copy the file as normal, but if someone opens it for writing, either try to "rush" it (very reliable, I know) and beat the timer, or just give up and try later. Is that correct?

Almost; there is also fanotify. Plus, as mentioned in a comment, there are some filesystem-specific options, and some possibilities only available in certain configurations.

The lease break timer is configurable through /proc/sys/fs/lease-break-time (in seconds); the default is 45 seconds.
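
The lease approach could be sketched roughly as follows. This is a minimal illustration, not a tested implementation: `snapshot_with_lease` is a hypothetical helper name, the caller is assumed to own the file (or hold CAP_LEASE), and the filesystem is assumed to support leases. Because a lease break is announced with SIGIO, whose default action terminates the process, the sketch ignores the signal and checks the lease state with F_GETLEASE once the copy is done.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper (not from the answer): copy src_path to dst_path
 * while holding a read lease on the source.
 * Returns 0 if the lease was still intact after the copy, 1 if the file
 * was busy or the lease was broken (retry later), -1 on any other error. */
int snapshot_with_lease(const char *src_path, const char *dst_path)
{
    struct sigaction ign, old;
    char buf[65536];
    ssize_t n = 0;
    int dst = -1;
    int result = -1;

    int src = open(src_path, O_RDONLY);   /* read leases require O_RDONLY */
    if (src < 0)
        return -1;

    /* Ignore SIGIO for the duration; we poll the lease state instead. */
    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    ign.sa_flags = 0;
    sigaction(SIGIO, &ign, &old);

    if (fcntl(src, F_SETLEASE, F_RDLCK) < 0) {
        /* EAGAIN: someone already has the file open for writing. */
        result = (errno == EAGAIN) ? 1 : -1;
        goto out;
    }

    dst = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (dst >= 0) {
        for (;;) {
            n = read(src, buf, sizeof buf);
            if (n <= 0)
                break;
            for (ssize_t off = 0; off < n; ) {
                ssize_t w = write(dst, buf + off, (size_t)(n - off));
                if (w < 0) { n = -1; break; }
                off += w;
            }
            if (n < 0)
                break;
        }
        /* Lease still F_RDLCK: no conflicting open happened during the copy. */
        if (n == 0)
            result = (fcntl(src, F_GETLEASE) == F_RDLCK) ? 0 : 1;
        close(dst);
    }
    fcntl(src, F_SETLEASE, F_UNLCK);      /* release the lease */
out:
    sigaction(SIGIO, &old, NULL);
    close(src);
    return result;
}
```

A return value of 1 is deliberately conservative: a writer that tried to open the file is blocked until the lease is released or the break timer expires, so the copy may in fact be consistent, but the sketch does not try to prove that.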

"Just give up and try later" is also a bit defeatist; you do have ways to monitor when the snapshot might work. Consider placing an inotify IN_CLOSE_WRITE and IN_CLOSE_NOWRITE watch on the file, and try the snapshot whenever you receive such an event.

fanotify:

For a few years now, I've been monitoring the progress of Linux fanotify, in the hope that it would grow enough features to be used for automagic file versioning. Essentially, whenever someone opens the file with write permissions, the current file would be snapshotted to temporary storage and marked with some metadata (timestamp, real human user (backtracked through sudo/su), and so on). When that descriptor is closed, another snapshot is taken, and a helper thread/process diffs the two, annotating the changes (or even pushing them to git).

It is limited to local filesystems, but with 2.6.37 and later kernels (including 3.x), the interface is sufficient for specific files, or an entire mount. In your case, the fanotify interface offers features similar to file leases, though for local filesystems only; in addition, you can simply deny any access during the snapshot. (One can argue whether that is a good idea at all, especially if the file to be snapshotted is a system or configuration file; many programmers overlook error checking, because "some files just have to be always accessible, or your system is broken".)
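
Denying access during the snapshot could be sketched along these lines. The helper names `guard_file` and `answer_event` are hypothetical, and the sketch assumes CAP_SYS_ADMIN and a kernel built with fanotify permission events. FAN_OPEN_PERM fires for every open, whether for reading or writing, so the snapshotting process should open the source file before placing the mark.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/fanotify.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to consult us before any open of
 * `path`. Returns a fanotify descriptor, or -1 on error (for example
 * when the caller lacks CAP_SYS_ADMIN). */
int guard_file(const char *path)
{
    int fan = fanotify_init(FAN_CLASS_CONTENT | FAN_CLOEXEC, O_RDONLY);
    if (fan < 0)
        return -1;
    if (fanotify_mark(fan, FAN_MARK_ADD, FAN_OPEN_PERM, AT_FDCWD, path) < 0) {
        close(fan);
        return -1;
    }
    return fan;
}

/* Hypothetical helper: answer one permission event read from `fan`.
 * allow != 0 lets the open proceed; allow == 0 makes it fail.
 * Also closes the descriptor that came with the event.
 * Returns 0 on success, -1 on error. */
int answer_event(int fan, const struct fanotify_event_metadata *ev, int allow)
{
    struct fanotify_response resp = {
        .fd = ev->fd,
        .response = allow ? FAN_ALLOW : FAN_DENY,
    };
    ssize_t n = write(fan, &resp, sizeof resp);
    close(ev->fd);
    return n == (ssize_t)sizeof resp ? 0 : -1;
}
```

While the snapshot runs, a loop would read `struct fanotify_event_metadata` records from the descriptor and call `answer_event(fan, ev, 0)` for each; closing the descriptor afterwards removes the mark.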

As far as my change monitoring goes, fanotify should now have all the necessary features, but only if an entire mount is monitored. I was hoping to monitor configuration files on multi-admin clusters, but those files reside on the same mount as all system libraries and binaries do, so the monitoring causes considerable overhead. So much so that it seems more appropriate to just modify the SSH configuration, console configuration (getty etc.), sudo configuration, and possibly su, to always include a dynamic library that interposes file access syscalls and basically does the versioning on behalf of the user. This way service binaries are not affected; only user actions are monitored.

Nominal Animal
-1

This might work under some circumstances:

  1. (Optional) Do something to prevent new processes from opening the file:

    a/ rename the file

    b/ restrict file permissions

  2. Find all existing file readers/writers via lsof and kill -STOP them

  3. Do your snapshot

  4. kill -CONT all readers/writers

  5. (Optional) Restore action 1.

vlp
  • As I am here to learn, could the downvoter(s) explain to me the problem with this answer? – vlp Sep 15 '15 at 22:29