
I need to take a backup of a file while it is being written to. A running program periodically appends data to a file, and another application copies contents to a backup file.

My approach: The backup program wakes up every x seconds and issues a pread request for y bytes from the previous offset in the original file. If the pread call returns a positive integer indicating the number of bytes retrieved, I write them to the backup file.

Can this approach lead to an inconsistent backup file? Is it possible that the pread call reads a chunk of data that was not fully written in the original file? Note that data is only appended to the original file. Initial tests show this approach works fine, but it could be incidental.

Writer code:

fd = open_file();
while(!done) {
    do_some_work();
    write(fd, buf, bufsize);
}

Reader code:

fd_in  = open_original_file();
fd_out = open_backup_file();

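while(!done) {
    // Read up to chunksize bytes starting where the previous pass stopped
    bytes_in = pread(fd_in, buf, chunksize, current_offset);

    // Read error: stop rather than copy garbage
    if(bytes_in < 0)
        break;

    // Data retrieved: mirror it at the same offset in the backup
    if(bytes_in > 0) {
        // Advance only if the whole chunk reached the backup
        if(pwrite(fd_out, buf, bytes_in, current_offset) != bytes_in)
            break;
        current_offset += bytes_in;
    }
    sleep(5);
}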
Korizon
  • Regarding `fd = open_file(); while(!done) { do_some_work(); write(fd, buf, bufsize); }`: this never sets the `done` variable. – user3629249 May 05 '21 at 20:10
  • The call to `write()` places the data into an output stream buffer. Only when that buffer overflows will the data actually be written to the file. Suggest using `fflush()` after each call to `write()` so the data is immediately passed to the file. – user3629249 May 05 '21 at 20:13

1 Answer


Yes, it should be safe. POSIX I/O guarantees sequential consistency; that is, concurrent accesses to the same file will complete as if they were executed in some sequential order. Writes are atomic; a read can only read data that was written completely.

Some network file systems weaken the sequential consistency requirement, but I doubt that they'd violate the atomicity of writes.
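
For reference, the reader loop from the question can be tightened into a helper that drains everything currently visible in the source file and tolerates short reads and short writes. This is only a minimal sketch: `copy_new_bytes` and the 4096-byte buffer are hypothetical choices, not something from the question, and the helper copies whatever bytes `pread` returns without knowing where application-level records begin or end.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the question): copy every byte that has
 * been appended to fd_in since *offset into fd_out at the same offset.
 * Returns the number of bytes copied in this pass, or -1 on error.
 * On success *offset is advanced past the copied bytes. */
ssize_t copy_new_bytes(int fd_in, int fd_out, off_t *offset)
{
    char buf[4096];               /* assumed chunk size */
    ssize_t total = 0;

    for (;;) {
        ssize_t n = pread(fd_in, buf, sizeof buf, *offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;                /* caught up with the current end of file */

        /* pwrite may write fewer bytes than requested; loop until done. */
        ssize_t done = 0;
        while (done < n) {
            ssize_t w = pwrite(fd_out, buf + done, (size_t)(n - done),
                               *offset + done);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += w;
        }

        *offset += n;
        total += n;
    }
    return total;
}
```

The backup program would call this once per wake-up with its saved offset and then sleep as before. A return value of 0 simply means nothing new has been appended yet.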

dabo42
  • From https://stackoverflow.com/questions/35595685/write2-read2-atomicity-between-processes-in-linux, "POSIX doesn't give any minimum guarantee of atomic operations for read and write except for writes on a pipe (where a write of up to PIPE_BUF (≥ 512) bytes is guaranteed to be atomic, but reads have no atomicity guarantee)" – Korizon May 04 '21 at 14:23
  • The filesystem in my case is XFS; I mention it because the filesystem seems to affect the atomicity of writes and reads. – Korizon May 04 '21 at 14:28