
Once upon a time, long ago, we had a bash script that worked out a list of files that needed to be copied based on some criteria (basically like a filtered version of cp -rf). This was too slow, so it was replaced by a C++ program.

What the C++ program does is essentially:

foreach file
   read entire file into buffer
   write entire file

The program uses the POSIX calls open(), read() and write() to avoid the buffering and other overheads of iostream or of fopen(), fread() and fwrite().
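For reference, a minimal sketch of that per-file loop body (the names, buffer handling and the 0644 mode here are illustrative, not from the actual program; a robust version would loop on short reads and writes):

    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <vector>

    // Sketch only: read the whole source file into memory, then write it out.
    void copyWholeFile(const char* src, const char* dest)
    {
        int srcFd = open(src, O_RDONLY);
        struct stat st;
        fstat(srcFd, &st);

        std::vector<char> buffer(st.st_size);
        read(srcFd, buffer.data(), buffer.size());   // a robust version loops on short reads
        close(srcFd);

        int destFd = open(dest, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        write(destFd, buffer.data(), buffer.size()); // ...and on short writes
        close(destFd);
    }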

Is it possible to improve on this?

Notes:

  • I am assuming these are not sparse files
  • I am assuming GNU/Linux
  • I am not assuming a particular filesystem is available
  • I am not assuming prior knowledge of whether the source and destination are on the same disk.
  • I am not assuming prior knowledge of the kind of disk, SSD, HDD maybe even NFS or sshfs.
  • We can assume the source files are on the same disk as each other.
  • We can assume the destination files will also be on the same disk as each other.
  • We cannot assume whether the source and destinations are on the same disk or not.

I think the answer is yes but it is quite nuanced.

Copying speed is of course limited by disk I/O, not CPU.

But how can we be sure to optimise our use of disk IO?

Maybe the disk has the equivalent of multiple read or write heads available (an SSD, perhaps)? In that case performing multiple copies in parallel will help.

Can we determine and exploit this somehow?


This is surely well-trodden territory, so rather than re-invent the wheel straight away (though that is always fun), it would be nice to hear what others have tried or would recommend. Otherwise I will try various things and answer my own question sometime in the distant future.

This is what my evolving answer looks like so far...

If the source and destination are different physical disks then we can at least read and write at the same time with something like:

writer thread
   loop
      read from write queue
      write file

reader thread
   foreach file
      read file
      queue write on writer thread
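A minimal sketch of that pipeline using std::thread primitives (FilePair, readFile() and writeFile() are assumed helpers wrapping the plain POSIX calls above; note that an unbounded queue can hold entire files in memory, so a bounded queue would be safer in practice):

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <string>
    #include <thread>
    #include <vector>

    struct FilePair { std::string src, dest; };          // assumed input type
    std::vector<char> readFile(const std::string& path); // plain open()/read(), as above
    void writeFile(const std::string& path, const std::vector<char>& data); // open()/write()

    struct Job { std::string dest; std::vector<char> data; };

    std::queue<Job> writeQueue;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    void writerThread()
    {
        for (;;)
        {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [] { return !writeQueue.empty() || done; });
            if (writeQueue.empty()) return;              // finished and drained
            Job job = std::move(writeQueue.front());
            writeQueue.pop();
            lock.unlock();
            writeFile(job.dest, job.data);               // write while the reader reads ahead
        }
    }

    void readerThread(const std::vector<FilePair>& files)
    {
        for (const auto& f : files)
        {
            Job job{f.dest, readFile(f.src)};
            { std::lock_guard<std::mutex> g(m); writeQueue.push(std::move(job)); }
            cv.notify_one();
        }
        { std::lock_guard<std::mutex> g(m); done = true; }
        cv.notify_one();
    }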

If the source and destination are on the same physical disk and we happen to be on a filesystem with copy-on-write semantics (like XFS or Btrfs) we can potentially avoid actually copying the file at all. This is apparently called "reflinking". The cp command supports this using --reflink=auto.

From this question and the coreutils source (https://github.com/coreutils/coreutils/blob/master/src/copy.c) it looks as if this is done using an ioctl, as in:

ioctl (dest_fd, FICLONE, src_fd);

So a quick win is probably:

try FICLONE on the first file.
If it succeeds then:
   foreach file
      srcFd = open(src, O_RDONLY);
      destFd = open(dest, O_WRONLY | O_CREAT | O_TRUNC, mode);
      ioctl(destFd, FICLONE, srcFd);
else
   do it the other way - perhaps in parallel
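A hedged sketch of that probe in C++ (tryReflink() is my name and the 0644 mode is illustrative; FICLONE fails with e.g. EOPNOTSUPP or EXDEV when reflinks are unsupported or the files are on different filesystems):

    #include <fcntl.h>
    #include <linux/fs.h>     // FICLONE
    #include <sys/ioctl.h>
    #include <unistd.h>

    // Returns true if the kernel cloned src into dest without copying data.
    bool tryReflink(const char* src, const char* dest)
    {
        int srcFd = open(src, O_RDONLY);
        if (srcFd < 0) return false;
        int destFd = open(dest, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (destFd < 0) { close(srcFd); return false; }
        bool ok = ioctl(destFd, FICLONE, srcFd) == 0;
        close(srcFd);
        close(destFd);
        return ok;
    }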

In terms of low-level system APIs we have:

  • copy_file_range
  • ioctl FICLONE
  • sendfile

I am not clear when to choose one over the other, except that copy_file_range() is not safe to use with some filesystems, notably procfs.

This answer gives some advice and suggests that sendfile() is intended for sockets, but in fact that is only true for kernels before 2.6.33.

https://www.reddit.com/r/kernel/comments/4b5czd/what_is_the_difference_between_splice_sendfile/

copy_file_range() is useful for copying one file to another (within the same filesystem). On filesystems that support reflinks it can avoid actually copying anything until either file is modified (copy-on-write, or COW).

splice() only works if one of the file descriptors refers to a pipe. So you can use it for e.g. socket-to-pipe or pipe-to-file without copying the data into userspace. But you can't do file-to-file copies with it.

sendfile() only works if the source file descriptor refers to something that can be mmap()ed (i.e. mostly normal files) and before 2.6.33 the destination must be a socket.
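To illustrate the sendfile() case, a sketch of a file-to-file copy (valid on kernels from 2.6.33 onwards; copyWithSendfile() is my name and error handling is abbreviated):

    #include <fcntl.h>
    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <unistd.h>

    // Copies srcFd to destFd in the kernel, avoiding a userspace buffer.
    bool copyWithSendfile(int srcFd, int destFd)
    {
        struct stat st;
        if (fstat(srcFd, &st) != 0) return false;
        off_t offset = 0;
        while (offset < st.st_size)
        {
            ssize_t sent = sendfile(destFd, srcFd, &offset, st.st_size - offset);
            if (sent <= 0) return false;   // could fall back to read()/write() here
        }
        return true;
    }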


There is also a suggestion in a comment that reading multiple files and then writing multiple files will result in better performance. This could use some explanation. My guess is that it tries to exploit the heuristic that the source files will be close together on the disk, and likewise the destination files. I think the parallel reader and writer thread version could perhaps do the same. The problem with such a design is that it cannot exploit any performance gain from the low-level system copy APIs, because the data has to pass through a userspace buffer.

Bruce Adams
  • Faster to read groups and partials of files up to N (say a few megabytes) and then write them. Read with `fread()` or low level routines. – chux - Reinstate Monica May 02 '22 at 02:12
  • Look into https://man7.org/linux/man-pages/man2/copy_file_range.2.html – Shawn May 02 '22 at 05:40
  • @chux-ReinstateMonica why? Is it based on the heuristic that the existing files are likely to be closer together, or something else? – Bruce Adams May 02 '22 at 08:45
  • @shawn good tip – Bruce Adams May 02 '22 at 08:48
  • I can find many more questions about file copying here if I search for copy_file_range(), which did not turn up when I wrote the question. I will check for duplication. – Bruce Adams May 02 '22 at 09:11
  • @Shawn: Note that on modern Linux, `sendfile` can do the same job. `copy_file_range` is Linux-specific, and `sendfile` being able to send to files for output is Linux-specific, but at least the latter exists elsewhere (and *may* support sending to output files). – ShadowRanger May 02 '22 at 15:41
  • @ShadowRanger - CFR is optimized for file copies with hooks into the filesystem code that I'm pretty sure sendfile doesn't have. Sendfile is also very much Linux specific in every way - if a syscall by that name exists on other OSes, it's with different arguments and semantics. – Shawn May 02 '22 at 17:10
  • @Shawn: Ah, you're right. I kind of assume Linux `sendfile` probably uses the same hooks when outputting to a file as `copy_file_range` (because why wouldn't it?), but yeah, while something called `sendfile` exists everywhere but Windows, the prototype on macOS/BSD is quite different, so the odds of it providing any portability is basically nil. This is what I get for mostly using it through Python (which hides some of the OS discrepancies to present a common core interface). :-) – ShadowRanger May 02 '22 at 18:45
  • You can beat it, but `tar | tar` is really good and really cheap. – Joshua May 03 '22 at 01:04

2 Answers


The general answer is: Measure before trying another strategy.

For HDDs this is probably your answer: https://unix.stackexchange.com/questions/124527/speed-up-copying-1000000-small-files
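The key trick in that link is to sort small files by inode number before copying, so that reads happen in roughly on-disk order. A sketch of that idea (FilePair and its fields are assumptions carried over from the question; on NFS and the like the inode numbers may not reflect physical layout, so measure):

    #include <sys/stat.h>
    #include <algorithm>
    #include <string>
    #include <utility>
    #include <vector>

    struct FilePair { std::string src, dest; };   // assumed input type

    // Sort the copy list by source inode number; helps HDDs, harmless elsewhere.
    void sortByInode(std::vector<FilePair>& files)
    {
        std::vector<std::pair<ino_t, FilePair>> keyed;
        keyed.reserve(files.size());
        for (auto& f : files)
        {
            struct stat st {};
            stat(f.src.c_str(), &st);             // errors left unhandled in this sketch
            keyed.emplace_back(st.st_ino, std::move(f));
        }
        std::sort(keyed.begin(), keyed.end(),
                  [](const auto& a, const auto& b) { return a.first < b.first; });
        for (std::size_t i = 0; i < files.size(); ++i)
            files[i] = std::move(keyed[i].second);
    }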

Ole Tange
  • Sorting by inode number is a good trick for local drives. It probably wouldn't help but do you know if NFS exports the source inode number or it is just made up (by the client?) based on directory order? – Bruce Adams May 11 '22 at 10:13
  • @BruceAdams Did you test? I do not know either, but I was surprised by the linked results. – Ole Tange May 11 '22 at 10:37
  • Not yet. I am still writing the code. Some aspects of testing will be `interesting` as I would ideally like to have automated tests pretending to have different kinds of filesystem. – Bruce Adams May 11 '22 at 11:38

Ultimately I did not determine the "most efficient" way, but I did end up with a solution that was sufficiently fast for my needs.

  1. generate a list of files to copy and store it

  2. copy files in parallel using OpenMP

    // each iteration is independent, so OpenMP can distribute the copies
    // across threads (loops over random-access iterators need OpenMP 3.0+)
    #pragma omp parallel for
    for (auto iter = filesToCopy.begin(); iter < filesToCopy.end(); ++iter)
    {
       copyFile(*iter);
    }
    
  3. copy each file using copy_file_range()

  4. fall back to using splice() with a pipe() when compiling for old platforms that do not support copy_file_range().

Reflinking, as performed by copy_file_range() when the source and destination are on the same filesystem and the filesystem supports it, avoids copying at all and is a massive win.
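Putting points 3 and 4 together, the core of copyFile() might look roughly like this (a sketch assuming glibc 2.27+ for the copy_file_range() wrapper; the function names are mine and error handling is abbreviated):

    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    // Preferred path: copy_file_range() lets the filesystem reflink or
    // otherwise copy in the kernel. Fails with e.g. EXDEV on old kernels.
    bool copyRange(int srcFd, int destFd, off_t size)
    {
        off_t copied = 0;
        while (copied < size)
        {
            ssize_t n = copy_file_range(srcFd, nullptr, destFd, nullptr,
                                        size - copied, 0);
            if (n <= 0) return false;      // caller falls back to copyViaSplice()
            copied += n;
        }
        return true;
    }

    // Fallback path: pump the data through a pipe with splice(), still
    // avoiding a userspace buffer.
    bool copyViaSplice(int srcFd, int destFd, off_t size)
    {
        int pipeFds[2];
        if (pipe(pipeFds) != 0) return false;
        off_t copied = 0;
        bool ok = true;
        while (ok && copied < size)
        {
            ssize_t in = splice(srcFd, nullptr, pipeFds[1], nullptr,
                                size - copied, 0);
            if (in <= 0) { ok = false; break; }
            while (in > 0)                 // drain the pipe into the destination
            {
                ssize_t out = splice(pipeFds[0], nullptr, destFd, nullptr, in, 0);
                if (out <= 0) { ok = false; break; }
                in -= out;
                copied += out;
            }
        }
        close(pipeFds[0]);
        close(pipeFds[1]);
        return ok && copied == size;
    }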

Bruce Adams