The proper logic for copying a file in POSIX.1 systems, including Linux, is roughly:

    Open source file
    Open target file
    Repeat:
        Read a chunk of data from source
        Write that chunk to target
    Until no more data to read
    Close source file
    Close target file

Proper error handling adds a significant amount of code, but I consider it a necessity, not an optional thing to be added afterwards if one has the time to do so.
(I am so strict in this, that I'd fail anyone who omits error checking, even if their program otherwise functioned properly. The reason is basic sanity: A tool that may blow up in your hands is not a tool, it is a bomb. There are enough bombs in the software world already, and we don't need more "programmers" who create those. What we need are reliable tools.)
Here is an example implementation with proper error checking:

    #include <stdlib.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <errno.h>

    #define DEFAULT_CHUNK 262144 /* 256k */

    int copy_file(const char *target, const char *source, const size_t chunk)
    {
        const size_t size = (chunk > 0) ? chunk : DEFAULT_CHUNK;
        char *data, *ptr, *end;
        ssize_t bytes;
        int ifd, ofd, err;

        /* NULL and empty file names are invalid. */
        if (!target || !*target || !source || !*source)
            return EINVAL;

        ifd = open(source, O_RDONLY);
        if (ifd == -1)
            return errno;

        /* Create output file; fail if it exists (O_EXCL): */
        ofd = open(target, O_WRONLY | O_CREAT | O_EXCL, 0666);
        if (ofd == -1) {
            err = errno;
            close(ifd);
            return err;
        }

        /* Allocate temporary data buffer. */
        data = malloc(size);
        if (!data) {
            close(ifd);
            close(ofd);
            /* Remove output file. */
            unlink(target);
            return ENOMEM;
        }

        /* Copy loop. */
        while (1) {

            /* Read a new chunk. */
            bytes = read(ifd, data, size);
            if (bytes < 0) {
                if (bytes == -1)
                    err = errno;
                else
                    err = EIO;
                free(data);
                close(ifd);
                close(ofd);
                unlink(target);
                return err;
            } else
            if (bytes == 0)
                break;

            /* Write that same chunk. */
            ptr = data;
            end = data + bytes;
            while (ptr < end) {
                bytes = write(ofd, ptr, (size_t)(end - ptr));
                if (bytes <= 0) {
                    if (bytes == -1)
                        err = errno;
                    else
                        err = EIO;
                    free(data);
                    close(ifd);
                    close(ofd);
                    unlink(target);
                    return err;
                } else
                    ptr += bytes;
            }
        }

        free(data);

        err = 0;
        if (close(ifd))
            err = EIO;
        if (close(ofd))
            err = EIO;
        if (err) {
            unlink(target);
            return err;
        }

        return 0;
    }

The function takes the target file name (to be created), the source file name (to be read from), and optionally the preferred chunk size. If you supply 0, the default chunk size is used. On current Linux hardware, a 256k chunk size should reach maximum throughput; a smaller chunk size may lead to a slower copy on some (big and fast) systems.
The chunk size should be a power of two, or a small multiple of a large power of two. Because the chunk size is chosen by the caller, the buffer is dynamically allocated using malloc()/free(). Note that it is explicitly freed in error cases. (It is a common bug to forget to free dynamically allocated data in the error path; this is often called "leaking memory".)
Because the target file is always created (the function will fail, returning EEXIST, if the target file already exists), it is removed ("unlinked") if an error occurs, so that no partial file is left over in error cases.
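The function above repeats its cleanup (free, close, unlink) in every error branch. A different and common way to make such cleanup harder to forget is to funnel every exit through one cleanup label. The following is only a minimal sketch of that alternative pattern, not the approach used above; count_bytes() is a hypothetical helper invented for illustration, which simply counts the bytes in a file:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical example: count the bytes in 'path', reading 'chunk'
 * bytes at a time. Returns 0 on success, or a nonzero errno code.
 * Every exit after the argument check goes through 'done', so the
 * buffer and the descriptor are released on all paths. */
int count_bytes(const char *path, const size_t chunk, off_t *count)
{
    char *data = NULL;
    off_t total = 0;
    int fd = -1, err = 0;

    if (!path || !*path || !count || chunk < 1)
        return EINVAL;

    fd = open(path, O_RDONLY);
    if (fd == -1) {
        err = errno;
        goto done;
    }

    data = malloc(chunk);
    if (!data) {
        err = ENOMEM;
        goto done;
    }

    while (1) {
        const ssize_t bytes = read(fd, data, chunk);
        if (bytes == 0)
            break;
        if (bytes < 0) {
            err = (bytes == -1) ? errno : EIO;
            goto done;
        }
        total += bytes;
    }

done:
    free(data);                     /* free(NULL) does nothing */
    if (fd != -1 && close(fd) && !err)
        err = EIO;
    if (!err)
        *count = total;
    return err;
}
```

Whether you prefer this or the explicit per-branch cleanup is a matter of taste; what matters is that no error path skips the cleanup.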
The exact usage for open(), read(), write(), close(), and unlink() can be found in the Linux man pages.
write() returns the number of bytes written, or -1 if an error occurs. (Note that I explicitly treat 0 and all negative values smaller than -1 as I/O errors, because they should not normally occur.)
read() returns the number of bytes read, -1 if an error occurs, or 0 if there is no more data.
Both read() and write() may return a short count; i.e., less than was requested. (In Linux, this does not happen for normal files on most local filesystems, but only an idiot relies on the above function being used only on such files. Handling short counts isn't that complex, as you can see from the above code.)
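If you need the short-count handling in more than one place, the inner write loop of copy_file() can be pulled out into a helper. Here is a rough sketch; write_all() is a hypothetical name (it is not a POSIX function), and unlike the function above, this sketch also retries when write() is interrupted by a signal (EINTR), which is an assumption on my part about what the caller wants:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write exactly 'len' bytes from 'buf' to descriptor 'fd'.
 * Returns 0 on success, or a nonzero errno code on failure.
 * Hypothetical helper; the name is not part of any standard. */
int write_all(const int fd, const void *buf, const size_t len)
{
    const char *ptr = buf;
    const char *const end = ptr + len;

    while (ptr < end) {
        const ssize_t n = write(fd, ptr, (size_t)(end - ptr));
        if (n > 0) {
            /* Possibly a short count; advance and keep going. */
            ptr += n;
        } else if (n == -1 && errno == EINTR) {
            /* Interrupted before anything was written; try again. */
            continue;
        } else if (n == -1) {
            return errno;
        } else {
            /* 0, or negative but not -1: should not normally occur. */
            return EIO;
        }
    }

    return 0;
}
```

With such a helper, the copy loop would reduce to one read() followed by one write_all(ofd, data, (size_t)bytes) call per chunk, with the same cleanup on a nonzero return.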
If you wanted to add a progress meter, for example using a callback function such as

    void progress(const char *target, const char *source,
                  const off_t completed, const off_t total);

then it would make sense to add an fstat(ifd, &info) call before the loop (with struct stat info; and off_t copied; declared, the latter counting the number of bytes copied). That call too may fail, or report info.st_size == 0 if the source is e.g. a named pipe instead of a normal file. This means that the total parameter might be zero, in which case the progress meter would display only the progress in bytes (completed), with the remaining amount unknown.
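To make that concrete, here is a rough sketch of just the copy loop with the fstat() call and the callback wired in. It works on already-open descriptors and leaves opening, closing, and unlinking to the caller. The names copy_with_progress() and progress_fn, and the fixed 64k buffer, are assumptions made for this illustration only:

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Callback type matching the progress() prototype above. */
typedef void (*progress_fn)(const char *target, const char *source,
                            const off_t completed, const off_t total);

/* Copy everything from ifd to ofd, calling report() after each chunk.
 * Returns 0 on success, or a nonzero errno code. Hypothetical helper;
 * the caller keeps ownership of both descriptors. */
int copy_with_progress(const int ifd, const int ofd,
                       const char *target, const char *source,
                       progress_fn report)
{
    char buffer[65536];
    struct stat info;
    off_t total = 0, copied = 0;

    /* total stays 0 ("unknown") if fstat() fails or reports no size. */
    if (fstat(ifd, &info) == 0 && info.st_size > 0)
        total = info.st_size;

    while (1) {
        const char *ptr = buffer;
        const char *end;
        ssize_t bytes = read(ifd, buffer, sizeof buffer);

        if (bytes == 0)
            break;
        if (bytes < 0)
            return (bytes == -1) ? errno : EIO;

        end = buffer + bytes;
        while (ptr < end) {
            bytes = write(ofd, ptr, (size_t)(end - ptr));
            if (bytes <= 0)
                return (bytes == -1) ? errno : EIO;
            ptr += bytes;
            copied += bytes;
        }

        if (report)
            report(target, source, copied, total);
    }

    return 0;
}
```

A callback receiving total == 0 would then print only the completed byte count, and otherwise something like a percentage.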