-2

I need a cross-platform portable function that is able to copy a 100GB+ binary file to a new destination. My first solution was this:

void copy(const string &src, const string &dst)
{
    FILE *f;
    char *buf;
    long len;

    f = fopen(src.c_str(), "rb");
    fseek(f, 0, SEEK_END);
    len = ftell(f);
    rewind(f);

    buf = (char *) malloc((len+1) * sizeof(char));
    fread(buf, len, 1, f);
    fclose(f);

    f = fopen(dst.c_str(), "a");
    fwrite(buf, len, 1, f);
    fclose(f);
}

Unfortunately, the program was very slow. I suspect the buffer had to keep the whole 100GB+ in memory. I'm tempted to try this new code instead (taken from Copy a file in a sane, safe and efficient way):

std::ifstream src_(src, std::ios::binary);
std::ofstream dst_ = std::ofstream(dst, std::ios::binary);
dst_ << src_.rdbuf();
src_.close();
dst_.close();

My question is about this line:

dst_ << src_.rdbuf();

What does the C++ standard say about it? Does the code compile to a byte-by-byte transfer, or to a whole-buffer transfer (like my first example)?

I'm curious whether the << compiles to something useful for me. Maybe I don't have to invest my time in anything else and can just let the compiler do the job inside the operator? If the operator translates to a loop for me, why should I write the loop myself?

PS: std::filesystem::copy is not an option, as the code has to work with C++11.

ABCD
    You better go with std::filesystem::copy – The Techel Sep 22 '18 at 13:39
  • @TheTechel Why, can you please explain? Also, why do you think my new solution wouldn't work? – ABCD Sep 22 '18 at 13:39
  • Any reason why you don’t just call fread() and fwrite() in a loop with an appropriate fixed-size buffer? – Jeremy Friesner Sep 22 '18 at 13:40
  • 3
    If you don't have std::filesystem::copy, read a few megabytes at a time and write them out again. – Martin Bonner supports Monica Sep 22 '18 at 13:40
  • @melpomene That's my question. I'm curious does the << compiled to something useful for me? – ABCD Sep 22 '18 at 13:40
  • @JeremyFriesner I don't know. I thought I have to do it whole buffer. – ABCD Sep 22 '18 at 13:40
  • I don't think your solution doesn't work, but there already is a native solution available, which also avoids a lot of copying depending on the underlying filesystem. – The Techel Sep 22 '18 at 13:40
  • 5
    @SmallChess If that is your question, you need to make it much more explicit - it's a much more interesting question like that. – Martin Bonner supports Monica Sep 22 '18 at 13:41
  • @SmallChess Highly unlikely. I'm not that familiar with iostreams, but it looks like you're just grabbing the internal read buffer, which at that point is empty (because you haven't read anything yet). – melpomene Sep 22 '18 at 13:41
  • @SmallChess Do you think you would be able to move 100 pounds of sand with your hands in only one step? Maybe moving it one hand at a time? – Jean-Baptiste Yunès Sep 22 '18 at 13:44
  • @Jean-BaptisteYunès If the operator does block transfer for me. Everything would work. – ABCD Sep 22 '18 at 13:46
  • @Jean-BaptisteYunès A simple code like the second solution without looping is simple. If it works for large files, it'd be perfect. – ABCD Sep 22 '18 at 13:46
  • @melpomene I updated my question. I took my code from a highly voted post. – ABCD Sep 22 '18 at 13:47
  • Do you really have a machine with 100GB of RAM?? I don't understand how the first example would work unless you have a server with at least this much ram. – drescherjm Sep 22 '18 at 13:57
  • @SmallChess My apologies. Looks like I was wrong. The description on https://en.cppreference.com/w/cpp/io/basic_ostream/operator_ltlt implies `operator<<` on `rdbuf` copies characters in a loop, not all at once. – melpomene Sep 22 '18 at 13:58
  • @drescherjm I have 125GB on powerful machine. – ABCD Sep 22 '18 at 13:58
  • @melpomene Does that mean Basile's answer is wrong? – ABCD Sep 22 '18 at 13:58
  • You still probably want to break this up in smaller chunks. Maybe 8GB at a time. – drescherjm Sep 22 '18 at 13:59
  • @KillzoneKid Yes. That's why I'm here asking this question. – ABCD Sep 22 '18 at 14:03
  • Are your source and destination the same array or a different array on the same server? Is this flash storage or hard disks? I ask because I think different approaches may be taken if you know more about the hardware. – drescherjm Sep 22 '18 at 14:05
  • @drescherjm Guaranteed same array on the same server. – ABCD Sep 22 '18 at 14:05
  • Possible duplicate of [Copy a file in a sane, safe and efficient way](https://stackoverflow.com/questions/10195343/copy-a-file-in-a-sane-safe-and-efficient-way) – jww Sep 23 '18 at 00:43
  • @SmallChess It loops inside! You tried to make the copying by yourself, then a loop is necessary as a general way to proceed. Of course if you call a function that do it for you... – Jean-Baptiste Yunès Sep 23 '18 at 15:10

3 Answers

6

The crux of your question is what happens when you do this:

dst_ << src_.rdbuf();

Clearly this is two function calls: one to istream::rdbuf(), which simply returns a pointer to a streambuf, followed by one to ostream::operator<<(streambuf*), which is documented as follows:

After constructing and checking the sentry object, checks if sb is a null pointer. If it is, executes setstate(badbit) and exits. Otherwise, extracts characters from the input sequence controlled by sb and inserts them into *this until one of the following conditions are met: [...]

Reading this, the answer to your question is that copying a file in this way will not require buffering the entire file contents in memory; rather, it will read a character at a time (perhaps with some chunked buffering, but that's an optimization that shouldn't change our analysis).

Here is one implementation: https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-api-4.6/a01075_source.html (__copy_streambufs). Essentially it is a loop calling sgetc() and sputc() repeatedly until EOF is reached. The memory required is small and constant.

John Zwinck
5

The C++ standard (I checked C++98, so this should be extremely compatible) says in [lib.ostream.inserters]:

basic_ostream<charT,traits>& operator<<
    (basic_streambuf<charT,traits> *sb);
  1. Effects: If sb is null calls setstate(badbit) (which may throw ios_base::failure).

  2. Gets characters from sb and inserts them in *this. Characters are read from sb and inserted until any of the following occurs:

    • end-of-file occurs on the input sequence;
    • inserting in the output sequence fails (in which case the character to be inserted is not extracted);
    • an exception occurs while getting a character from sb.
  3. If the function inserts no characters, it calls setstate(failbit) (which may throw ios_base::failure (27.4.4.3)). If an exception was thrown while extracting a character, the function sets failbit in error state, and if failbit is on in exceptions() the caught exception is rethrown.

  4. Returns: *this.

This description says << on rdbuf works on a character-by-character basis. In particular, if inserting of a character fails, that exact character remains unread in the input sequence. This implies that an implementation cannot just extract the whole contents into a single huge buffer upfront.

So yes, there's a loop somewhere in the internals of the standard library that does a byte-by-byte (well, charT really) transfer.

However, this does not mean that the whole thing is completely unbuffered. This is simply about what operator<< does internally. Your ostream object will still accumulate data internally until its buffer is full, then call write (or whatever low-level function your OS uses).
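If you want those internal buffers to be bigger, one option is pubsetbuf() on both streams before opening them. A sketch (implementations are allowed to ignore the hint, and the 4 MiB size is just an assumption to experiment with):

```cpp
#include <fstream>
#include <string>
#include <vector>

// Sketch: enlarging both stream buffers before copying. pubsetbuf() must be
// called before the first I/O operation (here: before open()) to have any
// effect, and implementations may ignore the hint entirely.
void copy_with_big_buffers(const std::string& src, const std::string& dst)
{
    std::vector<char> inbuf(4 * 1024 * 1024);   // assumed size; tune it
    std::vector<char> outbuf(4 * 1024 * 1024);

    std::ifstream in;
    std::ofstream out;
    in.rdbuf()->pubsetbuf(inbuf.data(), static_cast<std::streamsize>(inbuf.size()));
    out.rdbuf()->pubsetbuf(outbuf.data(), static_cast<std::streamsize>(outbuf.size()));

    in.open(src, std::ios::binary);
    out.open(dst, std::ios::binary);
    out << in.rdbuf();   // same character loop as described, fed from bigger buffers
}
```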

melpomene
0

Unfortunately, the program was very slow.

Your first solution is wrong for a very simple reason: it reads the entire source file into memory, then writes it out entirely.

Files have been invented (perhaps in the 1960s) to handle data that don't fit in memory (and has to be in some "slower" storage, at that time hard disks or drums, or perhaps even tapes). And they have always been copied by "chunks".

The current (Unix-like) definition of a file (as a sequence of bytes that is open-ed, read, write-n, close-d) is more recent than the 1960s. Probably the late 1970s or early 1980s. And it comes with the notion of streams (which has been standardized in C with <stdio.h> and in C++ with std::fstream).

So your program has to work (like every file copying program today) for files much bigger than the available memory. You need some loop to read some buffer, write it, and repeat.

The size of the buffer is very important. If it is too small, you'll make too many IO operations (e.g. system calls). If it is too big, IO might be inefficient or even not work.

In practice, the buffer should today be much less than your RAM, typically several megabytes.

Your code is more C like than C++ like because it uses fopen. Here is a possible solution in C with <stdio.h>. If you code in genuine C++, adapt it to <fstream>:

void copyfile(const char*destpath, const char*srcpath) {
 // experiment with various buffer sizes
#define MYBUFFERSIZE (4*1024*1024) /* four megabytes */
  char* buf = malloc(MYBUFFERSIZE);
  if (!buf) { perror("malloc buf"); exit(EXIT_FAILURE); };
  FILE* filsrc = fopen(srcpath, "rb");   // "rb": binary mode, for portability
  if (!filsrc) { perror(srcpath); exit(EXIT_FAILURE); };
  FILE* fildest = fopen(destpath, "wb");
  if (!fildest) { perror(destpath); exit(EXIT_FAILURE); };
  for (;;) {
     size_t rdsiz = fread(buf, 1, MYBUFFERSIZE, filsrc);
     if (rdsiz==0) {            // end of file or input error
        if (ferror(filsrc)) { perror("fread"); exit(EXIT_FAILURE); };
        break;
     };
     size_t wrsiz = fwrite(buf, rdsiz, 1, fildest);
     if (wrsiz != 1) { perror("fwrite"); exit(EXIT_FAILURE); };
   }
   if (fclose(filsrc)) { perror("fclose source"); exit(EXIT_FAILURE); };
   if (fclose(fildest)) { perror("fclose dest"); exit(EXIT_FAILURE); };
   free(buf);
 }

For simplicity, I am reading the buffer in byte components and writing it as a whole. A better solution is to handle partial writes.
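For those who want to stay in genuine C++, here is one possible C++11 <fstream> adaptation of the same chunked loop (the 4 MiB size and the exception-based error handling are my assumptions, not a canonical implementation):

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Same idea as the stdio version above: read a fixed-size chunk, write what
// was actually read, repeat until EOF. gcount() reports how many bytes the
// last read() obtained, which handles the final partial chunk.
void copyfile_cpp(const std::string& destpath, const std::string& srcpath)
{
    const std::size_t bufsize = 4 * 1024 * 1024; // experiment with this
    std::vector<char> buf(bufsize);

    std::ifstream src(srcpath, std::ios::binary);
    std::ofstream dst(destpath, std::ios::binary);
    if (!src || !dst)
        throw std::runtime_error("cannot open files");

    while (src.read(buf.data(), buf.size()) || src.gcount() > 0) {
        dst.write(buf.data(), src.gcount());
        if (!dst)
            throw std::runtime_error("write failed");
    }
    if (src.bad())
        throw std::runtime_error("read failed");
}
```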

Apparently dst_ << src_.rdbuf(); does some loop internally (I have to admit I never used it and did not understand that at first; thanks to melpomene for correcting me). But the actual buffer size matters a great deal. The two other answers (by John Zwinck and by melpomene) focus on that rdbuf() thing. My answer focuses on explaining why copying can be slow when you do it like in your first solution, why you need to loop, and why the buffer size matters so much.

If you really care about performance, you need to understand implementation details and operating system specific things. So read Operating systems: three easy pieces. Then understand how, on your particular operating system, the various buffering is done (there are several layers of buffers involved: your program buffers, the standard stream buffers, the kernel buffers, the page cache). Don't expect your C++ standard library to buffer in an optimal fashion.

Don't even dream of writing, in standard C++ (without operating-system-specific stuff), an optimal or very fast copying function. If performance matters, you need to dive into OS-specific details.

On Linux, you might use time(1), oprofile(1), perf(1) to measure your program's performance. You could use strace(1) to understand the various system calls involved (see syscalls(2) for a list). You might even code (in a Linux specific way) using directly the open(2), read(2), write(2), close(2) and perhaps readahead(2), mmap(2), posix_fadvise(2), madvise(2), sendfile(2) system calls.

At last, large file copying is limited by disk IO (which is the bottleneck). So even by spending days optimizing OS-specific code, you won't gain much. The hardware is the limitation. You should probably write the code that is most readable for you (it might be that dst_ << src_.rdbuf(); thing, which loops internally) or use some library providing file copy. You might win a tiny amount of performance by tuning the various buffer sizes.

If the operator translates to looping for me, why should I do it myself?

Because you have no explicit guarantee on the actual buffering done (at various levels). As I explained, buffering matters for performance. Perhaps the actual performance is not that critical for you, and the ordinary settings of your system and standard library (and their default buffer sizes) might be enough.

PS. Your question contains at least 3 different (but related) questions. I don't find it clear (so I downvoted it), because I did not understand which one is the most relevant. Is it: performance? Robustness? The meaning of dst_ << src_.rdbuf();? Why the first solution is slow? How to copy large files quickly?

Basile Starynkevitch
  • 2
    Which solution are you talking about? The first one (C-style, `malloc`) or the second one (`dst_ << src_.rdbuf();`)? – melpomene Sep 22 '18 at 13:59
  • 2
    You're explaining to him why it's wrong and why files were invented but you didn't tell him what to do instead. It's more like a comment than the answer. – Konrad Sep 22 '18 at 14:01
  • 3
    This should have been a comment. –  Sep 22 '18 at 14:02
  • You've missed the whole point of the question (*what does `dst_ << src_.rdbuf()` do?*). "*Your code is more C like than C++ like because it uses `fopen`*"? No, that's just part of the background explanation. The actual question is about `operator<<`. – melpomene Sep 22 '18 at 14:16
  • 1
    I answered on the first part. "program being very slow" – Basile Starynkevitch Sep 22 '18 at 14:18
  • 1
    That wasn't the question. – melpomene Sep 22 '18 at 14:20
  • 1
    Then why was that asked? – Basile Starynkevitch Sep 22 '18 at 14:20
  • "*Since it gives some internal buffer, you also need to loop around that*" is wrong. – melpomene Sep 22 '18 at 14:26
  • @melpomene: How that could be wrong with a 100Gb file: the `rdbuf()` will *never* in practice be that big! So a loop is still needed. And imagine a 32 bits system: the `rdbuf()` fits in virtual address space, so is less than 4Gbytes (and often 3Gbytes). And 100Gbytes don't fit inside 4Gbytes without some kind of loop – Basile Starynkevitch Sep 22 '18 at 14:37
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/180577/discussion-between-melpomene-and-basile-starynkevitch). – melpomene Sep 22 '18 at 14:41
  • It wasn't asked, it was simply stated that the program was slow. The actual question followed in the second half. – rustyx Sep 22 '18 at 18:30