I've been working on a fairly large C++ project for a few weeks now. My original goal was to use it to learn C++11, writing only pure C++ and avoiding manual allocation and C constructs. However, I think this problem is going to force me to use C for one small function, and I'd like to know why.
Basically I have a save function that will copy a somewhat large binary file to a separate location before I make changes to the data in it. The files themselves are CD images with a max size of around 700MB. Here is the original C++ code that I used:
std::ios::sync_with_stdio(false);
std::ifstream in(infile, std::ios::binary);
std::ofstream out(outfile, std::ios::binary);
std::copy(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>(), std::ostreambuf_iterator<char>(out));
out.close();
in.close();
With a 690MB file, this code takes just under 4 minutes to complete. I have run it with multiple files and the result is always the same: nothing under 3 minutes. I also found the following approach, which ran much faster but is still nowhere near as fast as C:
std::ios::sync_with_stdio(false);
std::ifstream in(infile, std::ios::binary);
std::ofstream out(outfile, std::ios::binary);
out << in.rdbuf();
out.close();
in.close();
This one took 24 seconds, but that is still roughly 16 times slower than C.
After looking around, I found someone who needed to write an 80GB file and could do so at full speed using C. I decided to give it a try with this code:
FILE *in = fopen(infile, "rb");
FILE *out = fopen(outfile, "wb");
char buf[1024];
size_t read = 0; // fread returns size_t, not int
// Read data in 1 KB chunks and write each full chunk to the output file
while ((read = fread(buf, 1, 1024, in)) == 1024)
{
    fwrite(buf, 1, 1024, out);
}
// Write out the final partial chunk, if any
fwrite(buf, 1, read, out);
fclose(out);
fclose(in);
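For comparison, the same chunked read/write loop can be written with C++ streams instead of `FILE*`. This is only a sketch: the function name `copy_chunked` and the 64 KB default chunk size are my own choices for illustration, and I have not measured how it compares to the numbers in this post.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper (not part of the original code): copies src to dst
// in fixed-size chunks using std::ifstream::read and std::ofstream::write.
// Returns true only if the whole input was read and written successfully.
bool copy_chunked(const std::string& src, const std::string& dst,
                  std::size_t chunk_size = 64 * 1024)
{
    if (chunk_size == 0) return false; // a zero-sized read would never reach EOF

    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary | std::ios::trunc);
    if (!in || !out) return false;

    std::vector<char> buf(chunk_size);
    while (in) {
        // read() sets eofbit/failbit on a short read; gcount() still
        // reports how many bytes actually arrived in this call.
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::streamsize got = in.gcount();
        if (got > 0) out.write(buf.data(), got);
    }
    out.flush();
    return in.eof() && out.good();
}
```

Calling `copy_chunked(infile, outfile)` would replace the `fopen`/`fread`/`fwrite` block, and the chunk size can be passed as a third argument to try different buffer sizes.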
The results were pretty shocking. Here is one of the benchmarks I have after running it multiple times on many different files:
File Size: 565,371,408 bytes
C : 1.539s | 350.345 MB/s
C++: 24.754s | 21.7815 MB/s - out << in.rdbuf()
C++: 220.555s | 2.44465 MB/s - std::copy()
What is the cause of this vast difference? I know C++ won't match the performance of plain C, but a 348MB/s difference is massive. Is there something I'm missing?
Edit:
I am compiling this using Visual Studio 2013 on a Windows 8.1 64-bit OS.
Edit 2:
After reading John Zwinck's answer I decided to just go the platform-specific route. Since I still wanted to make my project cross-platform, I threw together a quick example. I am really not sure whether these work on systems other than Windows, but I can test Linux at a later date. I cannot test OS X, but copyfile looks like a simple function, so I assume the call is correct.
Keep in mind you need to do the same #if logic for including platform-specific headers.
void copy(std::string infile, std::string outfile)
{
#if defined(_WIN32) || defined(_WIN64)
    // Windows: the third argument is bFailIfExists, so false overwrites an existing file
    CopyFileA(infile.c_str(), outfile.c_str(), false);
#elif defined(__APPLE__)
    // OSX
    copyfile(infile.c_str(), outfile.c_str(), NULL, COPYFILE_DATA);
#elif defined(__linux__)
    // Linux
    struct stat stat_buf;
    int in_fd, out_fd;
    off_t offset = 0; // sendfile expects a pointer to off_t
    in_fd = open(infile.c_str(), O_RDONLY);
    fstat(in_fd, &stat_buf);
    // O_TRUNC so an existing, longer output file does not keep stale trailing bytes
    out_fd = open(outfile.c_str(), O_WRONLY | O_CREAT | O_TRUNC, stat_buf.st_mode);
    sendfile(out_fd, in_fd, &offset, stat_buf.st_size);
    close(out_fd);
    close(in_fd);
#endif
}