1

I want to copy huge files (~10 GB) in my C/C++ program, and I have two options: 1) write my own copy function (possibly with a large buffer), or 2) call the system copy command (copy on Windows, cp on Linux).

As far as I can see, huge files are copied very fast when I use "Ctrl + C" and "Ctrl + V" on Windows. I am not sure we can do better than the OS itself.

Which would be the best choice?

duong_dajgja
  • In Windows, why not use one of the commands designed to deal more efficiently with huge files and file sets? Like `xcopy`, `robocopy` and there's at least one more, I forget the name. – Cheers and hth. - Alf Aug 20 '15 at 01:53
  • I'd say the best choice would be to use `boost::filesystem::copy` while we wait for implementations to ship with `std::experimental::filesystem::copy`. – user657267 Aug 20 '15 at 02:11
  • I think this Stack Overflow question and its answers provide some different ways of doing such a copy (and some relative timings): http://stackoverflow.com/questions/10195343/copy-a-file-in-a-sane-safe-and-efficient-way – Michael Petch Aug 20 '15 at 02:36
  • @MichaelPetch: I did read that question before posting mine. What I was wondering is why we don't just use the shell copy command instead of writing a new one. For example, on Windows, robocopy needs ~1.5 min to copy a 5 GB file (~55 MB per sec.) (Windows 7, Core i5, 8 GB RAM). I think this is already really fast. Can we write a faster one ourselves? – duong_dajgja Aug 20 '15 at 02:42
  • Well, portability is one reason. If you write C++ (or C) code to do the copy, it shouldn't matter what OS you use it on; it should run the same way on all of them. With your method you need to know the underlying copy mechanism, as you mentioned, like `ls`, `cp` etc. – Michael Petch Aug 20 '15 at 02:45
  • I meant `copy` (not `ls`). My final comment would be that if you go the more portable C++ route, you might find lower-level `FILE` access works better than the C++ stream mechanism. – Michael Petch Aug 20 '15 at 02:58

3 Answers

2

With a proper implementation, roll-your-own code™ gives you more flexibility than the shell copy. For example, it is easier to abort the operation and to report progress to the user.
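As a rough illustration of that flexibility, here is a minimal sketch of a chunked copy that reports progress and can be cancelled. The function name `copy_with_progress`, the callback signature and the 1 MiB default buffer are assumptions made up for this example, not anything prescribed above; a production version would also want to remove the partial output file after a failure or a cancel.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: copies src_path to dst_path in chunks, calls
// on_progress with the running byte count after each chunk, and stops
// early if `cancel` becomes true. Returns true only for a complete copy.
bool copy_with_progress(const std::string& src_path,
                        const std::string& dst_path,
                        const std::function<void(std::uint64_t)>& on_progress,
                        const std::atomic<bool>& cancel,
                        std::size_t buffer_size = 1 << 20)  // 1 MiB, arbitrary
{
    std::ifstream src(src_path, std::ios::binary);
    std::ofstream dst(dst_path, std::ios::binary | std::ios::trunc);
    if (!src || !dst) return false;

    std::vector<char> buffer(buffer_size);
    std::uint64_t copied = 0;

    while (!cancel.load()) {
        src.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::streamsize got = src.gcount();
        if (got <= 0) break;     // nothing read: end of file or read error
        dst.write(buffer.data(), got);
        if (!dst) return false;  // write failed (for example, disk full)
        copied += static_cast<std::uint64_t>(got);
        if (on_progress) on_progress(copied);
    }
    dst.flush();
    // Complete means: not cancelled, source hit end of file, output healthy.
    return !cancel.load() && src.eof() && static_cast<bool>(dst);
}
```

A caller could pass a lambda that updates a progress bar and flip the `std::atomic<bool>` from another thread (say, a Cancel button handler) to stop the copy between chunks.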

By the way, when Windows appears to copy a file quickly, that is only a matter of perception: File Explorer queues the copy or otherwise performs it in the background. Until the copy finishes and the file is usable, it takes about the same time as with, e.g., CopyFileEx or sendfile.

Non-maskable Interrupt
2

The reasons for not using the shell to perform basic tasks from within the language have nothing to do with performance; they are about safety and portability. You wouldn't use eval in your language, nor would you concatenate strings to create SQL queries, and yet there you are, creating shell commands by concatenating strings.

Brett Hale's "solution" conveniently doesn't mention any of this and hides the code needed to make it safe and portable behind the comment "do it yourself". In fact, when you do this you'll end up with more code than a hand-rolled copy function, and it will still be buggy. And if you have a bug there, a user can inject commands (for example, by running it with a destination file a_file" || rm -rf --no-preserve-root). You are also relying on the shell, which can itself have bugs (see Shellshock).
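To make the injection risk concrete, here is a small sketch of the kind of naive command builder being warned against. `build_cp_command` is a hypothetical helper invented for this illustration, and the harmless `echo INJECTED` stands in for a destructive payload. The code only builds the string; it never passes it to `system`.

```cpp
#include <string>

// Hypothetical naive helper: splices caller-supplied paths straight into a
// shell command line, with no escaping beyond wrapping them in quotes.
std::string build_cp_command(const std::string& src, const std::string& dst)
{
    return "cp \"" + src + "\" \"" + dst + "\"";
}

// With dst = a_file" || echo INJECTED "
// the result is:  cp "in.bin" "a_file" || echo INJECTED ""
// The embedded quote closes the destination argument early, so the shell
// would see a second command after the || operator.
```

Escaping every shell metacharacter correctly, on every target shell, is exactly the extra code the "do it yourself" comment glosses over.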

Calvin's answer correctly explains why a copy done by the shell may seem faster: it can use more tricks to make the copy look faster. In fact, there is no inherent magic in the shell copy operations. The "performance problem" is not a problem, since the main bottleneck is the actual reading and writing.

Furthermore, you present a false dichotomy, since you fail to consider a third option: a third-party library. One such library is Boost.Filesystem, which has a copy function.

milleniumbug
0

If it were me I would avoid making system calls and do something like so:

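    #include <fstream>

    int main()
    {
        // Open both files in binary mode so no newline translation happens.
        std::ifstream src("from.ogv", std::ios::binary);
        std::ofstream dst("to.ogv",   std::ios::binary);
        // Stream the entire source buffer into the destination.
        dst << src.rdbuf();
    }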
Jack Ryan
  • Is there any 'critical' reason for avoiding system calls? I made a program that reads into an 8192-byte buffer and then writes it to the output file, but it's much slower than "Ctrl + C" and "Ctrl + V" in Windows. – duong_dajgja Aug 20 '15 at 01:54
  • That's not avoiding "system calls", that's avoiding the `system` call which is different. Also this lacks rationale, which was the point of this question. – milleniumbug Aug 20 '15 at 09:05