
I wanted to test the performance of writing to a file in a bash script vs a C++ program.

Here is the bash script:

#!/bin/bash

while true; do
        echo "something" >> bash.txt
done

This added about 2-3 KB to the text file per second.

Here is the C++ code:

#include <iostream>
#include <fstream>

using namespace std;

int main() {
    ofstream myfile;
    myfile.open("cpp.txt");

    while (true) {
        myfile << "Writing this to a file Writing this to a file \n";
    }

    myfile.close();
}

This created a ~6 GB text file in less than 10 seconds.

What makes this C++ code so much faster, and/or this bash script so much slower?

obl
  • Just guessing here but I'd say the main difference is that batch opens and closes the file each iteration while C++ doesn't. Try moving open() and close() inside the loop in C++ to have a fair performance comparison (you'll need to pass ios::app to open) – IlBeldus Jul 04 '17 at 18:34
    Or, put the redirection on the loop in the shell script: `while true; do ...; done >> bash.txt`. – chepner Jul 04 '17 at 18:35
  • @IlBeldus Note that this is bash, not batch. – Code-Apprentice Jul 04 '17 at 18:40
  • Confirmed using `strace` that my `bash` opens and closes the `bash.txt` file every time. – aschepler Jul 04 '17 at 18:40
  • @obl It is related to your question in that it is a comment on the overabundance of unnecessary code in it. Unless you get paid by lines of code, you could take it as useful information, knowledge that may help you write more concise code in the future. – juanchopanza Jul 04 '17 at 18:42
  • See how a stupid little program like this compares: `#include <fstream> int main() { while (true) { std::ofstream myfile("cpp.txt", std::ios::app); myfile << "Writing this to a file Writing this to a file \n"; } }` – user4581301 Jul 04 '17 at 18:51
  • The performance of that code is very close to the bash script @user4581301 – obl Jul 04 '17 at 19:42
  • It's perfectly possible to write that bash code and only open the file once: just do the redirection after the `done` (`done >> bash.txt`) instead of after the `echo`. That will remove the open/close overhead but will probably still be slower. – cdarke Jul 04 '17 at 20:34
  • _"Here is the C++ code:"_ - No fair! The string you're writing in C++ is almost 5 times as long as the one in your bash script! No wonder it's faster. Try using the longer string in bash and check if it's faster ;) (also try the shorter string in both and see if that makes a difference!) – marcelm Jul 04 '17 at 22:56
  • Also, _"This added about 2-3 KB to the text file per second."_ - Really? What hardware and OS? On my computer (i5, Debian 8) I see over 1MB per second (and about double that in `zsh`). That's quite a big speed difference. – marcelm Jul 04 '17 at 22:57
  • @juanchopanza Why would you want to open and close the file every time you write? – ManicQin Jul 05 '17 at 07:56
  • Very relevant answer: https://unix.stackexchange.com/questions/257297/how-does-yes-write-to-file-so-quickly/257393#257393 – orion Jul 05 '17 at 08:55
  • @ManicQin You wouldn't, that would make no sense. – juanchopanza Jul 05 '17 at 09:11
  • @juanchopanza Ignore, I merged your answer and user4581301 into one. – ManicQin Jul 05 '17 at 09:32
  • @ManicQin Ah, OK. So I guess the answer would be "to investigate the effect of opening and closing the file for each line in C++, given that it is what the bash script does". – juanchopanza Jul 05 '17 at 10:05
  • Since my first comment got removed, here it goes again (since comments are also to suggest improvements.) You have too much code in your C++ code. You can achieve exactly the same with `ofstream myfile("cpp.txt");`, omitting the calls to `open()` and `close()`. – juanchopanza Jul 06 '17 at 06:31

3 Answers


There are several reasons for this.

First off, interpreted execution environments (like bash and perl, along with non-JITed lua and python) are generally much slower than even poorly written compiled programs (C, C++, etc.).

Secondly, note how fragmented your bash code is: it writes one line to the file, then writes one more, and so on. Your C++ program, on the other hand, performs buffered writes, even without any direct effort on your part. You can see how much slower it runs if you substitute

myfile << "Writing this to a file Writing this to a file \n";

with

myfile << "Writing this to a file Writing this to a file" << endl;

For more information about how streams are implemented in C++, and why \n is different from endl, see any C++ reference documentation.

Thirdly, as the comments confirm, your bash script opens and closes the target file for each line. That alone implies a significant performance overhead: imagine myfile.open() and myfile.close() moved inside your C++ loop body!

iehrlich
  • Flushing the performance down the drain is a great start. The next step is to open the file for append and close it on every loop. Should get even closer. – user4581301 Jul 04 '17 at 18:42
  • @user4581301 yeah, I thought about it (see edit), but was not quite sure - not an expert in bash :) – iehrlich Jul 04 '17 at 18:43
  • IIRC, `bash` lines must be translated/"built" to native every time. This is not true of `perl`, which is compiled only once, or `python`, which is compiled to byte code. `Bash` won't build a line until it's about to run it, while `perl` builds everything at the beginning, etc. – code_dredd Jul 04 '17 at 19:05
  • Running it with 'endl' instead of '\n' did make it significantly slower but still faster than the bash script. Running the code posted by @user4581301, the performance was very similar to the performance of the bash script. – obl Jul 04 '17 at 19:45
  • _"... interpreted execution environments (like ... python ..."_ - Is it though? CPython, the default Python implementation, compiles the Python source to bytecode, which is run in a VM (which some call the interpreter, and that makes things even more confusing). I'm not intimately familiar with Perl, but I wouldn't be surprised if it employed a similar construction. I think purely interpreted language implementations are quite rare nowadays. Though I'm pretty sure Unix shells still are. – marcelm Jul 04 '17 at 22:49
  • Yes, it is. In the modern world, *translation* to bytecode and *interpretation* of that bytecode counts as interpretation, not as compilation :) I do agree that the terminology is very confusing, but that's how it has evolved throughout the years. – iehrlich Jul 04 '17 at 22:53
  • [difference between `std::endl` and `\n`](https://stackoverflow.com/q/213907/995714) – phuclv Jul 05 '17 at 05:07

As others have already pointed out, this is because you are currently opening and closing the file for each line your script writes (and shell scripts are interpreted while C++ is compiled). You could batch the writes instead and open the file only once, for example:

MSG="something"
logfile="test.txt"
(
for i in {1..10000}; do
        echo "$MSG"
done
) >> "$logfile"

This writes the message 10,000 times but opens the log file only once.
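
A variant without the subshell, as suggested in the comments on the question: redirect the loop itself, so the shell still opens the file exactly once (file name and count are arbitrary):

```shell
#!/bin/bash
MSG="something"
logfile="test.txt"

# Redirecting the `done` means the shell opens the file once before the
# first iteration and closes it once after the last - not once per echo.
for i in {1..10000}; do
    echo "$MSG"
done >> "$logfile"
```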

Elliott Frisch

Compiled vs. Interpreted Languages

Bash is interpreted, while C++ is compiled. That alone makes the C++ version a lot faster.

Reece Ward
  • Sometimes. And sometimes the interpreted language has nifty little instructions so tightly optimized that they blow expectations right out of the water. – user4581301 Jul 04 '17 at 18:38
  • @user4581301 Well, technically they are not interpreted at this point, but JIT/AOT-compiled ;) – iehrlich Jul 04 '17 at 18:44
  • No... Bash is interpreted, and yes interpreted languages can be fast, but since you still have to _interpret_ it, it is always going to be somewhat slower. You could compile it, but that is not what we are talking about – Reece Ward Jul 04 '17 at 18:47
  • @iehrlich even without jitting you sometimes run across "Holy Smurf!". Old matlab is a good example. The script is slow, but the code backing the script has some serious pep in its step. – user4581301 Jul 04 '17 at 18:48
    Interpreted/JIT/compiled isn't especially relevant in this case, since the I/O is the bottleneck. CPU usage is going to be sitting below 1% for the entire duration of the program, so it won't really matter that the C++ version is faster during that 1%. iehrlich's answer is right; the problem is that the bash script opens the file anew every time it prints a line, while the C++ version keeps it open until it's done. – Ray Jul 04 '17 at 23:05