18

I'm on Windows 7, using CPython for Python 3.2.2 and MinGW's g++.exe for C++ (which means I use libstdc++ as the runtime library). I wrote two simple programs to compare their speed.

Python:

x=0
while x!=1000000:
    x+=1
    print(x)

C++:

#include <iostream>
int main()
{
    int x = 0;
    while ( x != 1000000 )
    {
        x++;
        std::cout << x << std::endl;
    }
    return 0;
}

Neither was optimized.

I ran the C++ program first, then I ran the Python one through the interactive command line, which is much slower than directly starting a .py file.

However, Python outran C++ and turned out to be more than twice as fast. Python took 53 seconds; C++ took 1 minute and 54 seconds.

Is it because the Python interpreter has some special optimization, or is it because C++ has to refer to <iostream> and std, which slows it down and makes it take up RAM?
Or is it some other reason?

Edit: I tried again with \n instead of std::endl and compiled with the -O3 flag; this time it took 1 minute to reach 500,000.

UpAndAdam
busukxuan
  • 27
    So according to your benchmark, an infinite loop in Python runs twice as fast as printing a single number in C++? That's indeed strange. – Sven Marnach Mar 13 '12 at 16:46
  • 4
    Ahm, these code samples do entirely different things... The first one doesn't even terminate. – Niklas B. Mar 13 '12 at 16:46
  • 1
    There was a post on this literally just yesterday... Is this homework from a class or something? Short answer, C++ buffers on stdout by default. Python does not. – Swiss Mar 13 '12 at 16:46
  • 1
    Did you mean to include a `for` or `while` loop in the C++ code? – dan04 Mar 13 '12 at 16:47
  • 1
    @KevinDTimm: Though the title is the same, the two questions are completely 100% different. This is not a duplicate of that question. – Mooing Duck Mar 13 '12 at 16:49
  • 1
    @kindall: so does `cout`, unless you `cout << endl`. – Mooing Duck Mar 13 '12 at 16:49
  • 2
    @Mooing Duck: I agree, this should be reopened if OP changes the examples to be actually equivalent. I too think that this could be solved by using `\n` instead of `std::endl` in the C++ sample. – Niklas B. Mar 13 '12 at 16:50
  • The question should be, should I be seriously worrying about this. This isn't a standardized test and why would you be serious about the results ? – DumbCoder Mar 13 '12 at 16:50
  • 4
    [What is the C++ iostream endl fiasco?](http://stackoverflow.com/q/5492380/636019) – ildjarn Mar 13 '12 at 16:51
  • @DumbCoder: The point is that it's not immediately clear why they perform differently, so I think it's worth a question. If you don't worry about it, that's your personal attitude. – Niklas B. Mar 13 '12 at 16:52
  • oh i did need a while loop in the C++ code...it should be while(true){ x++; std::cout< – busukxuan Mar 13 '12 at 16:53
  • 1
    @Niklas B - My point was running a loop and then coming down to results isn't a test by which you decide which is faster. There are maybe loads of other stuff to explore on that part then, which would mean dissecting every feature and what would be the final benefit or outcome of that ? – DumbCoder Mar 13 '12 at 16:54
  • yeah i'm sure c++ is better, but i cant solve this problem... – busukxuan Mar 13 '12 at 16:55
  • 1
    @DumbCoder: I still disagree. We don't need to discuss whether it's a helpful performance test or not, but it's still surprising that Python outperforms C++ here, unless you know what's going on under the hood. People like dissecting here, usually just out of curiosity, not because it has some special benefit. – Niklas B. Mar 13 '12 at 16:56
  • 2
    @busukxuan: Edit the question to reflect this. Also, you can't compare the runtime of two infinite loops, so please make those loops terminate after the same number of iterations. – Niklas B. Mar 13 '12 at 16:57
  • 2
    @Niklas : Whether or not the loop is infinite in C++ is undefined. ;-] – ildjarn Mar 13 '12 at 16:59
  • 1
    @ildjarn: Hm, one could solve that by using `unsigned int`, I guess. Still doesn't change the situation much, because it should not terminate on any C++ implementation that I know of :P – Niklas B. Mar 13 '12 at 17:01
  • @Niklas : Yeah, a tangential comment seemed somehow appropriate under this question though. :-P – ildjarn Mar 13 '12 at 17:02
  • guess what...i set the limit to 10^6, python took 53 seconds, c++ took 1minute and 54 seconds. – busukxuan Mar 13 '12 at 17:07
  • @NiklasB. what do you think about the new result? – busukxuan Mar 13 '12 at 17:14
  • 1
    @busukxuan : Are you still using `endl`? If so, try actually _reading_ some of these comments (then use `'\n'` instead). – ildjarn Mar 13 '12 at 17:23
  • @ildjarn i read that and did a test again, this time it's exactly 2 minutes...slower than "endl" should only be a parallax, but it's still slower than python...then i even added a self-timing for the python program with time.clock function, and it took only half minute longer, still about 30 secs faster than c++ – busukxuan Mar 13 '12 at 17:27
  • 1
    @busukxuan: I don't see any limit there, please add it to the question (always do that!). I also can't reproduce it, even with `std::endl`, the C++ version is a lot faster on my machine. Which version of Python are you using? Also, are you using a Python optimizer? – Niklas B. Mar 13 '12 at 17:30
  • 1
    i was using python3 in the first test, and using python2(default) on the second test where i added the self-timing thing...i tested the self-timed one again with python3, and it took 1 minute 12 secs, and i'm using non-optimized source code, not bytecode. – busukxuan Mar 13 '12 at 17:32
  • 1
    Now the picture becomes clear. Python is teh way to go. Somebody should step in and finally codez a c++ interpreter in phyton. – Gunther Piez Mar 13 '12 at 17:37
  • 1
    and this also proves that python3's print is faster than python2's print – busukxuan Mar 13 '12 at 17:40
  • 5
    @busukxuan: Could you now *PLEASE* edit all this extra info into the question? Maybe it will get reopened then. What you should add: C++ compiler, architecture, operating system, Python implementation/version, *and especially code that actually terminates*! Make this an interesting question and people will bother. Also, format the C++ code properly. – Niklas B. Mar 13 '12 at 17:40
  • ok i'm on it(i was using the old format) – busukxuan Mar 13 '12 at 17:44
  • 1
    What command line are you using to build? You're enabling optimizations, right? – ildjarn Mar 13 '12 at 17:50
  • nono, i'm using the python interpreter interactive session – busukxuan Mar 13 '12 at 17:55
  • @busukxuan: If you look at the answer, you'll notice that you forgot to mention something in the question. – Niklas B. Mar 13 '12 at 18:01
  • Can someone just try printf? And also try Visual C++. – quantum Mar 13 '12 at 21:04

5 Answers

49

A colleague at work told me that Python code is faster than C++ code and showed me this topic as proof. The other answers already make it clear what is wrong with the C++ code posted in the question, but I would still like to summarize the benchmarks I ran to show him how fast good C++ code can be!

There are two problems with the original C++ code:

  • It uses std::endl to print a newline in each iteration. That is a very bad idea because std::endl does more than print a newline: it also forces the stream to flush the buffer accumulated so far, and flushing is expensive because it has to reach the actual output device. So the first fix is this: if you want to print a newline, just use '\n'.

  • The second problem is less obvious because it is not visible in the code; it lies in the design of C++ streams. By default, the standard C++ streams are synchronized with the C streams after each input and output operation, so that an application can mix std::cout with std::printf, and std::cin with std::scanf, without any problem. This feature (yes, it is a feature) is not needed here, so we can disable it and save its small runtime overhead (that overhead is not a flaw in C++; it is simply the price of the feature). So the second fix is this: std::ios_base::sync_with_stdio(false);

And here is the final optimized code:

#include <iostream>

int main()
{
    std::ios_base::sync_with_stdio(false); 

    int x = 0;
    while ( x != 1000000 )
    {
        ++x;
        std::cout << x << '\n';
    }
}

Compile this with the -O3 flag, then run and time it:

$ g++ benchmark.cpp -O3    #compilation
$ time ./a.out             #run

//..

real   0m32.175s
user   0m0.088s
sys    0m0.396s

And run and measure python code (posted in the question):

$ time ./benchmark.py

//...

real  0m35.714s
user  0m3.048s
sys   0m4.456s

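The real times are close, presumably because both runs spend most of their time waiting on the terminal. The user and sys times tell us which one is fast, and by how much: about 0.5 seconds of CPU time for C++ versus about 7.5 seconds for Python.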
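The split between elapsed time and CPU time can also be measured from inside the program. What follows is only a minimal sketch and was not part of the benchmark above: timed_count and the Timing struct are hypothetical names, wall-clock time comes from std::chrono::steady_clock, and CPU time comes from std::clock. Treat std::clock as an approximation of user + sys; on some C runtimes (Microsoft's, for one) it reports elapsed wall time instead.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <iostream>

// Hypothetical result type: both measurements are in seconds.
struct Timing {
    double wall_seconds;  // elapsed time, comparable to "real"
    double cpu_seconds;   // processor time, roughly "user + sys"
};

// Runs the counting loop against `out` and measures it both ways.
Timing timed_count(std::ostream& out, int limit) {
    assert(limit >= 0);
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();

    int x = 0;
    while (x != limit) {
        ++x;
        out << x << '\n';
    }
    out.flush();  // include the final write of buffered output in the timing

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    Timing t;
    t.wall_seconds = std::chrono::duration<double>(wall_end - wall_start).count();
    t.cpu_seconds = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return t;
}
```

Calling timed_count(std::cout, 1000000) from main and printing both fields to std::cerr gives the two numbers side by side. With output going to a terminal, I would expect wall_seconds to be far larger than cpu_seconds.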

Hope that helps you to remove your doubts. :-)

Nawaz
19

There isn't anything obvious here. Since CPython is written in C, it presumably uses something like printf to implement print. C++ I/O streams, like cout, are usually implemented in a way that's much slower than printf. If you want to put C++ on a better footing, you can try changing to:

#include <cstdio>
int main()
{
    int x=0;
    while(x!=1000000)
    {
        ++x;
        std::printf("%d\n", x);
    }
    return 0;
}

I did change to using ++x instead of x++. Years ago people thought that this was a worthwhile "optimization." I will have a heart attack if that change makes any difference in your program's performance (OTOH, I am positive that using std::printf will make a huge difference in runtime performance). Instead, I made the change simply because you aren't paying attention to what the value of x was before you incremented it, so I think it's useful to say that in code.
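Taking the same idea one step further, the formatting can be separated from the writing entirely. This is a sketch under my own assumptions and was not measured in this thread: the hypothetical helper build_lines formats every number into one std::string with std::snprintf, and the hypothetical helper write_all hands the result to the C library in a single std::fwrite call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats 1..limit, one number per line, into memory.
std::string build_lines(int limit) {
    assert(limit >= 0);
    std::string out;
    char buf[16];  // room for any 32-bit int plus '\n' and the terminating '\0'
    for (int x = 1; x <= limit; ++x) {
        const int len = std::snprintf(buf, sizeof buf, "%d\n", x);
        out.append(buf, static_cast<std::size_t>(len));
    }
    return out;
}

// Hypothetical helper: writes the whole text with one library call and
// returns the number of bytes the C library accepted.
std::size_t write_all(const std::string& text, std::FILE* stream) {
    return std::fwrite(text.data(), 1, text.size(), stream);
}
```

In main this would be used as write_all(build_lines(1000000), stdout). The trade-off is memory: the whole output (a few megabytes for a million lines) sits in the string before anything is printed.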

Max Lybbert
  • 2
    so you're saying that c++ is only slower than python because python is using the most direct way to implement print, and that the difference is caused by the print thing? Hmm... I think i should go try it without the print. – busukxuan Mar 13 '12 at 18:43
  • 5
    Oh god, thanks for the help.... I removed the print and cout, and python took more than 10 seconds to count to 10^8, and C++ done that in a flash(not a metaphor or hyperbole, i do mean a fraction of a second, yes, a real flash). So the problem is the cout... Now I learned a great lesson. Thank you so much! – busukxuan Mar 13 '12 at 18:53
  • 2
    @busukxuan: Writing to the console will always be slow. Have you tried redirecting the output of each program to a file instead? Your little programs are almost certainly I/O bound and writing to a file would eliminate the slow console writes. – Blastfurnace Mar 13 '12 at 18:53
  • 10
    @bosukxuan: The C++ compiler probably removes the whole loop if you don't output anything... That's no proof that `cout` is indeed the problem here. – Niklas B. Mar 13 '12 at 19:53
11

I think we need more information, but I suspect you're building an unoptimized C++ binary. Try building it with the -O3 flag (someone who knows GCC better will have more and better recommendations). Meanwhile, here are some timings from a completely untrustworthy source: http://ideone.com. I ran each program 5 times to get some measure of the variance in timing, but only the original C++ varied, and not by much.

Python: http://ideone.com/WBWB9 time: 0.07-0.07s
Your C++: http://ideone.com/tzwQJ time: 0.05-0.06s
Modified C++: http://ideone.com/pXJo3 time: 0.00s-0.00s

As for why my C++ was faster than yours, std::endl forces C++ to flush the buffer immediately. '\n' does the newline without the forced buffer flush, which is much much much faster.
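The flush difference can be observed directly, without timing anything. Here is a minimal sketch, assuming a hypothetical stream buffer named CountingBuf that keeps the characters in a string and counts how often the stream asks it to synchronize; count_flushes is likewise a made-up helper, not something from the programs above.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical buffer: stores output in memory and counts flush requests.
class CountingBuf : public std::streambuf {
public:
    std::size_t flushes = 0;
    std::string data;

protected:
    // Called for single characters because no put area is set up.
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            data.push_back(traits_type::to_char_type(ch));
        return traits_type::not_eof(ch);
    }
    // Called for character sequences.
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        data.append(s, static_cast<std::size_t>(n));
        return n;
    }
    // Called whenever the stream is flushed, e.g. by std::endl.
    int sync() override {
        ++flushes;
        return 0;
    }
};

// Writes 1..lines to an in-memory stream and reports how many flushes happened.
std::size_t count_flushes(int lines, bool use_endl) {
    assert(lines >= 0);
    CountingBuf buf;
    std::ostream out(&buf);
    for (int x = 1; x <= lines; ++x) {
        if (use_endl)
            out << x << std::endl;  // newline plus flush
        else
            out << x << '\n';       // newline only
    }
    return buf.flushes;
}
```

With this setup, count_flushes(1000, true) reports one flush per line, while count_flushes(1000, false) reports none. On a real console or file each of those flushes turns into a write to the operating system, which is where the time goes.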

(note: I only ran to 12773, since ideone.com kills processes after they display a certain amount of output, that was the most the server would give me)

Mooing Duck
  • i tried again, with \n, and -O3, this time it took 1 min to reach 500,000 – busukxuan Mar 13 '12 at 18:10
  • In that case, I only have one theory. Could you open up cmd.exe and test both of them in _that_ console, to be sure it's not the console that's different? Because the timings _should_ be almost identical for both Python and C++ for your test. This is because displaying stuff on the console takes a _lot_ of processor time that should be identical for both languages. The loop is nothing compared to that. – Mooing Duck Mar 13 '12 at 18:23
  • still the same result...T.T Does the CPU affect the result? I heard that AMD processors are much faster where Intel processors are slower but can handle many tasks at a time – busukxuan Mar 13 '12 at 18:30
  • @busukxuan: no, CPU shouldn't affect the result. Your python really put all those numbers in the console twice as fast as C++? – Mooing Duck Mar 13 '12 at 18:34
  • yeah...somehow...i thought python can only be faster than .NET and Java if highly optimized, but this is unbelievable... Does the runtime library affect c++? I'm using libstdc++, from the latest version of MinGW. Is the visual C++ library faster? Btw another weird behavior is that using the same python program above, my PyPy is slower than CPython – busukxuan Mar 13 '12 at 18:39
  • The runtime library shouldn't make a huge difference, but give it a try. Can you edit your question with a little chart of "language-compiler-consoletype-time" for each of your tests? – Mooing Duck Mar 13 '12 at 18:43
  • Oh i think the problem is solved...Miraculously.... I removed the print and cout, and python took more than 10 seconds to count to 10^8, and C++ done that in a flash(not a metaphor or hyperbole, i do mean a fraction of a second, yes, a real flash). So the problem is the cout... Now I learned a great lesson. – busukxuan Mar 13 '12 at 18:50
  • 4
    @busukxuan: wrong. If you have a loop with no output, the C++ will do _nothing at all_. It didn't count _anything_. That's why benchmarks are hard. – Mooing Duck Mar 13 '12 at 19:24
  • @MooingDuck Nono, I added a cout after the loop(after the loop terminates) so that the program still need the variable x. And I didn't optimize this one because i was expecting that this will be faster than python without optimization. – busukxuan Mar 14 '12 at 08:50
  • @busukxuan: still might not have counted: It might just be printing the last value. The only way to be sure is to check the assembly. – Mooing Duck Mar 14 '12 at 14:58
  • @MooingDuck but it has to count to the last value first before it determines what the number is, right? Besides i didn't optimize that so it shouldn't have skipped any step. – busukxuan Mar 14 '12 at 21:05
  • @busukxuan: did you run Python under the same circumstances? I bet it's crazy fast too. – Mooing Duck Mar 14 '12 at 21:15
  • @MooingDuck i did, it definitely took longer. more than a second i think – busukxuan Mar 15 '12 at 21:20
6

std::endl is slow because it does more than just write a newline: it also flushes the stream.

Using '\n' directly will make your C++ code run faster.

Xiddoc
quantum
4

Same problem as posed in "Why is reading lines from stdin much slower in C++ than Python?" but in the opposite direction.

Add

std::cout.sync_with_stdio(false);

to the top of the program.
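For placement, a minimal sketch follows; disable_stdio_sync and print_numbers are hypothetical helper names of mine. It relies on two documented properties of sync_with_stdio: it is a static member of std::ios_base (which is why calling it through std::cout also works), and it returns the synchronization state that was in effect before the call.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical helper: turns off synchronization with C stdio and returns
// whether it was on before. Meant to run before any other I/O in main().
bool disable_stdio_sync() {
    return std::ios_base::sync_with_stdio(false);
}

// The counting loop from the question, using '\n' instead of std::endl.
void print_numbers(int limit) {
    assert(limit >= 0);
    for (int x = 1; x <= limit; ++x)
        std::cout << x << '\n';
}
```

A main() would call disable_stdio_sync() first and print_numbers(1000000) afterwards. Once synchronization is off, output written with std::printf and with std::cout in the same program is no longer guaranteed to appear in the order it was written, so this only suits programs that stick to one of the two.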

Community
gjpc