Why does a trivial loop in Python run so much slower than the same loop in C++? And how can I optimize it?

Question

Simply running a nearly empty for loop in Python and in C++ (as follows) gives very different speeds: the Python version is more than a hundred times slower.

a = 0
for i in xrange(large_const):
  a += 1

int a = 0;
for (int i = 0; i < large_const; i++)
  a += 1;

Plus, what can I do to optimize the speed of the Python version?

(Addition: I made a bad example in the first version of this question. I didn't mean a = 1, which a C/C++ compiler could optimize away; I mean that the loop itself consumes a lot of resources (so a += 1 is the better example). By "how to optimize" I mean: if the loop body is as simple as a += 1, how can it run at a speed similar to C/C++? In practice I use NumPy, so I can't use PyPy (for now). Are there general methods for making loops much faster (something like using a generator when generating a list)?)

Related : http://stackoverflow.com/questions/3033329/why-are-python-programs-often-slower-than-the-equivalent-program-written-in-c-or — Ashwini Chaudhary, Jun 03 '13 at 14:39
Try `volatile int a;` in the C version, to prevent the loop being removed. — Mike Seymour, Jun 03 '13 at 14:59
tried volatile, the same in a += 1, hundreds of times faster than python... — chentingpc, Jun 06 '13 at 03:27

mgilson · Accepted Answer · 2013-06-03T15:01:18.330

15

A smart C compiler can probably optimize your loop away by recognizing that the final value of a is known in advance (with the original a = 1 body, a will always be 1 at the end). Python can't do that because, when iterating over xrange, it needs to call next (spelled __next__ in Python 3) on the iterator until it raises StopIteration. Python can't know whether that call will have side effects until it makes it, so there is no way to optimize the loop away. The take-away message from this paragraph is that it is MUCH HARDER to write an optimizing Python "compiler" than an optimizing C compiler, because Python is such a dynamic language: the compiler would need to know how each object will behave in certain circumstances. In C, that's much easier because the compiler knows exactly what type every object is ahead of time.
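To make the iterator protocol concrete, here is a rough sketch of what a for loop does behind the scenes. It is written for Python 3 (so range instead of xrange), and the function names count_with_for and count_by_hand are made up for illustration. The real work happens inside the interpreter in C, so this only shows the sequence of steps, not the actual implementation.

```python
def count_with_for(n):
    """The loop from the question, wrapped in a function."""
    a = 0
    for _ in range(n):
        a += 1
    return a


def count_by_hand(n):
    """Roughly the same loop with the iterator protocol spelled out."""
    a = 0
    iterator = iter(range(n))    # the for statement first asks for an iterator
    while True:
        try:
            next(iterator)       # one call per iteration
        except StopIteration:    # the only signal that the iterator is exhausted
            break
        a += 1
    return a
```

Both functions return the same value. The point is that the interpreter cannot skip any of the next calls, because in general it does not know what they do.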

Of course, compiler aside, Python needs to do a lot more work. In C, you're working with base types using operations that map directly onto hardware instructions. In Python, the interpreter executes the bytecode one instruction at a time in software. Clearly that is going to take longer than machine-level instructions. The data model (e.g. calling __next__ over and over again) can also lead to a lot of function calls which the C code doesn't need to make. Of course, Python does this stuff to be much more flexible than a compiled language can be.
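If you want to see the bytecode for yourself, the standard dis module can list it. This is a small sketch for CPython 3; the function name count is just for illustration, and the exact instruction names differ between CPython versions, so treat the listing as illustrative only.

```python
import dis


def count(n):
    a = 0
    for _ in range(n):
        a += 1
    return a


# Names of the bytecode instructions CPython compiled count() into.
opnames = [ins.opname for ins in dis.get_instructions(count)]

if __name__ == "__main__":
    # Prints the full listing. Each pass through the loop executes several
    # of these instructions in the interpreter's software dispatch loop.
    dis.dis(count)
```

Every one of those instructions is dispatched in software, whereas a C compiler can turn the whole loop body into a handful of machine instructions.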

The typical way to speed up Python code is to use libraries or intrinsic functions which provide a high-level interface to low-level compiled code. scipy and numpy are excellent examples of this kind of library. Other things you can look into are using PyPy, which includes a JIT compiler -- you probably won't reach native speeds, but it'll probably beat CPython (the most common implementation) -- or writing extensions in C/Fortran using the CPython API, Cython or f2py for performance-critical sections of code.
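As a small illustration of the "intrinsic functions" idea using only the standard library, the sketch below compares an explicit loop with the built-in sum, whose loop runs in compiled code in CPython. The names loop_sum, builtin_sum and N are made up for the example, and the timings depend on your machine and Python version, so no numbers are claimed here.

```python
import timeit

N = 1_000_000  # arbitrary size chosen for the illustration


def loop_sum(n):
    """Add up 0..n-1 with an explicit Python-level loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_sum(n):
    """Same result, but the loop runs inside the built-in sum."""
    return sum(range(n))


if __name__ == "__main__":
    t_loop = timeit.timeit(lambda: loop_sum(N), number=5)
    t_builtin = timeit.timeit(lambda: builtin_sum(N), number=5)
    print(f"explicit loop: {t_loop:.3f}s, built-in sum: {t_builtin:.3f}s")
```

On CPython the built-in version is usually the faster of the two, but measure on your own setup before relying on that.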

edited Jun 03 '13 at 15:01

answered Jun 03 '13 at 14:41

mgilson

The C code will be significantly faster even if compiled completely unoptimized. Because python is a higher level language, there's a lot of computational and representational baggage in python that isn't present in C. A loop void of content exposes a lot of that baggage. – David Hammen Jun 03 '13 at 14:44
@DavidHammen -- I thought I explained that in my second paragraph ... :) – mgilson Jun 03 '13 at 14:45
I made a bad example here. I didn't mean a = 1, which the compiler could optimize away; I mean that the loop itself consumes a lot of resources (maybe I should use a += 1 as the example). And what I mean by optimize is: if the for loop is that simple, how can it run at a speed similar to C/C++? In practice I use numpy, so I can't use pypy anymore. Are there general methods for making loops much faster (something like using a generator when generating a list)? – chentingpc Jun 06 '13 at 03:24
@chentingpc -- As I stated in the second paragraph, a for loop in python does a lot of stuff that the C/C++ compiler doesn't. The extra work/time makes it so that python objects can be iterable -- Something **very useful** that you don't get in C (I'm not sure about C++). Unfortunately, you need to pay for the convenience in performance. As far as optimizing your loop, there's no way to do that in general. `numpy` provides a lot of things which could make it so the loop is pushed into C code, but without seeing the code you want to optimize, we can't help any more than this. – mgilson Jun 06 '13 at 12:28
@mgilson thank you for your great answers, I have learned a lot. Do you have any books to recommend (something like how to use Python more efficiently, or how to use Python to deal with big data, etc.)? – chentingpc Jun 06 '13 at 15:44
@chentingpc -- almost everything I know, I learned from stack overflow and reading the documentation ... – mgilson Jun 06 '13 at 15:45
@mgilson oh... Maybe you could write a book about that :D The way you learned Python is a good method, but I guess a great book could save beginners a lot of time. – chentingpc Jun 06 '13 at 15:59

kirelagin · Answer 2 · 2013-06-03T14:46:42.383

Simply because Python is a higher-level language and has to do more work on every iteration (acquiring locks, resolving variables, etc.).

“How to optimise” is a very vague question. There is no “general” way to optimise any Python program (everything possible has already been done by the developers of Python). Your particular example can be optimised this way:

a = 1

That's what any C compiler will do, by the way.

If your program works with numeric data, then using numpy and its vectorised routines often gives you a great performance boost, as it does the work in compiled C (using C loops, not Python ones) and doesn't have to take the interpreter lock and all that stuff.
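A minimal sketch of what "vectorised" means here, assuming numpy is installed (the import is guarded so the snippet still loads without it). The function names count_with_loop and count_vectorised are invented for the example. It mirrors the a += 1 loop by summing an array of ones, which is a contrived use of numpy but shows where the loop ends up.

```python
try:
    import numpy as np
except ImportError:  # numpy is a third-party package and may be missing
    np = None


def count_with_loop(n):
    """The Python-level loop: n trips through the interpreter."""
    a = 0
    for _ in range(n):
        a += 1
    return a


def count_vectorised(n):
    """Same result, but the additions happen in numpy's compiled loop."""
    if np is None:
        raise RuntimeError("numpy is required for this example")
    ones = np.ones(n, dtype=np.int64)  # one array element per iteration
    return int(ones.sum())             # a single call replaces the whole loop
```

The Python interpreter only sees a couple of function calls here, regardless of n. Whether this pays off in a real program depends on whether the work can be expressed as whole-array operations.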

Oh, I made a bad example here... In my experiment I use a += 1, and Python is still far slower than C/C++. I know Python is a high-level language, but I don't know exactly why this code runs slower here. — chentingpc, Jun 06 '13 at 03:19

score 0 · Answer 3 · answered Jun 03 '13 at 14:39

Python is (usually) an interpreted language, meaning that the script has to be read line-by-line at runtime and its instructions compiled into usable bytecode at that point.

C is (usually) a compiled language, so by the time you're running it you're working with pure machine code.

Python will never be as fast as C, for that reason.

Edit: In fact, python compiles INTO C code at run time, that's why you get those .pyc files.

answered Jun 03 '13 at 14:39

Glitch Desire

That's not correct, Python code is compiled to bytecode when it's first read, so it has no direct effect on performance of `for`-loops. – kirelagin Jun 03 '13 at 14:41
How is C only **usually** a compiled language? Isn't it always a compiled language? – tobi Jun 03 '13 at 14:44
@tobi - Take a look at [CINT](http://root.cern.ch/drupal/content/cint). There are other C/C++ interpreters too. – Glitch Desire Jun 03 '13 at 14:47
@tobi - in current implementations, yes, although there's no reason you couldn't (if you were really, really, really bored) make a C interpreter. – slugonamission Jun 03 '13 at 14:47
See question: Is there an interpreter for C http://stackoverflow.com/questions/584714/is-there-an-interpreter-for-c – Matthieu Rouget Jun 03 '13 at 14:49
`.pyc` files contain bytecode for the VM included in CPython. No Python implementation compiles Python to C at run time. A few Python-derived languages (Cython, Shedskin, etc.) are compiled to C or C++, and Nuitka compiles Python to C++, but they're 1. obscure, 2. not Python (with the exception of Nuitka) and 3. compile *ahead of time*, not at run time. – Jun 03 '13 at 14:53

score 0 · Answer 4 · edited May 23 '17 at 12:02

As you go more abstract the speed will go down. The fastest code is assembly code which is written directly.

Read this question: Why are Python Programs often slower than the Equivalent Program Written in C or C++?

edited May 23 '17 at 12:02

Community

answered Jun 03 '13 at 14:46

Arpit

Assuming the abstraction can't be eliminated. For example, many abstractions in a C++ program can be dismantled at compile time. Python's abstractions are entirely late-bound and thus much harder to get rid of (though not impossible, see PyPy). And no, directly writing assembly code is not faster -- C compilers are better at writing assembly than you, me, and most other people for 90% of code. – Jun 03 '13 at 14:56
Agree. They can write better assembly than me, but not the best. A simple loop in assembly will execute much faster than in C/C++. On the other hand, as the program gets big the assembly gets bad. So it all depends on how complex the program is. And yes, development time is much more precious than run time. :) That's why we use Python. – Arpit Jun 03 '13 at 15:02
That's not what I meant. Even if you put in the effort of writing good assembly code, for most tasks (one notable exception: bytecode interpreters, ask Mike Pall) the result won't beat the compiler's output for the equivalent C or C++ code. Do *you* know the latencies of various instructions? Do *you* know how to get the most out of out-of-order execution and the pipeline? Do *you* allocate registers close-to-optimally? I could go on all day. You and I can only hope to beat the compiler if we know a lot it doesn't know (or can't assume). And that's very rare. – Jun 03 '13 at 15:08
I totally agree with you that I can't beat the compiler at writing assembly, and neither can anyone else. But what I'm saying is that we are comparing the code generated by the compiler with the same code written by us in C. If we compare the best C with the best assembly, the assembly will win. The only problem is that it's easy to write the best C but not the best assembly (no one beats the compiler). – Arpit Jun 03 '13 at 15:15
