My understanding

C++ is compiled into machine code and executed.

Python is compiled into bytecode.

This bytecode is then executed.

What does this execution step entail, and how is it different for CPython and PyPy?

Where does the difference in performance come from? And where does Python's dynamic typing affect performance?

Thanks!

algorithmicCoder
  • That's a little broad ;-) Do you have something specific in mind that you can't find on the Cython or PyPy websites? – Raymond Hettinger Nov 03 '11 at 05:32
  • I think the two specific questions I asked are pretty much what I have in mind. The websites do not necessarily compare and contrast with C++. Why are those two questions too broad? – algorithmicCoder Nov 03 '11 at 05:36
  • Why votes to close? In my opinion, it's a perfectly legitimate question. It'll take some time to answer, but all good questions do. – atzz Nov 03 '11 at 05:53
  • @atzz It's overly broad. Entire CS courses are taught just to cover what's asked in this question. – agf Nov 03 '11 at 06:06
  • Basically, PyPy's performance advantage comes from its JIT (just-in-time compiler), which analyzes the code as it runs and optimizes it. – agf Nov 03 '11 at 06:07
  • @agf - I feel it's a knee-jerk reaction on your part. These things can be explained just in several sentences, giving conceptual understanding and providing a starting point for further study. I feel questions of this type provide great value to SO. Or do we want SO to be just another HOWTO site? – atzz Nov 03 '11 at 06:56
  • @atzz The question has been open an hour, and there hasn't been a single answer. It's not a knee-jerk reaction; questions like this aren't a good fit for this site, unless they have more specific sub-questions that can be answered in a _useful_ way. I didn't downvote -- I don't think it's a _bad question_, it's just not right for SO. – agf Nov 03 '11 at 06:59
  • I'd say that the biggest problem with the question is that it's really asking three things. What is bytecode interpretation? What is JIT? How do those compare with regular compilation to machine language models? Any one of these questions would probably be too broad for SO, but all three at once is way out of bounds. – Nicol Bolas Nov 03 '11 at 07:05
  • @agf - no answers for an hour? so what? It takes time to formulate things clearly and briefly, especially if you consider the fact that most of us do not work fulltime on answering SO questions. If we close any non-trivial questions offhand, what's the point of SO at all? Everything else can be googled. – atzz Nov 03 '11 at 07:30
  • Related: [Is Python interpreted or compiled or both?](http://stackoverflow.com/questions/6889747/) – Piotr Dobrogost Oct 13 '12 at 19:46

1 Answer

C, C++, and other statically compiled languages are compiled into native machine code, which means the CPU of the computer can directly execute them. The code it compiles to is unintelligible binary data, but you can kind of imagine that a C code fragment like this:

int x = 10;
int y = x * 2;

will be compiled into a series of binary instructions that mean something like the following:

store 10 into memory address 200
multiply the contents of memory address 200 by 2, treating them as integers
store the result of the last instruction into memory address 300

Here the compiler has assigned memory addresses (200 and 300) to the variables x and y that appeared in the code. Note that actual machine code is more complex than this, and is obviously encoded into short binary codewords, not English phrases. But that's the very basic idea. A particular point to note is that the compiler knew to use integer multiplication because it knew that x and y are integers. The CPU itself knows nothing at all about the meaning of the contents of memory address 200; it just knows about bits and can be told to shuffle them around in various ways, one of which is integer multiplication.
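To make that last point concrete, here is a small Python sketch (my own illustration, using only the standard struct module; the bit pattern is one I picked for the example). It takes a single 32-bit pattern and reads it back first as an integer and then as a float. The bits never change; only the interpretation we ask for does, and that choice is what a C compiler makes for you from the declared types.

```python
import struct

# One 32-bit pattern, chosen for illustration: 0x3F800000.
raw = struct.pack("<I", 0x3F800000)

# Read the same four bytes as an unsigned 32-bit integer...
as_int = struct.unpack("<I", raw)[0]

# ...and as an IEEE 754 single-precision float.
as_float = struct.unpack("<f", raw)[0]

print(as_int, as_float)
```

The same four bytes come back as 1065353216 when treated as an integer and as 1.0 when treated as a float.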

Now, Python is compiled to byte code. That turns out not to matter much for the issues we're talking about here. Python byte code, unlike machine code, doesn't encode low-level operations that can be directly executed by the machine. In fact, it basically just encodes the very same Python-level operations you wrote in your Python source code, and the CPU can't do anything at all with Python byte code. The Python interpreter is a program that has the job of carrying out the instructions encoded in Python byte code. All the byte-code compilation does is allow the interpreter to operate on a form of the code that is easier and faster to manipulate; it doesn't have to do all the string processing necessary to understand Python source code directly.
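If you want to see this for yourself, the standard library's dis module will list the byte code of a function. The snippet below is just a sketch; the function name double is my own, and the exact instruction names differ between interpreter versions (older CPython releases show BINARY_MULTIPLY, newer ones a generic BINARY_OP), so I don't claim any particular output.

```python
import dis


def double(x):
    # The same Python-level operation discussed in the text.
    return x * 2


# Collect the names of the byte code instructions the compiler produced.
opnames = [ins.opname for ins in dis.get_instructions(double)]
print(opnames)
```

Whatever the version, the multiplication shows up as one generic "binary operation" instruction with no type information attached, which is exactly why the interpreter still has to work out what x is at run time.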

So here's where dynamic typing and the performance difference come in. A C++ compiler that sees x * 2 knows it can compile this to a single integer multiplication instruction for the CPU, because it knows the types of everything involved ahead of time.

A Python interpreter that sees x * 2 has to go through many steps to see whether x is any of the built-in types that support multiplication, or whether it is a class that implements custom multiplication, or whether it is a class that doesn't implement multiplication but inherits from something else that does, or whether it should raise an exception. And if x is an integer, there are then steps to get the machine-level value of x out of the data structure that represents a Python integer, then a single machine-level instruction to actually have the CPU do integer multiplication, then more instructions to wrap the result back up into a Python integer data structure.
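The decision process above can be sketched in Python itself. This is a deliberately simplified model, not CPython's actual code: interp_multiply and Money are names I made up, and I'm leaving out details such as the rule that lets a subclass's reflected method run first. It only shows the shape of the work: look up the method on the type, try it, try the reflected method on the other operand, and give up with an exception.

```python
def interp_multiply(left, right):
    """Simplified model of what an interpreter does for `left * right`."""
    # Step 1: does the left operand's type (or a base class) define __mul__?
    mul = getattr(type(left), "__mul__", None)
    if mul is not None:
        result = mul(left, right)
        if result is not NotImplemented:
            return result

    # Step 2: otherwise, let the right operand try the reflected method.
    rmul = getattr(type(right), "__rmul__", None)
    if rmul is not None:
        result = rmul(right, left)
        if result is not NotImplemented:
            return result

    # Step 3: nobody knows how to multiply these two objects.
    raise TypeError(
        f"unsupported operand types for *: "
        f"{type(left).__name__!r} and {type(right).__name__!r}"
    )


class Money:
    """A class with custom multiplication (hypothetical example type)."""

    def __init__(self, cents):
        self.cents = cents

    def __mul__(self, factor):
        return Money(self.cents * factor)


print(interp_multiply(10, 2))
print(interp_multiply("ab", 2))
print(interp_multiply(Money(150), 2).cents)
```

Even in this stripped-down form, a single * costs several attribute lookups and function calls before any arithmetic happens.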

All of that logic amounts to many machine code instructions: the compiled code of the Python interpreter itself. (Usually, that is; when PyPy runs on top of CPython, they're Python byte code instructions!) You might think Python's byte code compiler could figure out which path to take ahead of time and translate the Python source code into those machine instructions, but it can't, because Python is dynamically typed; x could be an integer the first time that line of code is executed, then a string the next time, a list the time after that, and maybe even one day a class instance. So all of that logic has to be run every single time, because Python can't know ahead of time what it's going to need. Even if you wrote a program that compiled Python source code into native machine code, it would mostly have to emit machine code that does basically the same thing as the Python interpreter.
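A tiny example of why the compiler can't decide ahead of time (the function name double is just for illustration): the very same line runs with a different type on each call, and each call means something different.

```python
def double(x):
    # One line of source, compiled once to one piece of byte code.
    return x * 2


# The same line is executed with an int, a str, a list and a float.
results = [double(value) for value in (10, "ab", [1, 2], 1.5)]
print(results)
```

The results are integer multiplication, string repetition, list repetition and floating-point multiplication respectively, all from one compiled function.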

That covers most of your questions, as a very simple overview. You also ask about PyPy, without really giving any details of what you're interested in. I presume it's "why is PyPy faster than CPython (some of the time)?"

Basically, PyPy has a JIT compiler, which is a bit like a C++ compiler except that it compiles code during the execution of your program. This can (sometimes) get around the problem of Python not being able to know whether x is an integer, a float, a list, or something else. On any one execution of a bit of code, x is just one thing. And in most Python code, x is only ever one thing, or occasionally one of a few things. So by compiling code at runtime (after waiting to see which code is executed really often), PyPy's JIT can (sometimes) turn x * 2 into a single integer multiplication machine code instruction. If we execute that line of code with x as an integer millions of times, this can be a big performance boost.

But it's still possible that the next time x will be a string, so the JIT has to include some fallback logic so that it can still handle all the possibilities that Python allows. It gains speed by waiting to see which of the many possibilities are actually used often, and then optimising for those. A JIT can even make some optimisations that C++ compilers can't, because it can wait to see what is going on at runtime, whereas a C++ compiler has to emit code that will work whatever happens at runtime (although it can make assumptions based on the types, which will never change).
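Here is a toy model of the "watch, specialise, guard, fall back" idea. To be clear, this is not how PyPy's JIT is implemented, and every name in it (ToyJit, hot_threshold and so on) is invented for the illustration. It only mimics the control flow: count which type shows up, specialise once a type is seen often enough, check a cheap guard on every later call, and fall back to the generic path when the guard fails.

```python
from collections import Counter


class ToyJit:
    """Toy model of type specialisation with a guard (not PyPy's design)."""

    def __init__(self, generic, hot_threshold=3):
        self.generic = generic              # the slow, fully dynamic path
        self.hot_threshold = hot_threshold  # hypothetical "hotness" cut-off
        self.seen = Counter()               # how often each type was observed
        self.specialized_for = None         # type the fast path assumes
        self.fast_hits = 0
        self.fallbacks = 0

    def __call__(self, x):
        if self.specialized_for is not None:
            if type(x) is self.specialized_for:
                # Guard passed. A real JIT would run specialised machine
                # code here; this model only counts the hit.
                self.fast_hits += 1
            else:
                # Guard failed: fall back to the generic path.
                self.fallbacks += 1
            return self.generic(x)

        # Still warming up: record the type and specialise once it is hot.
        self.seen[type(x)] += 1
        if self.seen[type(x)] >= self.hot_threshold:
            self.specialized_for = type(x)
        return self.generic(x)


double = ToyJit(lambda x: x * 2)
outputs = [double(v) for v in (1, 2, 3, 4, 5, "ab")]
print(outputs, double.specialized_for, double.fast_hits, double.fallbacks)
```

After three integer calls the model specialises for int, the next two integers take the guarded fast path, and the string still produces the right answer through the fallback.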

Ben
  • Heh. While I was being distracted by unimportant stuff like my job, you gave a much more detailed answer than I was trying to write. :) I just wanted to add a link to the CPython bytecode interpreter source as an illustration: http://hg.python.org/cpython/file/62fa61f2ee7d/Python/ceval.c#l970 – atzz Nov 03 '11 at 07:24
  • @atzz Looks like I got in bare minutes before the question was closed too! :O – Ben Nov 03 '11 at 07:47