I'm trying to understand the python compiler/interpreter process more clearly. Unfortunately, I have not taken a class in interpreters nor have I read much about them.

Basically, what I understand right now is that Python code from .py files is first compiled into Python bytecode (which I assume is what the .pyc files I occasionally see contain?). Next, the bytecode is compiled into machine code, a language the processor actually understands. I've also read this thread: "Why python compile the source to bytecode before interpreting?"

Could somebody give me a good explanation of the whole process keeping in mind that my knowledge of compilers/interpreters is almost non-existent? Or, if that's not possible, maybe give me some resources that give quick overviews of compilers/interpreters?

Thanks

Amal K
NickHalden
  • 3
    You do not "interpret into machine code" — that's what compilers do. Python interpreter just executes the bytecode. (And it's .pyc for bytecode.) – Piotr Kalinowski Jul 21 '10 at 13:26
  • 3
    On a side note, you might find it helpful to know that the last modification time of the original .py file is encoded in the .pyc file. This allows Python to figure out whether a new .pyc file needs to be created. The purpose of .pyc files is, of course, to avoid parsing the whole script each time it is invoked. A Python program will not run faster if the .pyc is used; only the loading time changes. – s.m. Jul 21 '10 at 13:46

2 Answers

The bytecode is not actually compiled to machine code, unless you are using an alternative implementation with a JIT compiler, such as PyPy.

Other than that, you have the description correct. The bytecode is loaded into the Python runtime and interpreted by a virtual machine, which is a piece of code that reads each instruction in the bytecode and executes whatever operation is indicated. You can see this bytecode with the dis module, as follows:

>>> def fib(n): return n if n < 2 else fib(n - 2) + fib(n - 1)
... 
>>> fib(10)
55
>>> import dis
>>> dis.dis(fib)
  1           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (2)
              6 COMPARE_OP               0 (<)
              9 JUMP_IF_FALSE            5 (to 17)
             12 POP_TOP             
             13 LOAD_FAST                0 (n)
             16 RETURN_VALUE        
        >>   17 POP_TOP             
             18 LOAD_GLOBAL              0 (fib)
             21 LOAD_FAST                0 (n)
             24 LOAD_CONST               1 (2)
             27 BINARY_SUBTRACT     
             28 CALL_FUNCTION            1
             31 LOAD_GLOBAL              0 (fib)
             34 LOAD_FAST                0 (n)
             37 LOAD_CONST               2 (1)
             40 BINARY_SUBTRACT     
             41 CALL_FUNCTION            1
             44 BINARY_ADD          
             45 RETURN_VALUE        
>>> 
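The session above comes from a Python 2 interpreter. As a rough sketch of the same experiment on Python 3 (assuming 3.4 or later, where dis.dis accepts a file argument), the listing can be captured as text. Opcode names and offsets differ between Python versions, so the output will not match the listing above line for line.

```python
import dis
import io


def fib(n):
    return n if n < 2 else fib(n - 2) + fib(n - 1)


# Capture the disassembly in a string instead of printing it directly.
buffer = io.StringIO()
dis.dis(fib, file=buffer)
listing = buffer.getvalue()

print(fib(10))
print(listing)
```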

Detailed explanation

It is quite important to understand that the above code is never executed by your CPU; nor is it ever converted into something that is (at least, not in the official C implementation of Python). The CPU executes the virtual machine's own code, which performs the work indicated by the bytecode instructions. When the interpreter wants to execute the fib function, it reads the instructions one at a time and does what they tell it to do. It looks at the first instruction, LOAD_FAST 0, and thus grabs parameter 0 (the n passed to fib) from wherever parameters are held and pushes it onto the interpreter's stack (Python's interpreter is a stack machine). On reading the next instruction, LOAD_CONST 1, it grabs constant number 1 from a collection of constants owned by the function, which happens to be the number 2 in this case, and pushes that onto the stack. You can actually see these constants:

>>> fib.func_code.co_consts
(None, 2, 1)
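On Python 3 the func_code attribute is spelled __code__. A minimal sketch of the same inspection follows; the exact contents of co_consts vary between Python versions, so no particular tuple is promised here.

```python
def fib(n):
    return n if n < 2 else fib(n - 2) + fib(n - 1)


code = fib.__code__               # Python 3 spelling of func_code
consts = code.co_consts           # constants owned by the function
local_names = code.co_varnames    # parameters and locals, indexed by LOAD_FAST
global_names = code.co_names      # names looked up by LOAD_GLOBAL

print(consts, local_names, global_names)
```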

The next instruction, COMPARE_OP 0, tells the interpreter to pop the two topmost stack elements and perform a less-than comparison between them, pushing the Boolean result back onto the stack. The fourth instruction determines, based on the Boolean value, whether to jump forward five bytes (to offset 17) or continue on with the next instruction. All that verbiage explains the if n < 2 part of the conditional expression in fib. It will be a highly instructive exercise for you to tease out the meaning and behaviour of the rest of the fib bytecode. The only one I'm not sure about is POP_TOP; I'm guessing JUMP_IF_FALSE is defined to leave its Boolean argument on the stack rather than popping it, so it has to be popped explicitly.
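To make the idea of a stack machine concrete, here is a toy interpreter written in Python. It is only an illustration: the instruction names are borrowed from the listing above, but COMPARE_LT, CALL_SELF, the tuple-based program format and the run function are all invented for this sketch, and CPython's real loop is written in C and is far more involved.

```python
# Toy stack machine: an illustration only, not how CPython is implemented.
CONSTS = (None, 2, 1)

# Hand-written program for: return n if n < 2 else fib(n - 2) + fib(n - 1)
FIB = [
    ("LOAD_FAST", 0),           # 0: push n
    ("LOAD_CONST", 1),          # 1: push 2
    ("COMPARE_LT", None),       # 2: push (n < 2)
    ("POP_JUMP_IF_FALSE", 6),   # 3: jump to instruction 6 if false
    ("LOAD_FAST", 0),           # 4: push n
    ("RETURN_VALUE", None),     # 5: return n
    ("LOAD_FAST", 0),           # 6: push n
    ("LOAD_CONST", 1),          # 7: push 2
    ("BINARY_SUBTRACT", None),  # 8: push n - 2
    ("CALL_SELF", None),        # 9: hypothetical opcode: recursive call
    ("LOAD_FAST", 0),           # 10: push n
    ("LOAD_CONST", 2),          # 11: push 1
    ("BINARY_SUBTRACT", None),  # 12: push n - 1
    ("CALL_SELF", None),        # 13: recursive call
    ("BINARY_ADD", None),       # 14: add the two results
    ("RETURN_VALUE", None),     # 15: return the sum
]


def run(program, consts, args):
    stack = []
    pc = 0  # index of the next instruction to execute
    while True:
        op, arg = program[pc]
        pc += 1
        if op == "LOAD_FAST":
            stack.append(args[arg])
        elif op == "LOAD_CONST":
            stack.append(consts[arg])
        elif op == "COMPARE_LT":
            right, left = stack.pop(), stack.pop()
            stack.append(left < right)
        elif op == "POP_JUMP_IF_FALSE":
            if not stack.pop():
                pc = arg
        elif op == "BINARY_SUBTRACT":
            right, left = stack.pop(), stack.pop()
            stack.append(left - right)
        elif op == "BINARY_ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "CALL_SELF":
            stack.append(run(program, consts, [stack.pop()]))
        elif op == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError("unknown instruction: " + op)


print(run(FIB, CONSTS, [10]))  # prints 55
```

The program is plain data (a list of tuples); the only code that actually executes is the run function, which is the point the answer is making about bytecode and the virtual machine.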

Even more instructive is to inspect the raw bytecode for fib thus:

>>> code = fib.func_code.co_code
>>> code
'|\x00\x00d\x01\x00j\x00\x00o\x05\x00\x01|\x00\x00S\x01t\x00\x00|\x00\x00d\x01\x00\x18\x83\x01\x00t\x00\x00|\x00\x00d\x02\x00\x18\x83\x01\x00\x17S'
>>> import opcode
>>> op = code[0]
>>> op
'|'
>>> op = ord(op)
>>> op
124
>>> opcode.opname[op]
'LOAD_FAST'
>>> 

Thus you can see that the first byte of the bytecode is the LOAD_FAST instruction. The next pair of bytes, '\x00\x00' (the number 0 in 16 bits) is the argument to LOAD_FAST, and tells the bytecode interpreter to load parameter 0 onto the stack.
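The raw bytes look different on Python 3. A sketch of the same decoding follows, assuming Python 3.6 or later, where co_code is a bytes object and every instruction occupies two bytes (one opcode byte, one argument byte). Newer versions also interleave extra entries such as caches, so the decoded names will not match the Python 2 listing above.

```python
import opcode


def fib(n):
    return n if n < 2 else fib(n - 2) + fib(n - 1)


raw = fib.__code__.co_code  # bytes in Python 3, so indexing yields integers

# Walk the bytecode two bytes at a time: opcode, then its one-byte argument.
decoded = []
for offset in range(0, len(raw), 2):
    op, arg = raw[offset], raw[offset + 1]
    decoded.append((offset, opcode.opname[op], arg))

for offset, name, arg in decoded:
    print(offset, name, arg)
```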

Marcelo Cantos
  • 1
    So if it's not turned into machine code... how is it finally executed by my x86 processor? I was under the impression that everything that is happening on my computer could eventually be broken down into 1's and 0's that are being read by my processor or some other hardware. – NickHalden Jul 21 '10 at 15:45
  • @JGord: I've extended my answer to address your comment. – Marcelo Cantos Jul 21 '10 at 22:46
  • 1
    Ok, so the part that is still blowing my mind is that my processor does not understand the LOAD_FAST opcode correct? That is bytecode for the python virtual machine. So somehow the virtual machine was written in a language that could be assembled into x86. Sure. However, how does the virtual machine actually do the operations it interprets from the byte code on my hardware? Let's take an example. My script does some calculation and it needs to send the result over the bus to my graphics card. The python virtual machine cannot do that am I right? Somehow the physical proc is doing something? – NickHalden Jul 22 '10 at 13:13
  • 6
    The interpreter/VM is in C. It is (to oversimplify somewhat) a loop that uses the current byte to choose one of many cases in a huge switch statement. Somewhere in the middle of the switch, there is a `case LOAD_FAST:` followed by code that reads the next two bytes, looks up the specified parameter in some "parameters" collection, and pushes it onto a stack object. To interact with the outside world, Python allows calls to extension modules, which act like Python code and objects, but are really compiled code and can thus talk to graphics cards, etc., directly, on behalf of your script(s). – Marcelo Cantos Jul 22 '10 at 13:23
  • 8
    To be a bit more explicit about your last question: there is no Python opcode for "talk to the graphics card". There is an opcode for "call this function in this module", and if the module is a graphics programming extension module, the interpreter will call the library's entry point for the requested function, passing it some parameters. The C library (assuming it's C) teases out the parameters, converting them from Python objects into C values and structs, and forwards the call onto a bona-fide graphics library, which then plonks a colorful triangle on your screen, or whatever. – Marcelo Cantos Jul 22 '10 at 13:34
  • 1
    Wow, thanks Marcelo! That's exactly what I was looking for. Mind telling me how/where you learned that? Also, you sure the C library is where the parameters are 'teased out'? That seems counterintuitive to me. Why wouldn't the python interpreter which understands python bytecode (and thus I assume understands python objects) tease out the parameters and just send C variables as parameters? – NickHalden Jul 23 '10 at 12:58
  • Idle curiosity ... almost three decades worth of it (not all of it Python, of course). The Python interpreter doesn't know about the C data types required by arbitrary C libraries. So it essentially passes its own internal representation of the Python-object parameters to the library, which does what it wants with them. Have a read [here](http://docs.python.org/extending/extending.html) for a taste of how this works. – Marcelo Cantos Jul 23 '10 at 13:12
  • 1
    Hi Marcelo, despite reading your comments and post, I'm still confused about how Python byte code instructions get carried out by CPU. The VM can read Python byte code and carry out the instructions, but ultimately for the VM to carry out the instructions, they need to send those instructions to the CPU (right?). So isn't there any stage of compiling those instructions into CPU readable instructions? Thank you. – Moondra Jul 28 '17 at 16:29
  • 1
    @Moondra The VM is itself a piece of native code — written in C and compiled to machine code — that does the actual work. It walks over the bytecode and performs the operations it finds therein. Revisiting the LOAD_FAST example I gave earlier, the code may be found at https://github.com/python/cpython/blob/3.6/Python/ceval.c#L1274. It's fairly noisy, but the guts of it is `GETLOCAL(oparg)`, which fetches the parameter corresponding to the opcode's argument into a local C variable, and `PUSH(value)`, which pushes the parameter onto the Python stack. That is, essentially, LOAD_FAST. – Marcelo Cantos Jul 28 '17 at 23:59
  • @MarceloCantos Thank you for your reply. I had taken a look at some of the bytecode and CPython code, but I only have experience with Python and none with C, so that's why I'm having so much trouble. "The VM is itself a piece of native code — written in C and compiled to machine code — that does the actual work" -- Do you mean all of the VM switch cases are precompiled to machine language? So when the VM chooses a switch case and executes it, it is actually communicating directly with the CPU, and there is no need for C to be compiled again? – Moondra Jul 29 '17 at 01:44
  • So far I understand that the bytecode gets translated into switch cases, which the VM executes. Essentially, the switch cases are representations of the bytecode instructions, but just written in C. But since I don't understand C at all, (still have a lot to learn about Python unfortunately), I don't understand what happens when you execute C code in the switches. I've read that C code first needs to be compiled to machine code before getting executed. Is that what we do here when we execute a switch case? – Moondra Jul 29 '17 at 01:44
  • Or is everything pre-compiled to machine code, so that executing a switch case yields results right away? Thank you so much for your patience. – Moondra Jul 29 '17 at 01:44
  • 3
    @Moondra CPython never translates bytecode into switch cases. The C code I linked to is the code that gets compiled into machine code. That machine code is a CPU-ready representation of the C code. You should think of the C code and the machine code as being different representations of exactly the same thing. The C is a human-readable form, while the machine code is a machine-readable form. A key point to understand is that the C program (in its compiled machine code form) is the only thing the CPU sees as code. – Marcelo Cantos Jul 29 '17 at 01:56
  • 2
    … The Python bytecode, in contrast, is seen by the CPU as just data. The C code _interprets_ that data as code to be executed, hence the name _interpreter_. – Marcelo Cantos Jul 29 '17 at 01:58
  • 3
    It might help to think of the Python bytecode as a cooking recipe, and the C code as a cooking robot that reads and follows recipes in order to cook food. The robot itself has code inside it, which could well be C code, and the recipes are just data read in through the robot's eyes in order to know how to execute a particular cooking procedure. At one level, the recipe is code — a set of instructions to follow. At another level, it's just data to be fed to the robot's brain. hth – Marcelo Cantos Jul 29 '17 at 02:06
  • Ah, sorry, 'translate' was a poor choice of word. I meant to use 'interpret'. So eventually the C code is getting compiled to machine code! That's where the confusion was stemming from. Now, as far as I know, Python code can be interpreted line by line (when I use the interpreter I can run lines in real time), but we can't do that with C, right? From what I understand, C has to compile the entire code to machine language in one shot. It has to interpret the entirety of the bytecode again (past lines and the new lines) before compiling it to machine code. – Moondra Jul 29 '17 at 18:25
  • So every time we run Python code via the interpreter, are we forcing C to recompile the entire bytecode once again (the new lines as well as the older lines)? – Moondra Jul 29 '17 at 18:26
  • @Moondra The C code is compiled once to make the Python interpreter (python.exe on Windows, just python on most other platforms). That's the "cooking" robot. You then run the interpreter, passing it the Python code to run. The interpreter reads the code and "compiles" it into bytecode, which it then interprets. It does this every time you run that piece of code. – Marcelo Cantos Jul 29 '17 at 22:54
  • There is an optimization for imported modules. After compiling an imported .py file, the interpreter caches the bytecode as a .pyc file alongside the imported .py file. Whenever a module is imported, the interpreter first checks to see if there's a corresponding .pyc file that's newer than the .py file. If there is, it loads the .pyc file directly instead of recompiling the .py file. – Marcelo Cantos Jul 29 '17 at 22:59
  • @MarceloCantos If the C code only needs to be compiled once (to build the interpreter), then we can run code through the interpreter as many times as we like. The interpreter seems to be essentially a virtual CPU specifically built to read Python bytecode. I guess that's why it's called a virtual machine. I think I got it. Thank you so much for your patience. These details are hard to find on the web in a way that beginners can understand. Once again, thank you so much. – Moondra Jul 30 '17 at 00:22
  • 1
    @Moondra That's spot on! Glad to help. – Marcelo Cantos Jul 31 '17 at 05:54
  • Hi @MarceloCantos, thanks for your great explanation! Although the `co_code` of function `fib` is `'|\x00\x00d\x01\x00j\x00\x00o\x05\x00\x01|\x00\x00S\x01t\x00\x00|\x00\x00d\x01\x00\x18\x83\x01\x00t\x00\x00|\x00\x00d\x02\x00\x18\x83\x01\x00\x17S'`, the Python interpreter does not need to import the `opcode` module and translate `|` to `LOAD_FAST` before doing the operation, right? It just does the operation when it reads `|`; the `opcode` module only helps people understand a function's `co_code`, right? – roachsinai Apr 25 '19 at 04:57
  • @MarceloCantos Thanks for your reply, and I've asked a related question the link is https://stackoverflow.com/questions/55843979/whats-the-relationship-between-processors-call-stack-and-pythons-frame-object . Hope you will answer it when you have time. Thanks in advance. – roachsinai Apr 25 '19 at 14:20
  • In Python 3, use `__code__` instead of `func_code` on a function, e.g. `fib.__code__.co_code`. – natka_m Nov 22 '20 at 17:15
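The .pyc caching described in the comments above can be observed directly. This is a small sketch using the standard py_compile and importlib.util modules; the file name hello.py is made up for the example. Note that since Python 3.2 the cached file is written to a __pycache__ directory next to the source rather than directly alongside it.

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical throwaway module, created only for this demonstration.
    source = os.path.join(tmp, "hello.py")
    with open(source, "w") as handle:
        handle.write("print('hello')\n")

    # Compile the source to bytecode; the return value is the .pyc path.
    cached = py_compile.compile(source)

    # Where the import system would look for the cached bytecode.
    expected = importlib.util.cache_from_source(source)

    pyc_exists = os.path.exists(cached)
    print(cached)
    print(expected)
```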

To complement Marcelo Cantos's great answer, here is a short column-by-column summary of the disassembled bytecode output.

For example, given this function:

def f(num):
    if num == 42:
        return True
    return False

This may be disassembled into (Python 3.6):

(1)|(2)|(3)|(4)|          (5)         |(6)|  (7)
---|---|---|---|----------------------|---|-------
  2|   |   |  0|LOAD_FAST             |  0|(num)
   |-->|   |  2|LOAD_CONST            |  1|(42)
   |   |   |  4|COMPARE_OP            |  2|(==)
   |   |   |  6|POP_JUMP_IF_FALSE     | 12|
   |   |   |   |                      |   |
  3|   |   |  8|LOAD_CONST            |  2|(True)
   |   |   | 10|RETURN_VALUE          |   |
   |   |   |   |                      |   |
  4|   |>> | 12|LOAD_CONST            |  3|(False)
   |   |   | 14|RETURN_VALUE          |   |

Each column has a specific purpose:

  1. The corresponding line number in the source code
  2. Optionally indicates the current instruction executed (when the bytecode comes from a frame object for example)
  3. A label which denotes a possible JUMP from an earlier instruction to this one
  4. The address of the instruction, i.e. its byte index in the bytecode (these are multiples of 2 because Python 3.6 uses 2 bytes for each instruction, whereas the size varied in earlier versions)
  5. The instruction name (also called opname); each one is briefly described in the dis module documentation, and the implementations can be found in ceval.c (the core loop of CPython)
  6. The argument (if any) of the instruction which is used internally by Python to fetch some constants or variables, manage the stack, jump to a specific instruction, etc.
  7. The human-friendly interpretation of the instruction argument
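Most of these columns are also available programmatically. The sketch below uses dis.get_instructions (Python 3.4+) to collect the jump-target marker, offset, opname, argument and its readable form for the same function; the rows list and the print format are choices made for this example, and the opcodes you see depend on your Python version.

```python
import dis


def f(num):
    if num == 42:
        return True
    return False


rows = []
for ins in dis.get_instructions(f):
    marker = ">>" if ins.is_jump_target else ""   # column 3
    rows.append((marker, ins.offset, ins.opname, ins.arg, ins.argrepr))

for row in rows:
    # Columns 3 to 7 of the table; arg may be None, hence the !s conversion.
    print("{:2} {:4} {:22} {!s:5} {}".format(*row))
```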
Delgan
  • How to read the implementation speed from it? There's no timing. – AbstProcDo Dec 15 '17 at 09:04
  • @YumiTada What do you mean by "implementation speed"? This is just compiled bytecode and not (yet) executed, so timing is irrelevant here. – Delgan Dec 15 '17 at 10:05