UPD: thanks to Konrad and kqr for pointing out that this answer only talks about C- or C++-style compilation. There are other ways of doing it, as Common Lisp does, for example.
Strictly speaking, you cannot compile a Python program beforehand, because you don't necessarily have the full source code at compile time. A Python program can download source code and put it through eval(), for all we know. Or it can construct the code programmatically (the standard library actually does exactly that in namedtuple()).
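As a rough illustration of the point above, here is a minimal sketch of code that only comes into existence while the program runs. The names operation, field_names and make_point are made up for this example, and the generated function is only loosely inspired by what namedtuple() does, not a copy of it.

```python
# Source text that does not exist until the program runs.
# In a real program it could come from a file, the network or user input.
operation = "+"  # hypothetical runtime value
source = "lambda x, y: x " + operation + " y"

# eval() turns the string into a function object at run time.
combine = eval(source)
print(combine(2, 3))  # 5

# exec() does the same for whole statements, here a generated function definition.
field_names = ["x", "y"]  # hypothetical field list
func_source = "def make_point({0}):\n    return ({0},)".format(", ".join(field_names))

namespace = {}
exec(func_source, namespace)
make_point = namespace["make_point"]
print(make_point(1, 2))  # (1, 2)
```

An ahead-of-time compiler looking at this file sees only string operations; the body of combine and make_point is not available to it.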
This is not the biggest problem though: those are marginal practices. The biggest problem is that it is incredibly hard, and probably impossible in the general case, to infer the data types beforehand. If you have a function max(x, y) and you want to compile it to native code, you need to know the possible types of x and y, and compile a different version for each combination. That quickly becomes impractical, because any type that supports comparison is a valid argument, including types defined elsewhere in the program. Now, you can restrict some features to make such inference possible, and that is how you get RPython.
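To make the type problem concrete, here is a small sketch. my_max and Version are names invented for this example; the point is only that nothing in the function body tells a compiler what x and y will be.

```python
from fractions import Fraction


def my_max(x, y):
    # The only requirement is that `x > y` works at run time.
    return x if x > y else y


class Version:
    """A user-defined type; a compiler cannot know about it in advance."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __gt__(self, other):
        return (self.major, self.minor) > (other.major, other.minor)


print(my_max(1, 2))                   # two ints
print(my_max(1.5, 0.5))               # two floats
print(my_max("apple", "pear"))        # two strings
print(my_max([1, 2], [1, 3]))         # two lists
print(my_max(Fraction(1, 3), 0.25))   # a Fraction and a float
newest = my_max(Version(3, 9), Version(3, 12))
print(newest.major, newest.minor)     # a class defined a few lines above
```

Each call would need its own native version of my_max, and the list of callers is never complete.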
So, a Python program can be compiled, but it is hard to do so entirely and beforehand.
That is why there is PyPy! PyPy is a Python implementation with a JIT compiler. Instead of inferring types beforehand, it runs the code and analyzes it while it runs. That is why it only optimizes loops, actually. Here's how that works (VERY roughly):
- PyPy lets a loop run for some time without interfering (currently about 1000 iterations), while collecting data about its flow.
- Based on the collected data types and flow, optimized native code is generated.
- 'Guards' are put in place to check whether the actual program flow matches the predicted one.
- The native code is executed until a guard fails or the loop ends.
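The steps above can be mimicked in plain Python. This is a toy model, not PyPy's actual machinery: HOT_THRESHOLD, generic_add, fast_add and run_loop are all invented for the illustration, and the "native code" is just another Python function standing in for the specialized version.

```python
HOT_THRESHOLD = 1000  # assumed warm-up length, taken from the rough figure above


def generic_add(a, b):
    # Stand-in for the interpreter: works for any types.
    return a + b


def fast_add(a, b):
    # Stand-in for generated native code: only valid for the predicted type.
    return a + b


def run_loop(values):
    """Sum values while imitating warm-up, specialization and guards."""
    total = 0
    warmup_types = set()
    specialized_for = None  # type the fast path was "compiled" for
    stats = {"interpreted": 0, "fast": 0, "guard_failures": 0}

    for value in values:
        if specialized_for is None:
            # Step 1: run generically and record the observed types.
            total = generic_add(total, value)
            stats["interpreted"] += 1
            warmup_types.add(type(value))
            # Step 2: after the warm-up, specialize if only one type was seen.
            if stats["interpreted"] >= HOT_THRESHOLD and len(warmup_types) == 1:
                specialized_for = warmup_types.pop()
        elif type(value) is specialized_for:
            # Steps 3 and 4: the guard passed, stay on the fast path.
            total = fast_add(total, value)
            stats["fast"] += 1
        else:
            # The guard failed: handle this iteration on the generic path.
            total = generic_add(total, value)
            stats["guard_failures"] += 1
    return total, stats


total, stats = run_loop([1] * 1500 + [2.5] + [1] * 10)
print(total)  # 1512.5
print(stats)  # {'interpreted': 1000, 'fast': 510, 'guard_failures': 1}
```

The single float in the input is what trips the guard; everything else after the warm-up stays on the specialized path.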
Also, while developing PyPy, the devs created RPython, which is a subset of Python that can, in fact, be fully and statically compiled. They achieved this mainly by enforcing early binding. For example, if you have a variable which is an integer, you cannot repurpose it as a char later down the line. Also, you cannot mix different data types in lists or other containers, and so on.
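To show what those restrictions look like in practice, here is a sketch in ordinary Python. It does not use the RPython toolchain; the comments simply mark which lines fall under the restrictions described above, and the function names are invented for the example.

```python
def flexible():
    # Fine in regular Python, but the kind of code the restrictions rule out.
    value = 42                # value starts out as an integer...
    value = "x"               # ...and is repurposed as a character
    mixed = [1, "two", 3.0]   # one list holding three different types
    return value, mixed


def restricted():
    # The same data written so that every name and container keeps one type.
    count = 42
    letter = "x"
    numbers = [1, 2, 3]
    return count, letter, numbers


print(flexible())    # ('x', [1, 'two', 3.0])
print(restricted())  # (42, 'x', [1, 2, 3])
```

In the second version the type of every variable can be read off the source, which is what makes static compilation feasible.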