7

Why can't pure Python be fully compiled? Compiled or interpreted is a trait of the implementation, not the language. So shouldn't there be some Python implementation that is fully before-hand compiled to native code? What makes (pure) Python so difficult to compile?

I know there are things like PyPy and Cython but as I understand it those are not pure Python and require things like type annotations, etc.

thanks

Fully compiled meaning compiled before-hand to native code, like C or C++ or Lisp.

Aristides
  • 3,805
  • 7
  • 34
  • 41

3 Answers3

6

False premise. Python can be fully compiled, no type annotations or anything of the sort are necessary.

Furthermore, PyPy does fully compile Python code into machine code. That this isn’t done ahead of time is irrelevant for the aspect of compilability – it’s just an implementation detail of the JIT architecture.

Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
5

UPD: thanks to Konrad and kqr for pointing out this answer only talks about C or C++-style compilation. There are other ways of doing it, like Common Lisp does, for example.

Strictly speaking, you cannot compile python program beforehand because you don't necessarily have the full source code at compile-time. A python program can download source code and put it through eval() for all we know. Or construct it programmatically (in standard library it actually does exactly that in namedtuple()).

This is not the biggest problem though - those are marginal practices. The biggest problem is that it is incredibly hard, probably impossible in general case to infer the data types beforehand. If you have a function max(x, y) and you want to compile it to native code, you need to know what are the possible types for x and y, and compile a different version for each combination. That may be a problem. Now, you can restrict some features to make such inferring possible, and there you get RPython.

So, a python program can be compiled, but it hard to do beforehand and entirely.

That is why there is PyPy! PyPy is a JIT compiler. Instead of inferring, it runs code and analyzes it while it runs. That is why it only optimizes loops, actually. Here's how that works (VERY roughly):

  1. PyPy lets a loop run for time without interfering, while collecting data about its flow (currently for 1000 iterations)
  2. Based on the collected data types and flow, optimized assembler code is generated and compiled.
  3. 'Guards' are put in place to check if actual program flow corresponds to predicted.
  4. Native code is executed until a guard fires or loop ends.

Also, while developing PyPy, the devs created RPython, wich is a subset of Python that can, in fact, be fully and statically compiled. They achieved it mainly by enforcing early binding. For example, if you have a variable which is an integer, you cannot repurpose it as a char later down the line. Also, you cannot mix different datatypes in lists or other containers, and so on.

thule
  • 4,034
  • 21
  • 31
  • 1
    A Common Lisp program can also contain calls to `eval` or `(max x y)` and yet it is not unheard of to compile those. – kqr Sep 01 '13 at 11:45
  • Your argument isn’t carefully constructed. As kqr said, nothing prevents languages containing `eval` from being compiled. Even C++ programs could contain the moral equivalent of an `eval` function. You also don’t need to have full type information in order to compile code. In particular, Python uses dynamic lookup anyway, and compiled C libraries for Python do the same, at runtime. PyPy of course goes a step further but that’s not the only way of performing compilation. – Konrad Rudolph Sep 01 '13 at 12:02
  • Sure, you can always 'compile' code by putting the virtual machine in the executable, the way lisp does it. This is not what I was talking about. I was addressing the issue of translating python code into assembly similarly to C or C++. – thule Sep 01 '13 at 12:16
  • @letitbee As was I. These ways aren’t mutually exclusive, as the fully compiled C libraries for Python clearly show. In the same way, Python can be fully ahead-of-time compiled into assembly. – Konrad Rudolph Sep 01 '13 at 13:22
  • given the probable large gains in speed, why hasn't somebody created a compiler for python to assembly? my question was *why would it be harder for Python than any other language?* – Aristides Sep 01 '13 at 15:30
  • @Aristides That is what I was trying to answer. But you have to consider it is hardly 'any other language'. Java and .Net don't do that either, they have JITs in default implementation, and they are mostly statically typed. – thule Sep 01 '13 at 17:43
  • Yeah, but Java *could* be compiled to native code, because it does not have many dynamic features, could it not? – Aristides Sep 01 '13 at 20:20
  • Yes, I believe both Java and .net can be compiled ahead of time, but it is actually slower than JIT in many cases and may impose some minor restrictions on the language. – thule Sep 01 '13 at 21:17
  • Smalltalk/V, for example, compiled to native 80286 machine code - and Smalltalk is much more object-oriented than Python (or Java, lol). The "V" in Smalltalk/V is, btw, not a roman numeral but stands for Virtual Memory (yes, on a 286!). Smalltalk/V was a great Smalltalk implementation...so it was bought by a competitor and killed off. Nowadays, I guess every Smalltalk implementation compiles to bytecode (which a JIT compiler can then compile to machine code; Smalltalk-80 was, in the 1980s, pioneering JIT compilation). – Klaws May 17 '20 at 09:36
4

I think the biggest issue (and the reason that those implementations require type annotations) is that the Python specification relies heavily on deferring semantic evaluation like function binding to execution time.

Even if typing can be fully inferred from the script in some cases, it would be extremely complex to do that for the general case, and code that relies on deferred binding would, as Magnus Hoff points out in the comments, require embedding what amounts to an interpreter in the resulting executable.

Edit: I'm answering the implied secondary question about why someone hasn't tackled this problem head-on, not agreeing with the idea that it's somehow impossible. The C++ runtime, for example, does a lot of deferred binding, but my sense is that Python does more and does it later.

Mark R. Wilkins
  • 1,282
  • 7
  • 15
  • 1
    "Embedding what amounts to an interpreter in the resulting executable" is what many compiled high-level language implementations do, e.g. Common Lisp or Haskell. These are more generally referred to as run-time systems. A Haskell executable is the actual program embedded in an RTS that manages memory, profiling, parallellism and stuff like that. Taken to the extreme, that RTS could do some interpretation as well. – kqr Sep 01 '13 at 11:44
  • 1
    Sure, absolutely. At some point, pushing enough of a language to runtime evaluation reduces the difference between "compilation" and "interpretation" to zero. – Mark R. Wilkins Sep 01 '13 at 11:49