
This is proving to be a very difficult question for me to figure out how to properly ask.

For example, the Python interpreter is written in C. Say you wrote another interpreter in Python, which got compiled through CPython, and you then intend to run a program through your interpreter. Does that code have to go through your interpreter and then CPython? Or would the new interpreter be self-contained and not require CPython to interpret its output? Because if it is not a standalone interpreter, then there would be a performance loss, in which case it would seem like all compilers would be written in low-level languages until the end of time.

EDIT: PyPy was the very thing that got me wondering about this topic. I was under the impression that it was an interpreter, written in Python, and I did not understand how it could be faster than CPython. I also do not understand how bytecode gets executed by the machine without being translated to machine code, though I suppose that is another topic.

Wickie Lee
  • Windows is probably written in C, so what do you think ~? – Steve Sep 16 '14 at 22:25
  • Sane people building non-prototype projects don't write interpreters in interpreted languages. Instead, one picks a target -- such as Python bytecode -- and compiles to that. If you look at your faster template languages implemented in Python, that's actually how they work in practice; they compile down to Python bytecode, either directly or by generating and then evaluating Python source (or, better, AST structures). – Charles Duffy Sep 16 '14 at 22:26
  • ...of course, one could write something that compiles to a non-Python platform _in_ Python as well. Look at how ClojureScript's compiler is written in Clojure but generates JavaScript source (for further compilation via a javascript-to-javascript optimizing compiler) for an example of how things can be decoupled. – Charles Duffy Sep 16 '14 at 22:29
  • 2
    Look at [PyPy](http://pypy.org/). It's a Python interpreter written in Python that runs under… well, usually under PyPy itself. By using a JIT custom-designed for Python-like languages, it's usually significantly faster than the usual CPython interpreter, which is written in C but does virtually no runtime optimization, or Jython and IronPython, which are written in Java and C# but rely on generic JVM and .NET JITs. And that's without even getting into non-interpreting compilers, where the hosting language isn't relevant. – abarnert Sep 16 '14 at 22:38
  • 1
    @abarnert, PyPy is written in RPython -- a restricted subset of the Python language which can be compiled to C. Thus, it is not itself interpreted, and thus quite emphatically does _not_ provide an existence proof of an interpreted interpreter which runs faster than its parent platform. – Charles Duffy Sep 16 '14 at 22:39
  • Take a look at JikesRVM self-hosting. – SK-logic Sep 16 '14 at 22:46
  • 3
    Answering the question specifically about PyPy, then -- PyPy can be faster than CPython because it doesn't run on CPython at runtime, but is compiled to C and then to native code. Modern JIT runtimes are indeed a bigger discussion. – Charles Duffy Sep 16 '14 at 22:52
  • 1
    The right answer to the question the OP really wants to ask is "modern JIT runtimes". Or, more generally, (static and dynamic) optimization. CPython does a small amount of static optimization and a few tiny bits of special-cased dynamic optimization; PyPy (usually) automatically finds the hotspots and compiles them to optimized machine code on the fly; and that's why PyPy is (often) faster. (But I'll bet there's already a dup of that question here which covers this a lot better than a series of comments trying to puzzle out the real question from this one can…) – abarnert Sep 16 '14 at 22:55
  • 1
    @abarnert and OP, here's one of the more interesting discussions of Python and its various intermediate representations that I've found on SO: http://stackoverflow.com/a/2998544/20789 – Dan Lenski Sep 16 '14 at 23:09
  • 1
    It is all starting to make more sense, though the distinction between a compiler and an interpreter is much harder to make than I initially realized. I will spend some time researching just about everything mentioned on this page. However I am still unsure of how to edit the title of the question, or how to properly ask what I wanted in the first though. Even though it has been properly answered. – Wickie Lee Sep 16 '14 at 23:20
  • 1
    All in all -- I think this would probably be a better fit for http://programmers.stackexchange.com/ than for StackOverflow, since it isn't a question about _writing code_ as such. – Charles Duffy Sep 17 '14 at 00:05

1 Answer


You seem to be confused about the distinction between compilers and interpreters, since you refer to both in your question without a clear distinction. (Quite understandable... see all the comments flying around this thread :-))

Compilers and interpreters are somewhat, though not totally, orthogonal concepts:

Compilers

Compilers take source code and produce a form that can be executed more efficiently, whether that be native machine code, or an intermediate form like CPython's bytecode.

C is perhaps the canonical example of a language that is almost always compiled to native machine code. The language was indeed designed to be relatively easy and efficient to translate into machine code. RISC CPU architectures became popular after the C language was already well-adopted for operating system programming, and they were often designed to make it even more efficient to translate certain features of C to machine code.

So the "compilability" of C has become self-reinforcing. It is difficult to introduce a new architecture (e.g. Itanium) for which it is hard to write a good C compiler that fully takes advantage of the hardware's potential: if your CPU can't run C code efficiently, it can't run most operating systems efficiently (the low-level parts of Linux, Unix, and Windows are mainly written in C).

Interpreters

Interpreters are traditionally defined as programs that try to run source code directly from its source representation. Most implementations of BASIC worked like this, back in the good ol' days: BASIC would literally re-parse each line of code on each iteration through a loop.
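
To make that overhead concrete, here is a minimal sketch of such a "traditional" interpreter in Python, for a made-up three-instruction mini-language (not any real BASIC dialect): every statement is re-parsed from its textual form each time it is executed.

```python
# Hypothetical mini-language with three statements: "set", "add", "print".
# Each line is split and decoded again on every pass through the loop --
# this repeated re-parsing is the classic cost of a traditional interpreter.
program = [
    "set x 0",
    "add x 1",
    "print x",
]

def run(lines, iterations=3):
    env = {}
    for _ in range(iterations):        # stand-in for a loop in the guest program
        for line in lines:
            op, *args = line.split()   # re-parse the source text every single time
            if op == "set":
                env[args[0]] = int(args[1])
            elif op == "add":
                env[args[0]] += int(args[1])
            elif op == "print":
                print(env[args[0]])

run(program)
```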

Modern languages

Modern programming languages and platforms blur the lines a lot. Languages like Java, C#, or Python are typically not compiled to native machine code, but to various intermediate forms like bytecode.

CPython's bytecode can be interpreted, but the overhead of interpretation is much lower than for a traditional interpreter because the code is fully parsed beforehand (and saved in a .pyc file so it doesn't need to be re-parsed until the source is modified).
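
You can inspect that intermediate form yourself with the standard-library `dis` module (the function below is just an arbitrary example):

```python
import dis

def square_sum(n):
    # An arbitrary example function; CPython has already compiled it to
    # bytecode by the time the def statement finishes executing.
    return sum(i * i for i in range(n))

dis.dis(square_sum)  # prints the bytecode instructions the CPython VM interprets
```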

Just-in-time compilation can be used to translate bytecode to native machine code just before it is actually run, with many different strategies for exactly when the native code compilation should take place.

Some languages that have "traditionally" been run via a bytecode interpreter or JIT compiler are also amenable to ahead-of-time compilation. For example, the Dalvik VM used in previous versions of Android relied on just-in-time compilation, while Android 4.4 introduced ART, which uses ahead-of-time compilation instead.

Intermediate representations of Python

Here's a [great thread](http://stackoverflow.com/a/2998544/20789) containing a really useful and thoughtful answer by @AlexMartelli on the lower-level compiled forms generated by various implementations of Python.

Answering the original question (I think...)

A traditional interpreter will almost certainly execute code slower than if that same code were compiled to "bare metal" machine code, all else being equal (which it typically is not), because the interpreter imposes an additional cost of parsing every line or unit of code every time it is executed.
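
A rough way to see this parsing overhead from within CPython itself is to evaluate the same expression repeatedly, once by re-parsing the source string on every iteration and once from a pre-compiled code object. This is just a sketch; exact timings will vary by machine and Python version.

```python
import timeit

source = "sum(i * i for i in range(100))"
code = compile(source, "<expr>", "eval")   # parse once, up front

reparsed    = timeit.timeit(lambda: eval(source), number=10_000)  # parse + execute each time
precompiled = timeit.timeit(lambda: eval(code), number=10_000)    # execute only

print(f"re-parsed each time: {reparsed:.3f}s")
print(f"pre-compiled:        {precompiled:.3f}s")
```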

So if a traditional interpreter were running under an interpreter, which was itself running under an interpreter, etc., ... that would result in a performance loss, just as running a VM (virtual machine) under a VM under a VM will be slower than running on "bare metal."

This is not so for a compiler. I could write a compiler which runs under an interpreter which runs under an interpreter which has been compiled by a compiler, etc... the resulting compiler could generate native machine code that is just as good as the native code generated by any other compiler. (It is important to realize that the performance of the compiler itself can be entirely independent of the performance of the executed code; an aggressive optimizing C compiler typically takes much more time to compile code than a non-optimizing compiler, but the intention is for the resultant native code to run significantly faster.)
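
As a small illustration of that last point, using CPython's built-in `compile()` as a stand-in for "a compiler" (a sketch under that stand-in assumption, not how PyPy or a C compiler is actually built): artificially slowing down the compilation step changes nothing about the code that comes out of it, so the compiled code runs just as fast either way.

```python
import time

source = "total = sum(i * i for i in range(1000))"

def sluggish_compile(src):
    time.sleep(0.5)                          # pretend the compiler runs on a slow, layered host
    return compile(src, "<sketch>", "exec")

fast_code = compile(source, "<sketch>", "exec")
slow_code = sluggish_compile(source)

# The emitted bytecode is byte-for-byte identical, so it executes identically.
assert fast_code.co_code == slow_code.co_code
exec(fast_code)
```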

Dan Lenski
  • So this explains why [PyPy](http://pypy.org/) is slower than CPython, I guess? – abarnert Sep 16 '14 at 22:35
  • 1
    @abarnert, PyPy isn't interpreted -- it's translated to C and compiled during its build process. Given that, I'm unclear on what your comment has to do with the answer to which it responds. – Charles Duffy Sep 16 '14 at 22:37
  • 1
    (@abarnert, ...and since sarcasm doesn't translate well through text -- PyPy is significantly faster on some benchmarks, significantly slower on others, so it's hard to accurately generalize without measuring for a specific real-world workflow). – Charles Duffy Sep 16 '14 at 22:38
  • @abarnert, I assume that was tongue-in-cheek? Just-in-time compilation is a whole other topic which doesn't easily fit into the traditional framework of "compiled vs. interpreted" in all cases. – Dan Lenski Sep 16 '14 at 22:39
  • @CharlesDuffy: PyPy is Python code that interprets Python code, and it's generally faster than CPython. Not _always_ faster, sure, but certainly not inevitably slower, which your answer implies it should be. A JIT, or other kinds of runtime optimization, can easily compensate for the cost of an extra layer. – abarnert Sep 16 '14 at 22:40
  • 1
    @abarnert, ...but PyPy is **not** an extra layer; in the circumstances in which these benchmarks are taken, it doesn't itself run on top of CPython, but is compiled to C at build time. – Charles Duffy Sep 16 '14 at 22:41
  • @DanLenski: The point is that the kind of JIT that PyPy does would be very hard to do at the bare metal layer. Not _impossible_, certainly, but considering that PyPy has a lot less code than Oracle's JVM and PyPy usually (again, not always) runs Python code faster than Jython, that points out that the "thickness" of the extra layer is barely relevant in optimized contexts. And, if you don't like that, Jython and IronPython are _also_ often significantly faster than CPython, if not as fast as PyPy, and you can't deny there's an interpreted VM there. – abarnert Sep 16 '14 at 22:42
  • @CharlesDuffy, precisely. PyPy is faster than CPython in many cases because it actually generates a more-efficiently-executable representation of the code it is processing, which is what I was trying to get at in the last paragraph. – Dan Lenski Sep 16 '14 at 22:43
  • @abarnert, ...sure, but we're still talking about a single layer -- the better one provided by PyPy vs the worse one provided by CPython -- as opposed to layering two layers of runtime interpretation atop each other. – Charles Duffy Sep 16 '14 at 22:43
  • 1
    @DanLenski, yes -- I'm in agreement with you here. Apologies if joining your discussion here as a third party has made things hard to follow. – Charles Duffy Sep 16 '14 at 22:44
  • Under the Java and .NET platforms, there is typically *not* an interpreted VM; most modern implementations involve just-in-time compilation of the VM's code to truly native code. – Dan Lenski Sep 16 '14 at 22:45
  • 1
    @DanLenski, ...well -- it wasn't that long ago that Sun's JVM would do interpretation for a while (for any given chunk of bytecode), until enough observations have been made to allow the JIT to generate efficient code. Although it might indeed be JIT from the beginning (with instrumented bytecode generated on the early passes) now. – Charles Duffy Sep 16 '14 at 22:45
  • 1
    @CharlesDuffy, I got that, thanks! It seems there's a lot of disagreement over what exactly is meant by "interpreted" code, which is understandable since many modern languages and platforms blur the line with non-native, intermediate binary forms... some of which are truly generated on-the-fly as you point out. – Dan Lenski Sep 16 '14 at 22:49
  • @CharlesDuffy: I think it does lookahead compilation of code paths now, but still falls back to interpreting on mispredictions. At any rate, if I wrote an interpreter that worked by compiling each instruction on the fly and then executing it, instead of using a big switch statement, would that really no longer count as an interpreter? Or, to avoid ambiguous terms, would that really no longer count as another layer that "gets in the way" as far as your answer is concerned? At the very least, the compilation is extra work that has to be done, that wouldn't exist with a bare-metal implementation. – abarnert Sep 16 '14 at 22:52
  • @abarnert, I think this is a terminology issue. Most modern programming languages don't fit clearly into the traditional dichotomy. I wrote my answer thinking about the "big switch statement" interpreters like BASIC. – Dan Lenski Sep 16 '14 at 22:56
  • 1
    @abarnert, correct -- I don't count runtime compilation as interpretation, just as Clojure is still a bytecode-compiled language even if you don't leverage AOT compilation and let it compile your source code to JVM bytecode when that bytecode is loaded. (Same goes, in the context of Clojure, for typing code in at the REPL interactively -- it's compiled to JVM bytecode before being executed, meaning no longer an interpreted language). – Charles Duffy Sep 16 '14 at 23:11
  • 1
    After reading through all of the comments, I am slightly more confused then before, but at least I now what all the different topics I need to research are to help get a more defined idea behind this. Thank you everyone for your input. – Wickie Lee Sep 16 '14 at 23:22
  • 1
    @DanLenski: Sure, that's a valid way to define things… but it's hard to think of any modern language system that qualifies as an interpreter under that terminology. (Even Python bytecode, even if that were something you wrote directly, isn't _quite_ interpreted with just a big switch statement.) Honestly, I don't think it's possible to give a good answer to this question without a much more detailed discussion of what "interpreter" means than is appropriate for StackOverflow. – abarnert Sep 16 '14 at 23:23
  • @DanLenski: Or, given that the OP is asking about PyPy, which is not an interpreter at all in your terminology, I suppose you could answer, "Your premises are wrong, take your question back and ask a different one," but that isn't a useful SO answer either. – abarnert Sep 16 '14 at 23:25
  • 1
    As you can see, I've revised the answer significantly to try to clarify both the terminology as well as some of the ways that modern inbetwirpated languages actually run their code. – Dan Lenski Sep 16 '14 at 23:38
  • 1
    @Dan Lenski: The added information in your answer has significantly helped clear things up for me. I agree that the terminology is what was slipping me up, due to misinformation floating around the internet. – Wickie Lee Sep 17 '14 at 15:53