5

I want to understand how Python works at a base level, and this will hopefully help me understand a bit more about the inner workings of other compiled/interpreted languages. Unfortunately, the compilers class is a bit away for now. From what I read on this site and elsewhere, people answering "What base language is Python written in" seem to convey that there's a difference between talking about the "rules" of a language versus how the language rules are implemented for usage. So, is it correct to say that Python (and other high-level languages) are all essentially just sets of rules "written" in any natural language? And then the matter of how they're actually used (where used means compiled/interpreted to actually create things) can vary, with various languages being used to implement compilers? So in this case, CPython, IronPython, and Jython would be syntactically equal languages which all follow the same set of rules, just that those rules are implemented themselves in their respective languages.

Please let me know if my understanding of this is correct, if you have anything to add that might further solidify my understanding, or if I'm blatantly wrong.

Michael W
  • You could implement a Python compiler in Python itself (non-trivially, i.e. without using eval). Most C/C++ compilers are actually implemented in C/C++. For more "duuuude" stuff, see https://www.schneier.com/blog/archives/2006/01/countering_trus.html – necromancer Jan 17 '18 at 04:22

3 Answers

7

Code written in Python should be able to run on any Python interpreter. Python is essentially a specification for a programming language with a reference implementation (CPython). Whenever the Python specifications and PEPs are ambiguous, the other interpreters usually choose to implement the same behavior, unless they have reason not to.
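To see which implementation of the language you are actually running, something like the following minimal sketch should work. It only uses the standard `platform` and `sys` modules; the helper name `describe_interpreter` is made up for illustration.

```python
import platform
import sys


def describe_interpreter():
    """Return (implementation name, language version) for the running interpreter."""
    # platform.python_implementation() returns a string such as
    # 'CPython', 'PyPy', 'Jython' or 'IronPython'.
    name = platform.python_implementation()
    # The language version the interpreter implements, e.g. '3.12.1'.
    version = platform.python_version()
    return name, version


impl_name, lang_version = describe_interpreter()
print(f"Running {impl_name}, implementing Python {lang_version}")
# sys.implementation carries the same information in a structured form.
print(f"sys.implementation.name = {sys.implementation.name}")
```

The same script run under two different interpreters should report two different implementation names while the source code stays identical, which is the language versus implementation distinction in practice.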

That being said, it's entirely possible that a program written in Python will behave differently on different implementations. This is because many programmers rely on behavior that the specification leaves undefined. For example, CPython has a "Global Interpreter Lock" that means only one thread is actually executing at a time (modulo some conditions), but some other interpreters, such as Jython and IronPython, do not. As a result, atomicity guarantees differ between interpreters: in CPython each bytecode instruction is atomic, while other interpreters make no such promise.
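As a rough illustration of why bytecode-level atomicity matters, the standard `dis` module can list the instructions a statement compiles to. This is a sketch, assuming an interpreter that ships `dis` (CPython does); the names `counter` and `increment` are made up for the example, and the exact instruction names vary between versions.

```python
import dis

counter = 0


def increment():
    # A single line of Python source...
    global counter
    counter += 1


# ...compiles to several bytecode instructions: typically a load of the
# global, a load of the constant, an add, and a store back to the global.
opnames = [instruction.opname for instruction in dis.get_instructions(increment)]
print(opnames)
```

Because the load and the store are separate instructions, a thread switch can land between them, so code that treats `counter += 1` as atomic is relying on something the language does not promise.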

You can consider it like C. C is a language specification, but there are many compilers implementing it: GCC, Clang (LLVM), Borland, MSVC, ICC, etc. There are programming languages, and then there are implementations of those programming languages.

David
  • This was extremely helpful. I didn't even know PEP existed before. Now I understand as well why professors have students upload code to their server to ensure it's running on the same specification. Had you not explained this, I would have gone forward assuming Python would be interpreted the same way by every compiler. Thanks so much for this informative response, definitely something to watch out for in the future. – Michael W Jan 17 '18 at 04:12
4

You are correct when you make the distinction between what a language means and how it does what it means.

What it means

The first step in compiling a language is to parse its code to generate an Abstract Syntax Tree (AST). That is a tree that defines what the code you wrote means, that is, what it is supposed to do. For example, suppose you have the following code:

a = 1
if a:
    print('not zero')

It would generate a tree that looks more or less like this.

             code
   ___________|______
   |                 |
declaration          if
 __|__             ___|____
 |    |            |       |
 a    1            a     print
                           |
                       'not zero'

This represents what the code means, but tells us nothing about how it is executed.

Edit: of course the above is far from what Python's parser would actually generate; I made plenty of oversimplifications for the sake of readability. Luckily for us, if you are curious about what is actually generated, you can import the ast module, which provides a Python parser.

import ast
code = """
a = 1
if a:
    print('not zero')
"""
my_ast = ast.parse(code)

Enjoy inspecting my_ast.
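As a starting point for that inspection, here is a small sketch that walks the top level of the tree using only the standard `ast` module. The variable names (`tree`, `top_level`, `if_node`) are just for illustration.

```python
import ast

code = """
a = 1
if a:
    print('not zero')
"""
tree = ast.parse(code)

# The module body holds one node per top-level statement.
top_level = [type(node).__name__ for node in tree.body]
print(top_level)  # ['Assign', 'If']

# The If node keeps its condition and its body as separate children,
# which is why the condition and the print call sit side by side under it.
if_node = tree.body[1]
print(type(if_node.test).__name__)     # Name  (the variable a)
print(type(if_node.body[0]).__name__)  # Expr  (wrapping the print call)

# ast.dump gives a textual view of the whole tree.
print(ast.dump(tree))
```

The tree only records structure: the `If` node says "there is a condition and a body", and it is up to whatever consumes the tree to evaluate the condition first and then decide whether to run the body.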

What it does

Once you have an AST, you can convert it to whatever you want. It can be C, it can be machine code, and you can even convert it back to Python if you wish. The most widely used implementation of Python is CPython, which is written in C.
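Two of those conversions can be tried from within Python itself. The sketch below takes one AST and sends it down two paths: the built-in `compile` turns it into a code object that can be executed, and `ast.unparse` turns it back into source text. It assumes Python 3.9 or later, since `ast.unparse` does not exist in older versions.

```python
import ast
import contextlib
import io

source = "a = 1\nif a:\n    print('not zero')\n"
tree = ast.parse(source)

# Path 1: AST -> code object -> execution.
code_object = compile(tree, filename="<ast>", mode="exec")
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(code_object, {})  # run in a fresh, empty namespace
printed = buffer.getvalue()
print(printed, end="")

# Path 2: AST -> Python source again (assumes Python 3.9+).
regenerated = ast.unparse(tree)
print(regenerated)
```

The regenerated source may differ from the original in whitespace or quoting, but parsing it again yields an equivalent tree, which is a reasonable way to convince yourself that the AST captures the meaning and not the exact text.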

What is going on under the hood is thus pretty close to your understanding. First, a language is a set of rules that defines a behaviour, and only then is there an implementation of that language that defines how it does it. And yes, of course, you can have different implementations of the same language with slight differences in behaviour.

Community
Olivier Melançon
  • This was so helpful. Thanks for spending the time on making that tree. One thing: why is the check for a on the same level as print there, wouldn't it sequentially execute print after it checks for a? Why is print not below a on the tree? – Michael W Jan 17 '18 at 04:25
  • This AST is far from what the actual AST generated for Python code would look like. Its only purpose is to show *kind of* what it looks like. If you are interested in seeing the real thing, you can do `import ast` and use `ast.parse`. Let me add an example of that in my answer. – Olivier Melançon Jan 17 '18 at 04:30
  • thank you again. Very helpful, and helps me visualize code execution in the future. +1 – Michael W Jan 17 '18 at 05:47
-2

Basically it's a bunch of dictionary data structures implementing functions, modules, etc. The global variables and their values live in a per-module dictionary. Variables within a class are another dictionary. Those within an object are yet another dictionary and so are those within a function. Even a function call has its own dictionary so that different calls have different copies of the local variables.
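The namespaces described above can be looked at directly with built-ins such as `globals()`, `vars()` and `locals()`. This is a minimal sketch; `counter`, `Point` and `describe` are made-up names, and what you get back is a dictionary view of each namespace, not necessarily how a given interpreter stores the names internally.

```python
counter = 0  # a module-level (global) name


class Point:
    dimensions = 2  # a class-level name

    def __init__(self, x, y):
        self.x = x  # instance-level names
        self.y = y


def describe(value):
    label = "value"
    # locals() exposes the names local to this particular call.
    return dict(locals())


p = Point(1, 2)

module_names = globals()      # per-module dictionary
class_names = vars(Point)     # per-class mapping (read-only view)
instance_names = vars(p)      # per-object dictionary
call_names = describe(42)     # names belonging to one function call

print("counter" in module_names)
print(class_names["dimensions"])
print(instance_names)
print(call_names)
```

Each call to `describe` gets its own set of local names, so two calls with different arguments return two different dictionaries.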

It has no lexical scope unlike most other languages and, in my opinion, was designed to be implemented as simply as possible by 1 coder using dictionaries.

necromancer
  • Do you have any citations for that last claim? That seems to be a rather remarkable statement. – Bryan Oakley Jan 17 '18 at 04:06
  • @BryanOakley The lack of lexical scope might be explicitly documented in the language reference, but it is rather trivial to observe with some simple code (happy to construct an example). For the 1 coder bit, that's more of an observation and inference, since languages are born and evolve rather chaotically, i.e. feel free to cite me :-) – necromancer Jan 17 '18 at 04:11
  • I don't understand why I should cite you. You made a claim about python that doesn't seem to be backed up by anything more than a personal observation. If that's the case, you might want to say so in your answer. – Bryan Oakley Jan 17 '18 at 04:12
  • I disagree wholeheartedly with this "observation and inference". Furthermore, I don't see how this answers OP's question. -1. – miradulo Jan 17 '18 at 04:12
  • @BryanOakley I believe the custom in writing is to cite other people's opinion, and the "default" is understood to be the author's own. However, since it bothers you I have added "in my opinion" – necromancer Jan 17 '18 at 04:15
  • @necromancer, I'm curious as to how these dictionaries are implemented; you're saying functions, variables, etc, are all implemented in the form `function: function information`? So when read by a compiler it's all one big embedded dictionary? – Michael W Jan 17 '18 at 05:46
  • @MichaelW, modules, functions, classes, objects, methods: each of these has a "namespace" within which names have to be unique (and outside which you can reuse names for different things). Each name can be a variable or a function, thus the dictionary maps from string to data/code. However, this is a conceptual model. Dictionaries can be vastly optimized, to the point that they barely look like conventional dictionaries. In some cases pre-compilation can completely eliminate the "name", as languages such as C/C++ do, and work directly with memory locations. That is difficult with Python, hence my characterization. – necromancer Jan 17 '18 at 09:43