Pedagogy
I love and hate questions like this on SO, because there's a complex mixture of emotion, opinion, and educated guessing going on and people start to get snippy, and somehow everybody loses track of the actual facts and eventually loses track of the original question altogether.
Many technical questions on SO have at least one definitive answer (e.g. an answer that can be verified by execution or an answer that cites an authoritative source) but these "why" questions often do not have just a single, definitive answer. In my mind, there are 2 possible ways to definitively answer a "why" question in computer science:
- By pointing to the source code that implements the item of concern. This explains "why" in a technical sense: what preconditions are necessary to evoke this behavior?
- By pointing to human-readable artifacts (comments, commit messages, email lists, etc.) written by the developers involved in making that decision. This is the real sense of "why" that I assume the OP is interested in: why did Python's developers make this seemingly arbitrary decision?
The second type of answer is more difficult to corroborate, since it requires getting in the mind of the developers who wrote the code, especially if there's no easy-to-find, public documentation explaining a particular decision.
To date, this thread has 7 answers that solely focus on reading the intent of Python's developers and yet there is only one citation in the whole batch. (And it cites a section of the Python manual that does not answer the OP's question.)
Here's my attempt at answering both of the sides of the "why" question along with citations.
Source Code
What are the preconditions that trigger compilation of a .pyc? Let's look at the source code. (Annoyingly, the Python repository on GitHub doesn't have any release tags, so I'll just tell you that I'm looking at 715a6e.)
There is promising code in import.c:989 in the load_source_module() function. I've cut out some bits here for brevity.
    static PyObject *
    load_source_module(char *name, char *pathname, FILE *fp)
    {
        // snip...
        if (/* Can we read a .pyc file? */) {
            /* Then use the .pyc file. */
        }
        else {
            co = parse_source_module(pathname, fp);
            if (co == NULL)
                return NULL;
            if (Py_VerboseFlag)
                PySys_WriteStderr("import %s # from %s\n",
                                  name, pathname);
            if (cpathname) {
                PyObject *ro = PySys_GetObject("dont_write_bytecode");
                if (ro == NULL || !PyObject_IsTrue(ro))
                    write_compiled_module(co, cpathname, &st);
            }
        }
        m = PyImport_ExecCodeModuleEx(name, (PyObject *)co, pathname);
        Py_DECREF(co);
        return m;
    }
pathname is the path to the module and cpathname is the same path but with a .pyc extension. The only explicit condition on writing the file is the boolean sys.dont_write_bytecode. The rest of the logic is just error handling. So the answer we seek isn't here, but we can at least see that any code that calls this function will produce a .pyc file under most default configurations. The parse_source_module() function has no real relevance to the flow of execution, but I'll show it here because I'll come back to it later.
    static PyCodeObject *
    parse_source_module(const char *pathname, FILE *fp)
    {
        PyCodeObject *co = NULL;
        mod_ty mod;
        PyCompilerFlags flags;
        PyArena *arena = PyArena_New();
        if (arena == NULL)
            return NULL;
        flags.cf_flags = 0;
        mod = PyParser_ASTFromFile(fp, pathname, Py_file_input, 0, 0, &flags,
                                   NULL, arena);
        if (mod) {
            co = PyAST_Compile(mod, pathname, NULL, arena);
        }
        PyArena_Free(arena);
        return co;
    }
The salient aspect here is that the function parses and compiles a file and returns a pointer to the byte code (if successful).
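As an aside, the sys.dont_write_bytecode check in load_source_module() can be observed from Python itself. The following is only a sketch, and it assumes Python 3, where the import machinery lives in importlib rather than import.c and cached files go into a __pycache__ directory instead of sitting next to the source. The helper import_and_check_cache() and the module names passed to it are made up for illustration.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def import_and_check_cache(module_name, dont_write):
    """Import a throwaway module and report whether a cache file was written.

    module_name is a hypothetical name used only for this demonstration.
    """
    saved_flag = sys.dont_write_bytecode
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, module_name + ".py")
        with open(source, "w") as f:
            f.write("value = 42\n")
        sys.path.insert(0, tmp)
        sys.dont_write_bytecode = dont_write
        try:
            importlib.invalidate_caches()
            importlib.import_module(module_name)
            # cache_from_source() gives the path where the importer would
            # store the bytecode for this source file.
            return os.path.exists(importlib.util.cache_from_source(source))
        finally:
            # Undo every global change so the demo has no side effects.
            sys.dont_write_bytecode = saved_flag
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)


if __name__ == "__main__":
    print(import_and_check_cache("cache_demo_flag_set", True))
    print(import_and_check_cache("cache_demo_flag_clear", False))
```

With the flag set, the import should succeed without leaving a cache file behind; with the flag cleared, the cache file should appear.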
Now we're still at a dead end, so let's approach this from a new angle. How does Python load its argument and execute it? In pythonrun.c there are a few functions for loading code from a file and executing it. PyRun_AnyFileExFlags() can handle both interactive and non-interactive file descriptors. For interactive file descriptors, it delegates to PyRun_InteractiveLoopFlags() (this is the REPL) and for non-interactive file descriptors, it delegates to PyRun_SimpleFileExFlags(). PyRun_SimpleFileExFlags() checks if the filename ends in .pyc. If it does, then it calls run_pyc_file(), which directly loads compiled byte code from a file descriptor and then runs it.
In the more common case (i.e. a .py file as an argument), PyRun_SimpleFileExFlags() calls PyRun_FileExFlags(). This is where we start to find our answer.
    PyObject *
    PyRun_FileExFlags(FILE *fp, const char *filename, int start, PyObject *globals,
                      PyObject *locals, int closeit, PyCompilerFlags *flags)
    {
        PyObject *ret;
        mod_ty mod;
        PyArena *arena = PyArena_New();
        if (arena == NULL)
            return NULL;
        mod = PyParser_ASTFromFile(fp, filename, start, 0, 0,
                                   flags, NULL, arena);
        if (closeit)
            fclose(fp);
        if (mod == NULL) {
            PyArena_Free(arena);
            return NULL;
        }
        ret = run_mod(mod, filename, globals, locals, flags, arena);
        PyArena_Free(arena);
        return ret;
    }

    static PyObject *
    run_mod(mod_ty mod, const char *filename, PyObject *globals, PyObject *locals,
            PyCompilerFlags *flags, PyArena *arena)
    {
        PyCodeObject *co;
        PyObject *v;
        co = PyAST_Compile(mod, filename, flags, arena);
        if (co == NULL)
            return NULL;
        v = PyEval_EvalCode(co, globals, locals);
        Py_DECREF(co);
        return v;
    }
The salient point here is that these two functions basically serve the same purpose as the importer's load_source_module() and parse_source_module(). Both pairs call the parser to create an AST from Python source code and then call the compiler to create byte code.
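The same parse, compile, execute sequence is exposed at the Python level, which makes the shape of PyRun_FileExFlags() and run_mod() easier to see. This is a rough analogue, not a transcription of the C code: run_source() is a made-up helper, and the built-ins ast.parse(), compile(), and exec() stand in for PyParser_ASTFromFile(), PyAST_Compile(), and PyEval_EvalCode() respectively.

```python
import ast


def run_source(source, filename="<demo>"):
    """Parse, compile, and execute source; return the resulting namespace.

    A hypothetical helper that mirrors the script-execution path: nothing
    in this sequence ever writes bytecode to disk.
    """
    tree = ast.parse(source, filename=filename, mode="exec")  # source -> AST
    code = compile(tree, filename, "exec")                    # AST -> code object
    namespace = {"__name__": "__main__"}                      # stand-in for the __main__ dict
    exec(code, namespace)                                     # evaluate the code object
    return namespace


if __name__ == "__main__":
    ns = run_source("answer = 6 * 7\n")
    print(ns["answer"])
```

Note that the code object lives only in memory here. Caching it would require an extra, explicit step, which is exactly the step the importer performs and the script runner does not.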
So are these two C code paths redundant or do they serve different purposes? The difference is that one loads a module from a file, while the other takes a module as an argument. That module argument is, in this case, the __main__ module, which is created earlier in the initialization process using a low-level C function. The __main__ module doesn't go through most of the normal module import code paths because it is so unique, and as a side effect, it doesn't go through the code that produces .pyc files.
To summarize: the reason why the __main__ module isn't compiled to .pyc is that it isn't "imported". Yes, it appears in sys.modules, but it gets there via a very different code path than real module imports take.
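This summary is easy to check empirically. The sketch below assumes Python 3, where cached bytecode lands in a __pycache__ directory rather than next to the source as in the Python 2 code quoted above. The file name demo_script.py and both helper functions are invented for the demonstration.

```python
import glob
import os
import subprocess
import sys
import tempfile


def cached_bytecode(directory, module_name):
    """Return paths of any cached bytecode files for module_name in directory."""
    patterns = [
        os.path.join(directory, module_name + ".pyc"),                   # Python 2 layout
        os.path.join(directory, "__pycache__", module_name + ".*.pyc"),  # Python 3 layout
    ]
    return [path for pattern in patterns for path in glob.glob(pattern)]


def run_demo():
    """Run a file as a script, then import it, and report cache files after each."""
    # Drop settings that would disable or redirect bytecode caching.
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONDONTWRITEBYTECODE", "PYTHONPYCACHEPREFIX")}
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "demo_script.py")  # hypothetical file name
        with open(script, "w") as f:
            f.write("x = 1\n")

        # Case 1: the file is the script argument, so it becomes __main__.
        subprocess.check_call([sys.executable, script], env=env, cwd=tmp)
        after_run = cached_bytecode(tmp, "demo_script")

        # Case 2: the same file goes through the import machinery.
        subprocess.check_call(
            [sys.executable, "-c",
             "import sys; sys.path.insert(0, '.'); import demo_script"],
            env=env, cwd=tmp)
        after_import = cached_bytecode(tmp, "demo_script")
    return after_run, after_import


if __name__ == "__main__":
    after_run, after_import = run_demo()
    print("after running as a script:", after_run)
    print("after importing:", after_import)
```

Running the file directly should leave no cache file, while importing it should create one.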
Developer Intent
Okay, so we can now see that the behavior has more to do with the design of Python than with any clearly expressed rationale in the source code, but that doesn't answer the question of whether this is an intentional decision or just a side effect that doesn't bother anybody enough to be worth changing. One of the benefits of open source is that once we've found the source code that interests us, we can use the VCS to help trace back to the decisions that led to the present implementation.
One of the pivotal lines of code here (m = PyImport_AddModule("__main__");) dates back to 1990 and was written by the BDFL himself, Guido. It has been modified in intervening years, but the modifications are superficial. When it was first written, the main module for a script argument was initialized like this:
    int
    run_script(fp, filename)
        FILE *fp;
        char *filename;
    {
        object *m, *d, *v;
        m = add_module("__main__");
        if (m == NULL)
            return -1;
        d = getmoduledict(m);
        v = run_file(fp, filename, file_input, d, d);
        flushline();
        if (v == NULL) {
            print_error();
            return -1;
        }
        DECREF(v);
        return 0;
    }
This existed before .pyc files were even introduced into Python! Small wonder that the design at that time didn't take compilation into account for script arguments. The commit message enigmatically says:
"Compiling" version
This was one of several dozen commits over a three-day period. It appears that Guido was deep into some hacking/refactoring and this was the first version that got back to being stable. This commit even predates the creation of the Python-Dev mailing list by about five years!
Saving the compiled bytecode was introduced 6 months later, in 1991.
This still predates the mailing list, so we have no real idea of what Guido was thinking. It appears that he simply thought that the importer was the best place to hook into for the purpose of caching bytecode. Whether he considered the idea of doing the same for __main__ is unclear: either it didn't occur to him, or else he thought that it was more trouble than it was worth.
I can't find any bugs on bugs.python.org that are related to caching the bytecodes for the main module, nor can I find any messages on the mailing list about it, so apparently nobody else thinks it's worth the trouble to try adding it.
To summarize: the reason why all modules are compiled to .pyc except __main__ is that it's a quirk of history. The design and implementation for how __main__ works was baked into the code before .pyc files even existed. If you want to know more than that, you'll need to e-mail Guido and ask.
Glenn Maynard's answer says:
Nobody seems to want to say this, but I'm pretty sure the answer is simply: there's no solid reason for this behavior.
I agree 100%. There's circumstantial evidence to support this theory and nobody else in this thread has provided a single shred of evidence to support any other theory. I upvoted Glenn's answer.