Why does Python compile modules but not the script being run?

Question

Why does Python compile libraries that are used in a script, but not the script being called itself?

For instance, if there is main.py and module.py, and Python is run with python main.py, there will be a compiled file module.pyc but not one for main.py. Why?

If the response is potential disk permissions for the directory of main.py, why does Python compile modules? They are just as likely (if not more likely) to live in a location where the user does not have write access. Python could compile main.py when its directory is writable, or else write the compiled file to another directory.
If the reason is that benefits will be minimal, consider the situation when the script will be used a large number of times (such as in a CGI application).

score 30 · Accepted Answer · edited Mar 10 '23 at 14:59

Files are compiled upon import. It isn't a security thing; it is simply that when you import a file, Python saves the output. See this post by Fredrik Lundh on Effbot.

>>> import main
# main.pyc is created
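A minimal sketch of this behaviour, assuming Python 3.2 or later, where cache files go into a `__pycache__` directory with an interpreter tag in the name (for example `module.cpython-XY.pyc`) instead of sitting next to the source as `module.pyc`. It builds a throwaway main.py and module.py in a temporary directory, runs `python main.py`, then separately runs `python -c "import main"`, and lists the cache files each run leaves behind. `cache_files_after` is a hypothetical helper name.

```python
import os
import pathlib
import subprocess
import sys
import tempfile


def cache_files_after(command: list) -> list:
    """Run `python <command>` in a scratch dir holding main.py and module.py; list the .pyc files left behind."""
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        (root / "module.py").write_text("VALUE = 42\n")
        (root / "main.py").write_text("import module\nprint(module.VALUE)\n")

        # Strip settings that would stop the child from writing bytecode next to the
        # sources, or stop it from finding module.py in the current directory.
        env = dict(os.environ)
        for var in ("PYTHONDONTWRITEBYTECODE", "PYTHONPYCACHEPREFIX", "PYTHONSAFEPATH"):
            env.pop(var, None)

        subprocess.run([sys.executable, *command], cwd=root, env=env,
                       check=True, capture_output=True)
        return sorted(p.name for p in root.rglob("*.pyc"))


run_files = cache_files_after(["main.py"])               # python main.py
import_files = cache_files_after(["-c", "import main"])  # python -c "import main"
print("after running main.py:", run_files)
print("after importing main:", import_files)
```

In the first run only module gets a cache file; once main is imported instead of run, it gets one too.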

When running a script, Python will not use the *.pyc file. If you have some other reason for wanting your script precompiled, you can use the compileall module.

python -m compileall .

compileall Usage

python -m compileall --help
option --help not recognized
usage: python compileall.py [-l] [-f] [-q] [-d destdir] [-x regexp] [directory ...]
-l: don't recurse down
-f: force rebuild even if timestamps are up-to-date
-q: quiet operation
-d destdir: purported directory name for error messages
   if no directory arguments, -l sys.path is assumed
-x regexp: skip files matching the regular expression regexp
   the regexp is searched for in the full path of the file
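The usage text above appears to come from an older Python 2 release of the command-line tool; the same module can also be driven from code. A sketch assuming Python 3, using `compileall.compile_dir` and `importlib.util.cache_from_source` to find where the bytecode for main.py ends up; `precompile_tree` is a hypothetical wrapper name.

```python
import compileall
import importlib.util
import pathlib
import tempfile


def precompile_tree(directory: str) -> bool:
    """Byte-compile every .py file under `directory`, like `python -m compileall <directory>`."""
    # quiet=1 prints only errors; the return value is true when every file compiled.
    return bool(compileall.compile_dir(directory, quiet=1))


with tempfile.TemporaryDirectory() as tmp:
    script = pathlib.Path(tmp) / "main.py"
    script.write_text("print('hello')\n")
    ok = precompile_tree(tmp)
    # Where this interpreter expects the cached bytecode for main.py to live.
    cached = pathlib.Path(importlib.util.cache_from_source(str(script)))
    cached_exists = cached.exists()
```

Precompiling does not change what `python main.py` does: the script is still compiled from source on every run, and only an import or an explicit `python <path to .pyc>` uses the cached bytecode.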

If the response is potential disk permissions for the directory of main.py, why does Python compile modules?

Modules and scripts are treated the same. Importing is what triggers the output to be saved.

If the reason is that benefits will be minimal, consider the situation when the script will be used a large number of times (such as in a CGI application).

Using compileall does not solve this. Scripts executed by Python will not use the *.pyc file unless it is named explicitly on the command line. This has negative side effects, well stated by Glenn Maynard in his answer.

The example given of a CGI application should really be addressed by using a technique like FastCGI. If you want to eliminate the overhead of compiling your script, you probably want to eliminate the overhead of starting up Python too, not to mention database connection overhead.

A lightweight bootstrap script can be used, or even python -c "import script", but both are of questionable style.

@MattJoiner: You _can_ explicitly import main like so: `"import __main__"` and yet a .pyc file is not made, so that assertion is not correct. Also, `__main__` **is** compiled. The result of the compilation is just not written out as a .pyc file. It is just held in ram. — Douglas, Jan 26 '12 at 17:28

score 30 · Answer 2 · answered Mar 16 '11 at 06:19

Nobody seems to want to say this, but I'm pretty sure the answer is simply: there's no solid reason for this behavior.

All of the reasons given so far are essentially incorrect:

There's nothing special about the main file. It's loaded as a module, and shows up in sys.modules like any other module. Running a main script is nothing more than importing it with a module name of __main__.
There's no problem with failing to save .pyc files due to read-only directories; Python simply ignores it and moves on.
The benefit of caching a script is the same as that of caching any module: not wasting time recompiling the script every time it's run. The docs acknowledge this explicitly ("Thus, the startup time of a script may be reduced ...").
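`runpy.run_path` offers one way to poke at this from Python: it executes a file with `__name__` set to whatever `run_name` you pass and registers a module under that name in sys.modules while the code runs. A sketch, assuming Python 3; it is not literally the code path `python script.py` takes (that goes through the C functions discussed in another answer), but it shows the main module is an ordinary module object, and that running code this way leaves no .pyc behind.

```python
import pathlib
import runpy
import tempfile

SCRIPT = """\
import sys
# Is the module registered as __main__ the one whose globals we are executing in?
registered = sys.modules["__main__"].__dict__ is globals()
"""

with tempfile.TemporaryDirectory() as tmp:
    script = pathlib.Path(tmp) / "script.py"
    script.write_text(SCRIPT)
    # run_path returns a copy of the executed module's globals.
    result = runpy.run_path(str(script), run_name="__main__")
    wrote_cache = any(pathlib.Path(tmp).rglob("*.pyc"))
```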

Another issue to note: if you run python foo.py and foo.pyc exists, it will not be used. You have to explicitly say python foo.pyc. That's a very bad idea: it means Python won't automatically recompile the .pyc file when it's out of sync (due to the .py file changing), so changes to the .py file won't be used until you manually recompile it. It'll also fail outright with a RuntimeError if you upgrade Python and the .pyc file format is no longer compatible, which happens regularly. Normally, this is all handled transparently.
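The stale-bytecode problem is easy to reproduce. A sketch assuming Python 3: compile foo.py to foo.pyc with `py_compile`, edit foo.py, then run both files with the current interpreter. The helper name `run` is hypothetical.

```python
import pathlib
import py_compile
import subprocess
import sys
import tempfile


def run(path: pathlib.Path) -> str:
    """Run a file with this interpreter and return its stripped stdout."""
    out = subprocess.run([sys.executable, str(path)],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp) / "foo.py"
    compiled = pathlib.Path(tmp) / "foo.pyc"

    source.write_text("print('version 1')\n")
    # Write the bytecode next to the source so it can be named on the command line.
    py_compile.compile(str(source), cfile=str(compiled), doraise=True)

    source.write_text("print('version 2')\n")  # edit the script afterwards
    from_source = run(source)      # compiled from foo.py on every run
    from_bytecode = run(compiled)  # still executes the old bytecode
```

Running foo.py picks up the edit, while running foo.pyc keeps executing the old bytecode until it is recompiled by hand.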

You shouldn't need to move a script to a dummy module and set up a bootstrapping script to trick Python into caching it. That's a hackish workaround.

The only possible (and very unconvincing) reason I can contrive is to keep your home directory from being cluttered with a bunch of .pyc files. (This isn't a real reason; if that were an actual concern, then .pyc files should be saved as dotfiles.) It's certainly no reason not to even offer an option to do this.

Python should definitely be able to cache the main module.

@Matt: To the question "why does it do this", there's no answer. It's a question with a flawed premise: that reasons exist for everything. Python is written by humans, so it has its glitches, inconsistencies and warts, like every other language, and I believe this is one of them. Thus my answer: there is no good, convincing reason for this. If one exists, it hasn't been hinted at here and it's nowhere to be found in the documentation. — Glenn Maynard, Mar 16 '11 at 20:56
I really wish people in general would actually answer questions, just as you have @GlennMaynard. So, good job! I also find it quite discouraging when people give mechanistic answers that seem as if they are answering the question while actually just avoiding addressing the intent behind the question. I see this bad kind of answer all the time, as well as people defending to the death incorrect behavior, just because it's the current status quo. — Douglas, Jan 26 '12 at 17:33
With the introduction of `__pycache__` in [Python3.2](https://docs.python.org/3/whatsnew/3.2.html#pep-3147-pyc-repository-directories), even the rationale of avoiding clutter is no longer valid. — Yoel, Aug 06 '21 at 10:08
Script files are commonly found in `/usr/bin` or `/usr/local/bin`, etc., where a `__pycache__` directory or `.pyc` file would be entirely out of place. So only library files (modules, packages, etc.) that are normally placed in a dedicated directory on `sys.path` get a cache file. — Martijn Pieters, Sep 19 '21 at 02:27

Mark E. Haase · Answer 3 · 2014-12-11T17:44:00.267

Pedagogy

I love and hate questions like this on SO, because there's a complex mixture of emotion, opinion, and educated guessing going on. People start to get snippy, somehow everybody loses track of the actual facts, and eventually they lose track of the original question altogether.

Many technical questions on SO have at least one definitive answer (e.g. an answer that can be verified by execution or an answer that cites an authoritative source) but these "why" questions often do not have just a single, definitive answer. In my mind, there are 2 possible ways to definitively answer a "why" question in computer science:

By pointing to the source code that implements the item of concern. This explains "why" in a technical sense: what preconditions are necessary to evoke this behavior?
By pointing to human-readable artifacts (comments, commit messages, email lists, etc.) written by the developers involved in making that decision. This is the real sense of "why" that I assume the OP is interested in: why did Python's developers make this seemingly arbitrary decision?

The second type of answer is more difficult to corroborate, since it requires getting into the minds of the developers who wrote the code, especially if there's no easy-to-find, public documentation explaining a particular decision.

To date, this thread has 7 answers that solely focus on reading the intent of Python's developers and yet there is only one citation in the whole batch. (And it cites a section of the Python manual that does not answer the OP's question.)

Here's my attempt at answering both sides of the "why" question, along with citations.

Source Code

What are the preconditions that trigger compilation of a .pyc? Let's look at the source code. (Annoyingly, the Python mirror on GitHub doesn't have any release tags, so I'll just tell you that I'm looking at 715a6e.)

There is promising code in import.c:989 in the load_source_module() function. I've cut out some bits here for brevity.

static PyObject *
load_source_module(char *name, char *pathname, FILE *fp)
{
    // snip...

    if (/* Can we read a .pyc file? */) {
        /* Then use the .pyc file. */
    }
    else {
        co = parse_source_module(pathname, fp);
        if (co == NULL)
            return NULL;
        if (Py_VerboseFlag)
            PySys_WriteStderr("import %s # from %s\n",
                name, pathname);
        if (cpathname) {
            PyObject *ro = PySys_GetObject("dont_write_bytecode");
            if (ro == NULL || !PyObject_IsTrue(ro))
                write_compiled_module(co, cpathname, &st);
        }
    }
    m = PyImport_ExecCodeModuleEx(name, (PyObject *)co, pathname);
    Py_DECREF(co);

    return m;
}

pathname is the path to the module and cpathname is the same path but with a .pyc extension. The only direct logic is the boolean sys.dont_write_bytecode. The rest of the logic is just error handling. So the answer we seek isn't here, but we can at least see that any code that calls this will result in a .pyc file under most default configurations. The parse_source_module() function has no real relevance to the flow of execution, but I'll show it here because I'll come back to it later.
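In Python 3 the importer is implemented in Python (importlib) rather than in import.c, but it still consults `sys.dont_write_bytecode` (set by `-B` or PYTHONDONTWRITEBYTECODE) before writing a cache file. A sketch that toggles the flag at runtime and checks for the cache file; `import_and_check_cache` and the module name `bytecode_flag_demo` are hypothetical.

```python
import importlib
import importlib.util
import pathlib
import sys
import tempfile


def import_and_check_cache(dont_write: bool) -> bool:
    """Import a throwaway module with sys.dont_write_bytecode set as given; report whether a cache file appeared."""
    name = "bytecode_flag_demo"  # hypothetical module name used only for this demo
    saved_flag = sys.dont_write_bytecode
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp) / f"{name}.py"
        source.write_text("X = 1\n")
        sys.path.insert(0, tmp)
        sys.dont_write_bytecode = dont_write
        try:
            importlib.invalidate_caches()  # make the path finders notice the new directory
            importlib.import_module(name)
            return pathlib.Path(importlib.util.cache_from_source(str(source))).exists()
        finally:
            # Undo every global change so the demo can be repeated.
            sys.dont_write_bytecode = saved_flag
            sys.path.remove(tmp)
            sys.modules.pop(name, None)


print(import_and_check_cache(dont_write=False), import_and_check_cache(dont_write=True))
```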

static PyCodeObject *
parse_source_module(const char *pathname, FILE *fp)
{
    PyCodeObject *co = NULL;
    mod_ty mod;
    PyCompilerFlags flags;
    PyArena *arena = PyArena_New();
    if (arena == NULL)
        return NULL;

    flags.cf_flags = 0;

    mod = PyParser_ASTFromFile(fp, pathname, Py_file_input, 0, 0, &flags, 
                   NULL, arena);
    if (mod) {
        co = PyAST_Compile(mod, pathname, NULL, arena);
    }
    PyArena_Free(arena);
    return co;
}

The salient aspect here is that the function parses and compiles a file and returns a pointer to the byte code (if successful).
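The same parse-then-compile pipeline is exposed at the Python level through `ast.parse` and the built-in `compile`. A sketch; the function name mirrors the C helper purely for readability and is not a real Python API.

```python
import ast
import types


def parse_source_module(source: str, pathname: str) -> types.CodeType:
    """Python-level analogue of the C helper: parse to an AST, then compile the AST to a code object."""
    tree = ast.parse(source, filename=pathname, mode="exec")  # roughly PyParser_ASTFromFile(..., Py_file_input, ...)
    return compile(tree, pathname, "exec")                    # roughly PyAST_Compile(...)


code = parse_source_module("x = 6 * 7\n", "<demo>")
namespace = {}
exec(code, namespace)  # running the code object is a separate step
```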

Now we're still at a dead end, so let's approach this from a new angle. How does Python load its argument and execute it? In pythonrun.c there are a few functions for loading code from a file and executing it. PyRun_AnyFileExFlags() can handle both interactive and non-interactive file descriptors. For interactive file descriptors, it delegates to PyRun_InteractiveLoopFlags() (this is the REPL), and for non-interactive file descriptors, it delegates to PyRun_SimpleFileExFlags(). PyRun_SimpleFileExFlags() checks if the filename ends in .pyc. If it does, then it calls run_pyc_file(), which directly loads compiled byte code from a file descriptor and then runs it.
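`run_pyc_file()` can be approximated in Python: check the magic number, skip the rest of the header, unmarshal the code object and execute it. This is a sketch, not CPython's implementation, and it assumes Python 3.7+ (PEP 552), where the .pyc header is 16 bytes; `run_pyc_file` here is a hypothetical Python function that only borrows the C name.

```python
import importlib.util
import marshal
import pathlib
import py_compile
import tempfile

HEADER_SIZE = 16  # Python 3.7+: magic, flags, then 8 bytes of mtime/size or source hash


def run_pyc_file(path: str) -> dict:
    """Rough Python analogue of `python foo.pyc`: check the header, unmarshal, execute."""
    data = pathlib.Path(path).read_bytes()
    if data[:4] != importlib.util.MAGIC_NUMBER:
        # Bytecode from a different Python version; CPython's own check raises RuntimeError too.
        raise RuntimeError("Bad magic number in .pyc file")
    code = marshal.loads(data[HEADER_SIZE:])
    namespace = {"__name__": "__main__"}
    exec(code, namespace)
    return namespace


with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp) / "foo.py"
    source.write_text("result = sum(range(10))\n")
    pyc_path = py_compile.compile(str(source), doraise=True)  # returns the .pyc path
    namespace = run_pyc_file(pyc_path)
```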

In the more common case (i.e. .py file as an argument), PyRun_SimpleFileExFlags() calls PyRun_FileExFlags(). This is where we start to find our answer.

PyObject *
PyRun_FileExFlags(FILE *fp, const char *filename, int start, PyObject *globals,
          PyObject *locals, int closeit, PyCompilerFlags *flags)
{
    PyObject *ret;
    mod_ty mod;
    PyArena *arena = PyArena_New();
    if (arena == NULL)
        return NULL;

    mod = PyParser_ASTFromFile(fp, filename, start, 0, 0,
                   flags, NULL, arena);
    if (closeit)
        fclose(fp);
    if (mod == NULL) {
        PyArena_Free(arena);
        return NULL;
    }
    ret = run_mod(mod, filename, globals, locals, flags, arena);
    PyArena_Free(arena);
    return ret;
}

static PyObject *
run_mod(mod_ty mod, const char *filename, PyObject *globals, PyObject *locals,
     PyCompilerFlags *flags, PyArena *arena)
{
    PyCodeObject *co;
    PyObject *v;
    co = PyAST_Compile(mod, filename, flags, arena);
    if (co == NULL)
        return NULL;
    v = PyEval_EvalCode(co, globals, locals);
    Py_DECREF(co);
    return v;
}

The salient point here is that these two functions serve basically the same purpose as the importer's load_source_module() and parse_source_module(): they call the parser to create an AST from Python source code and then call the compiler to create byte code.

So are these blocks of code redundant or do they serve different purposes? The difference is that one block loads a module from a file, while the other block takes a module as an argument. That module argument is — in this case — the __main__ module, which is created earlier in the initialization process using a low-level C function. The __main__ module doesn't go through most of the normal module import code paths because it is so unique, and as a side effect, it doesn't go through the code that produces .pyc files.

To summarize: the reason why the __main__ module isn't compiled to .pyc is that it isn't "imported". Yes, it appears in sys.modules, but it gets there via a very different code path than real module imports take.
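A rough Python model of the two steps described above: obtain a module object called `__main__`, then compile the source in memory and execute it in that module's namespace, without touching the import machinery. It is only an approximation (CPython's `PyImport_AddModule` reuses an existing `__main__` instead of creating a fresh one, and real startup also sets up sys.argv and more); `run_as_main` is a hypothetical name.

```python
import sys
import types


def run_as_main(source: str, filename: str = "<script>") -> types.ModuleType:
    """Mimic the script path: create a __main__ module, compile in memory, exec in its namespace."""
    main = types.ModuleType("__main__")  # stands in for PyImport_AddModule("__main__")
    main.__file__ = filename
    code = compile(source, filename, "exec")  # no importer involved, so nothing is cached
    saved = sys.modules.get("__main__")
    sys.modules["__main__"] = main
    try:
        exec(code, main.__dict__)
    finally:
        # Restore the real __main__ so the calling program is unaffected.
        if saved is not None:
            sys.modules["__main__"] = saved
        else:
            sys.modules.pop("__main__", None)
    return main


demo = run_as_main(
    "import sys\n"
    "registered = sys.modules['__main__'].__dict__ is globals()\n"
    "answer = 6 * 7\n"
)
```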

Developer Intent

Okay, so we can now see that the behavior has more to do with the design of Python than with any clearly expressed rationale in the source code, but that doesn't answer the question of whether this is an intentional decision or just a side effect that doesn't bother anybody enough to be worth changing. One of the benefits of open source is that once we've found the source code that interests us, we can use the VCS to help trace back to the decisions that led to the present implementation.

One of the pivotal lines of code here (m = PyImport_AddModule("__main__");) dates back to 1990 and was written by the BDFL himself, Guido. It has been modified in intervening years, but the modifications are superficial. When it was first written, the main module for a script argument was initialized like this:

int
run_script(fp, filename)
    FILE *fp;
    char *filename;
{
    object *m, *d, *v;
    m = add_module("__main__");
    if (m == NULL)
        return -1;
    d = getmoduledict(m);
    v = run_file(fp, filename, file_input, d, d);
    flushline();
    if (v == NULL) {
        print_error();
        return -1;
    }
    DECREF(v);
    return 0;
}

This existed before .pyc files were even introduced into Python! Small wonder that the design at that time didn't take compilation into account for script arguments. The commit message enigmatically says:

"Compiling" version

This was one of several dozen commits over a 3 day period... it appears that Guido was deep into some hacking/refactoring and this was the first version that got back to being stable. This commit even predates the creation of the Python-Dev mailing list by about five years!

Saving the compiled bytecode was introduced 6 months later, in 1991.

This still predates the mailing list, so we have no real idea of what Guido was thinking. It appears that he simply thought that the importer was the best place to hook into for the purpose of caching bytecodes. Whether he considered the idea of doing the same for __main__ is unclear: either it didn't occur to him, or else he thought that it was more trouble than it was worth.

I can't find any bugs on bugs.python.org that are related to caching the bytecodes for the main module, nor can I find any messages on the mailing list about it, so apparently nobody else thinks it's worth the trouble to try adding it.

To summarize: the reason why all modules are compiled to .pyc except __main__ is that it's a quirk of history. The design and implementation for how __main__ works was baked into the code before .pyc files even existed. If you want to know more than that, you'll need to e-mail Guido and ask.

Glenn Maynard's answer says:

Nobody seems to want to say this, but I'm pretty sure the answer is simply: there's no solid reason for this behavior.

I agree 100%. There's circumstantial evidence to support this theory and nobody else in this thread has provided a single shred of evidence to support any other theory. I upvoted Glenn's answer.

Kabie · Answer 4 · 2011-03-16T21:51:32.633


Since:

A program doesn’t run any faster when it is read from a .pyc or .pyo file than when it is read from a .py file; the only thing that’s faster about .pyc or .pyo files is the speed with which they are loaded.

Therefore it is unnecessary to generate a .pyc file for the main script. Only the libraries that might be loaded many times should be compiled.
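The quoted documentation can be checked directly: the code object stored in a .pyc is the same bytecode that `compile()` produces from the source, so caching only skips the compile step. A sketch assuming Python 3.7+ (16-byte .pyc header); `bytecode_of` is a hypothetical helper that collects the raw bytecode of a code object and any nested code objects.

```python
import marshal
import pathlib
import py_compile
import tempfile
import types

SOURCE = "def total(n):\n    return sum(i * i for i in range(n))\n"


def bytecode_of(code: types.CodeType) -> list:
    """Collect raw bytecode for a code object and every code object nested inside it."""
    parts = [code.co_code]
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            parts.extend(bytecode_of(const))
    return parts


with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp) / "work.py"
    source.write_text(SOURCE)
    pyc_path = py_compile.compile(str(source), doraise=True)
    # Skip the 16-byte header and unmarshal the cached module code object.
    cached_code = marshal.loads(pathlib.Path(pyc_path).read_bytes()[16:])
    fresh_code = compile(SOURCE, str(source), "exec")

same_bytecode = bytecode_of(cached_code) == bytecode_of(fresh_code)
```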

Edited:

It seems you didn't get my point. The whole idea of compiling into a .pyc file is to make the same file execute faster the second time. Now consider what would happen if Python did compile the script being run: the interpreter would write bytecode into a .pyc file on the first run, which takes time, so that run would be a bit slower. You might argue that later runs would be faster. Well, that is just a choice. Plus, as this says:

Explicit is better than implicit.

If one wants a speedup by using .pyc file, one should compile it manually and run the .pyc file explicitly.

edited Mar 16 '11 at 21:51

answered Mar 11 '11 at 01:44

Kabie


That doesn't answer the question. – Matt Joiner Mar 11 '11 at 17:07
This seems the most plausible to me, I'd be curious to hear more about it though. – chmullig Mar 14 '11 at 03:57

No module is loaded multiple times during a single execution; they're loaded once and stored in sys.modules. .pyc files are used to speed up the initial import--which happens at most once per execution, both for the main script and for other modules. – Glenn Maynard Mar 16 '11 at 05:36
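The once-per-process behaviour is easy to confirm; `json` is just a convenient standard-library module for the sketch.

```python
import importlib
import sys

first = importlib.import_module("json")
second = importlib.import_module("json")  # served from sys.modules, not re-executed
```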

score 4 · Answer 5 · answered Mar 14 '11 at 05:11

To answer your question, see section 6.1.3, “Compiled” Python files, in the official Python documentation:

When a script is run by giving its name on the command line, the bytecode for the script is never written to a .pyc or .pyo file. Thus, the startup time of a script may be reduced by moving most of its code to a module and having a small bootstrap script that imports that module. It is also possible to name a .pyc or .pyo file directly on the command line.

This only gives a workaround, not the reason for this behavior. — Glenn Maynard, Mar 16 '11 at 05:35

score 1 · Answer 6 · answered Mar 11 '11 at 01:28

Because the script being run may be somewhere where it is inappropriate to generate .pyc files, such as /usr/bin.

Ignacio Vazquez-Abrams

But surely it wouldn't absolutely _need_ to store it in the same directory as the original script? – user470379 Mar 11 '11 at 01:58
@user470379: `>>> import this` ... `Special cases aren't special enough to break the rules.` – Ignacio Vazquez-Abrams Mar 11 '11 at 02:01
This doesn't answer the question. All the dependencies might be on unwritable mounts too. The module page explicitly states that it is not an error to be unable to write the .pyo file. Furthermore, treating main differently is special-casing it based on its potential location. – Matt Joiner Mar 11 '11 at 17:10
@Matt is correct: writing .pyc files is always optional; you can always import a module even if you can't write a .pyc. – Glenn Maynard Mar 16 '11 at 05:20
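A sketch of the "writing .pyc files is always optional" point: import a module from a directory its owner cannot write to, and the import still succeeds because the importer silently skips the cache write. Assumptions: POSIX permissions (on Windows, or when running as root, the directory stays writable, so the import succeeds and a cache file may be written anyway); the module name `readonly_demo` is hypothetical.

```python
import importlib
import pathlib
import stat
import sys
import tempfile


def import_from_readonly_dir(name: str = "readonly_demo") -> int:
    """Import a throwaway module from a directory we cannot write to and return one of its values."""
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        (root / f"{name}.py").write_text("ANSWER = 42\n")
        original_mode = stat.S_IMODE(root.stat().st_mode)
        root.chmod(stat.S_IRUSR | stat.S_IXUSR)  # owner may list and enter, but not create files
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(name)
            return module.ANSWER
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(name, None)
            root.chmod(original_mode)  # restore so the temporary directory can be deleted


print(import_from_readonly_dir())
```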

score 0 · Answer 7 · answered Aug 04 '21 at 11:13

Because different versions of Python (3.6, 3.7 ...) have different bytecode representations, and trying to design a compile system for that was deemed too complicated. PEP 3147 discusses the rationale.
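The per-version file naming from PEP 3147 can be inspected from Python. A sketch; `/srv/app/module.py` is a made-up path and does not need to exist, since `cache_from_source` only computes a name.

```python
import importlib.util
import pathlib
import sys

# PEP 3147: each interpreter writes its own tagged cache file, so versions can coexist.
tag = sys.implementation.cache_tag    # a string such as "cpython-312" on CPython
magic = importlib.util.MAGIC_NUMBER   # 4-byte marker that changes with the bytecode format
cache_path = pathlib.Path(importlib.util.cache_from_source("/srv/app/module.py"))
print(tag, magic, cache_path.name)
```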

gerardw
