How does the python interpreter know when to compile and update a .pyc file?

Question

I know that a .pyc file is generated by the Python interpreter and contains the bytecode, as this question says.

I thought the Python interpreter used the file timestamps to detect whether a .pyc is newer than its .py, and if so, skipped recompiling on execution (the way make does).

So, I did a test, but it seemed I was wrong.

I wrote t.py containing print '123' and t1.py containing import t. Running python t1.py printed 123 and generated t.pyc, all as expected.
Then I edited t.py to print '1234' and updated the timestamp of t.pyc using touch t.pyc.
Running python t1.py again, I expected 123 but got 1234. So the Python interpreter evidently still knew that t.py had been updated.

Then I wondered whether the Python interpreter compiles and regenerates t.pyc on every run of python t1.py. But when I ran python t1.py several times, I found that t.pyc is not updated as long as t.py is unchanged.

So, my question is: how does the Python interpreter know when to compile and update a .pyc file?

Updated

Apparently the Python interpreter uses a timestamp stored inside the .pyc file. I assumed it records when the .pyc was last updated, and that on import it is compared with the timestamp of the .py file.

So I tried to trick it this way: I set the OS clock to an earlier time and edited the .py file. I thought that on the next import the .py would look older than the .pyc, so the Python interpreter would not update the .pyc. But I was wrong again.

So, does the Python interpreter compare these two timestamps not for older or newer, but for exact equality?

By exact equality, I mean that the timestamp in the .pyc records when the .py was last modified. On import, it is compared with the current timestamp of the .py; if they are not the same, the .pyc is recompiled and updated.

@JonathonReinhart: I'd say actual duplicate, especially since that question contains the answer: The timestamp consulted is not the file system's timestamp, but an internal timestamp stored in the `.pyc` file. — Tim Pietzcker, May 21 '14 at 06:46
@TimPietzcker Agreed, that was just the auto-generated comment. If only I had one of *them there fancy gold badges* I would have marked it duplicate immediately :-) — Jonathon Reinhart, May 21 '14 at 06:48
Still, +1 to the question, because of the creative effort that went into it - I had never thought about what happens if you modify the file's timestamp. — Tim Pietzcker, May 21 '14 at 06:48
@JonathonReinhart Did a bit more tests and updated the question. I think it's not a duplicated question any more. — WKPlus, May 21 '14 at 07:04
@TimPietzcker Did a bit more tests and updated the question.I think it's not a duplicated question any more. — WKPlus, May 21 '14 at 07:06
I think at this point, you'd be better off digging into the [Python source code](https://www.python.org/downloads/source/). — Jonathon Reinhart, May 21 '14 at 07:08
Reopening...good research. What time did you set your OS time to? Was it before the creation time of the `.pyc` file when you ran the import? — Tim Pietzcker, May 21 '14 at 07:09
In that case, I'd guess that if Python sees that a `.pyc` file has "come from the future", it sees that something is wrong, takes no chances and recompiles. But I haven't read the source - just a guess. — Tim Pietzcker, May 21 '14 at 07:16
@TimPietzcker It seemed not, I changed the OS time back to normal before executing it again. — WKPlus, May 21 '14 at 07:18

score 8 · Accepted Answer · answered May 21 '14 at 08:59

It looks like the timestamp is stored directly in the *.pyc file. The Python interpreter doesn't rely on the last-modification attribute of the .pyc file itself, maybe to avoid incompatible bytecode issues when copying source trees.

Looking at the Python implementation of the import statement, you can find the stale check in _validate_bytecode_header(). By the looks of it, it extracts bytes 4 to 7 (inclusive) and compares them against the modification time of the source file. If those don't match, the bytecode is considered stale and is recompiled.

In the process, it also checks the length of the source file against the length of the source used to generate a given bytecode (stored in bytes 8 to 11).
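As a rough illustration of the header layout described above, here is a minimal sketch that splits a .pyc header into its fields. The function name `parse_pyc_header` and its `has_flags_field` parameter are hypothetical, not part of importlib. It assumes the layout from the answer (magic, mtime, size) for Python 3.3 to 3.6, and the layout with an extra 4-byte flags field that PEP 552 introduced in Python 3.7, which shifts the mtime to bytes 8 to 11 and the size to bytes 12 to 15.

```python
import struct


def parse_pyc_header(data, has_flags_field=True):
    """Split the start of a .pyc file into its header fields.

    Hypothetical helper for illustration only.

    has_flags_field=False: layout described in the answer (Python 3.3 to 3.6)
        magic (4 bytes) | source mtime (4) | source size (4)
    has_flags_field=True: layout used since Python 3.7 (PEP 552)
        magic (4 bytes) | flags (4) | source mtime (4) | source size (4)
    """
    magic = data[:4]
    offset = 4
    flags = 0
    if has_flags_field:
        (flags,) = struct.unpack_from("<I", data, offset)
        offset += 4
    # All integer fields are little-endian unsigned 32-bit values.
    # Note: when flags is non-zero (hash-based .pyc), these 8 bytes hold
    # a source hash instead of mtime and size.
    mtime, size = struct.unpack_from("<II", data, offset)
    return {"magic": magic, "flags": flags, "mtime": mtime, "size": size}
```

To inspect a real file, read the first 16 bytes of the cached file (its path can be obtained with `importlib.util.cache_from_source("t.py")` on Python 3) and pass them to the function.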

In the Python implementation, if one of those checks fails, the bytecode loader raises an ImportError, which is caught by SourceLoader.get_code() and triggers a recompilation of the bytecode.

Note: That's how it's done in the Python version of importlib. I guess there's no functional difference in the native version, but my C is a bit too rusty to dig into the compiler code.
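The two checks can be sketched as a plain function. This is a simplified stand-in, not the importlib code itself: the name `bytecode_is_stale` is hypothetical, and the real loader signals staleness by raising an exception rather than returning a boolean. The point it illustrates is that both comparisons are equality tests, not "newer than" tests.

```python
def bytecode_is_stale(stored_mtime, stored_size, source_mtime, source_size):
    """Return True when cached bytecode should be discarded and recompiled.

    Hypothetical helper: stored_* come from the .pyc header, source_* from
    a stat() of the .py file. The header fields are unsigned 32-bit values,
    so the source values are truncated the same way before comparing.
    """
    if stored_mtime != (int(source_mtime) & 0xFFFFFFFF):
        return True  # source was modified (to ANY different time)
    if stored_size != (source_size & 0xFFFFFFFF):
        return True  # source length changed
    return False
```

Because the test is `!=`, a source file whose modification time is earlier than the recorded one is still treated as changed, which matches the result of the clock experiment in the question.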
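The clock experiment from the question can be approximated without touching the system clock. The sketch below assumes Python 3.7 or later (16-byte header, `py_compile.PycInvalidationMode`), and the helper names `read_stored_mtime` and `run_experiment` are made up for this illustration. It compiles a module, then rewrites the source and gives it an older modification time with `os.utime`, and returns both timestamps so they can be compared.

```python
import os
import py_compile
import struct
import tempfile


def read_stored_mtime(pyc_path):
    """Read the source mtime recorded in a timestamp-based .pyc (3.7+ layout)."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    # Assumed layout: magic (4) | flags (4) | source mtime (4) | source size (4)
    (mtime,) = struct.unpack_from("<I", header, 8)
    return mtime


def run_experiment():
    """Return (mtime stored in the .pyc, current mtime of the edited source)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "t.py")
        with open(src, "w") as f:
            f.write("print('123')\n")
        # Pin the source mtime so the result does not depend on the clock.
        os.utime(src, (1_000_000_000, 1_000_000_000))
        pyc = py_compile.compile(
            src,
            doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )
        stored = read_stored_mtime(pyc)

        # Edit the source and make it look OLDER than when it was compiled.
        with open(src, "w") as f:
            f.write("print('1234')\n")
        os.utime(src, (900_000_000, 900_000_000))
        current = int(os.stat(src).st_mtime)
        return stored, current
```

The stored value is the modification time the source had at compile time, not the time the .pyc was written. After the edit the two values differ, so an equality-based check treats the bytecode as stale even though the source now looks older.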

Thanks for the code link, it do help. `_r_long(raw_timestamp) != source_mtime` shows the timestamp comparison is in a equal way as I guessed:) — WKPlus, May 21 '14 at 11:01
Could have been more precise than “if those doesn't match”, that's for sure ;-) — svvac, May 21 '14 at 11:03

score 0 · Answer 2 · answered May 21 '14 at 07:13

As you think, it is effectively based on the timestamp of the last .py update. If the .py has been updated after the generation of the .pyc, the bytecode will be regenerated. This is the same behaviour as make (recompile only what has changed).

The .pyc is updated only when you import the module, so I believe your test did not work because you executed the code rather than importing it.

Maxime Lorant

But how to explain my final test's result? – WKPlus May 21 '14 at 07:15
I was importing it not executing it :) – WKPlus May 21 '14 at 07:17
He does know the difference between running and importing, and his test setup shows that. – Tim Pietzcker May 21 '14 at 07:17
Indeed, this needs digging into the source code, as someone suggested in the comments... I don't see anything relevant on Google or in my mind right now – Maxime Lorant May 21 '14 at 07:20
