-2

It is possible to decompile .pyc files; see "Decompile Python 2.7 .pyc".

Is it possible to `compile` Python files into human-unreadable code, the way C++ is compiled into an .exe binary, rather than shipping plaintext .py files or very easily recoverable .pyc files? (I don't mind if it can be cracked by brute force.)

Qwerty
  • 4
    The word you are looking for is "obfuscate". I don't know anything about successful Python code obfuscation, but you can try encrypting the Python source code and decrypting it on the fly during execution. – Rafał Rawicki Feb 26 '13 at 11:00
  • But then you need to have the encryption key stored somewhere, so the *attacker* can use it to decrypt the code. Possibly even simpler would be for him to use a debugger to retrieve the decrypted version from RAM. I think distributing the byte-code would be more secure than a flawed encryption system... – Michael Wild Feb 26 '13 at 11:05
  • I don't understand why my question was downvoted. – Qwerty Feb 26 '13 at 13:43
  • I just don't want my clients to open, read/change and CTRL+S the code. It is possible to ship only .pyc files, but they are very easily decompiled to the .py equivalent, even with comments. – Qwerty Feb 26 '13 at 13:56

2 Answers

8

Python is a highly dynamic language, and supports many different levels of introspection. Because of that, obfuscating Python bytecode is a mountainous task.

Moreover, your embedded Python interpreter will still need to be able to execute the bytecode you ship with your product. And if the interpreter needs to be able to access the bytecode, then everyone else can too. Encryption won't help, because you still need to decrypt the bytecode yourself, and then everyone else can read the bytecode from memory. Obfuscation only makes the default tools harder to use, not impossible.
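
To illustrate how accessible shipped bytecode is, here is a minimal sketch that compiles a small module to a .pyc, loads the code object back with `marshal`, and disassembles it. The module source and file names are made up for the example, and the 16-byte header size is an assumption that holds for CPython 3.7+ (older versions use a shorter header).

```python
import dis
import marshal
import os
import py_compile
import tempfile

# Hypothetical module source, used only for this illustration.
SOURCE = "def secret(x):\n    return x * 42 + 1\n"

def load_code_from_pyc(pyc_path):
    """Return the top-level code object stored in a .pyc file."""
    with open(pyc_path, "rb") as f:
        f.read(16)  # skip the header (assumed 16 bytes, CPython 3.7+)
        return marshal.loads(f.read())

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "secret_mod.py")
    pyc_path = os.path.join(tmp, "secret_mod.pyc")
    with open(src_path, "w") as f:
        f.write(SOURCE)
    py_compile.compile(src_path, cfile=pyc_path, doraise=True)
    module_code = load_code_from_pyc(pyc_path)

# The function body is a nested code object among the module's constants.
func_code = next(c for c in module_code.co_consts if hasattr(c, "co_code"))
dis.dis(func_code)  # prints a readable instruction listing
```

Nothing here needs special tooling; it is all standard library, which is the point being made above.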

With that said, here is what you'd have to do to make it really bloody hard to read your application's Python bytecode:

  • Reassign every Python opcode to a new value. Rewire the whole interpreter to use different byte values for different opcodes.

  • Remove as many introspection features as you can get away with. Your functions still need to have closures, and code objects still need constants, but to hell with the locals list in the code object, for example. Neuter the sys._getframe() function, slash traceback information.

Both these steps require in-depth knowledge of how the Python interpreter works, and how the Python object model fits together. You will most certainly introduce bugs that will be hard to solve.

In the end, you have to ask yourself if this is worth it. A determined hacker can still analyze your bytecode, do some frequency analysis to reconstruct the opcode table, and/or feed your program different opcodes to see what happens, and decipher all the obfuscation. Once a translation table is created, decompiling your bytecode is a snap, and reconstructing your code is not far away.
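
A rough sketch of why a remapped opcode table is only a substitution cipher: the code below applies a made-up random permutation to the opcode bytes of a function and shows that the frequency profile is unchanged. The `sample` function, the seed and the `remap` helper are all hypothetical, and treating every even-indexed byte as an opcode slot assumes CPython 3.6+ "wordcode"; a real modified interpreter would of course do this in C.

```python
import random
from collections import Counter

def sample(a, b):
    total = 0
    for i in range(a):
        total += i * b
    return total

# Raw bytecode; assumed layout: even-indexed bytes are opcode slots.
code = sample.__code__.co_code

# Hypothetical "secret" opcode table: a random permutation of all byte values.
rng = random.Random(2013)
table = list(range(256))
rng.shuffle(table)
inverse = [0] * 256
for original, remapped in enumerate(table):
    inverse[remapped] = original

def remap(bytecode, mapping):
    """Substitute the opcode bytes (even positions), leave argument bytes alone."""
    out = bytearray(bytecode)
    for i in range(0, len(out), 2):
        out[i] = mapping[out[i]]
    return bytes(out)

scrambled = remap(code, table)

# The substitution changes which byte stands for which opcode,
# not how often each opcode occurs.
freq_original = sorted(Counter(code[0::2]).values())
freq_scrambled = sorted(Counter(scrambled[0::2]).values())
print(freq_original == freq_scrambled)

# Anyone who reconstructs the table can undo the remapping.
print(remap(scrambled, inverse) == code)
```

With a large enough body of scrambled bytecode, matching those frequencies against ordinary CPython bytecode is how the table would be recovered.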

If all you want to do is prevent bytecode files from being altered, embed checksums for your .pyc files, and check those on startup. Refuse to load if they don't match. Someone will patch your binary to remove the checksum check or replace the checksums, but you won't have to put in nearly as much effort to provide at least some token protection from tampering.
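
A minimal sketch of such a startup check, assuming SHA-256 digests recorded at build time. The helper names (`sha256_of`, `verify_files`) and the file name are invented for the example, and the temporary file just stands in for a real .pyc.

```python
import hashlib
import os
import tempfile

def sha256_of(path):
    """Return the hex SHA-256 digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_files(expected):
    """Return the paths whose current digest differs from the recorded one."""
    return [path for path, digest in expected.items() if sha256_of(path) != digest]

# Demonstration with a stand-in file; a real application would embed the
# expected digests at build time and call verify_files() before importing.
with tempfile.TemporaryDirectory() as tmp:
    pyc_path = os.path.join(tmp, "app.pyc")  # hypothetical file name
    with open(pyc_path, "wb") as f:
        f.write(b"pretend this is bytecode")
    expected = {pyc_path: sha256_of(pyc_path)}
    untouched = verify_files(expected)  # nothing modified yet

    with open(pyc_path, "ab") as f:
        f.write(b"\x00")  # simulate tampering
    tampered = verify_files(expected)

if tampered:
    print("refusing to load, modified files:", len(tampered))
```

As noted above, this only raises the bar slightly: the check itself lives in code the client can patch.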

Martijn Pieters
  • 3
    You're the kind of guy that sneaks `#define`s into people's C/C++ code aren't you? :) – Jon Clements Feb 26 '13 at 11:30
  • Not sure whether I should upvote. Answer shows deep understanding of the topic at hand, but might give OP *too much* information about an altogether rather inadvisable course of action. – Junuxx Feb 26 '13 at 11:31
  • @JonClements: I actually have *seen* an application do this. Rewire the whole opcode table, that is. I was sorely tempted to re-construct the table, it's just a classic substitution cypher so frequency analysis should help, *and* you can feed the system your own constructed bytecode to crack it. – Martijn Pieters Feb 26 '13 at 11:32
  • 2
    @Junuxx: I didn't give any details on how to do it, did I? I wanted to illustrate to what lengths you'd have to go, and that those lengths would still be futile. – Martijn Pieters Feb 26 '13 at 11:33
  • @Martijn: True, with the possible result that he'll attempt it anyway and ask a bunch of questions about how to remove Python's introspection features etc :P – Junuxx Feb 26 '13 at 11:34
  • 4
    @Junuxx: And we'll remind the OP at every step it is pointless to do so. – Martijn Pieters Feb 26 '13 at 11:35
  • This is a rather overkill solution. I just don't want my clients to open, read/change and CTRL+S the code. It is possible to ship only .pyc files, but they are very easily decompiled to the .py equivalent, even with comments. – Qwerty Feb 26 '13 at 13:54
  • 2
    @Qwerty: You *cannot* avoid shipping with bytecode. The only options you have are to ship with the bytecode as is, or to obfuscate the bytecode. I merely tried to show you that that path is not going to be practical, nor foolproof. – Martijn Pieters Feb 26 '13 at 13:56
  • 1
    @Qwerty: The last option you have is to *not use python*. Your machine code (produced from your C++ source) can be decompiled, altered and saved too though, so your mileage may vary. – Martijn Pieters Feb 26 '13 at 13:58
  • So.. Is it possible to `compile` python files so there is a human-unreadable code, unlike the plaintext .py and .pyc files? (I don't mind if it can be cracked by brute force) – Qwerty Mar 06 '13 at 10:52
  • @Qwerty: .pyc files are not plaintext, they are not human readable. Disassembly is basically 'brute force' cracking. Otherwise, encrypt files using a key (which can be lifted from your binary again), or go the whole hog as I described in my answer. – Martijn Pieters Mar 06 '13 at 10:53
  • @MartijnPieters You're right. I probably don't even know what I want. I somewhat learned that .pyc files are **very easily** recovered back into .py files, even with comments. The only thing I wanted was some kind of _binary file_, full of `1,0` and stuff similarly to `c++ -> exe`. But I will stick to the .pyc after all. I heard a rumour about python and zip archives. Could it help.. maybe? – Qwerty Mar 06 '13 at 11:12
  • 1
    @Qwerty: You can import from zip files, yes. See http://docs.python.org/2/library/zipimport.html – Martijn Pieters Mar 06 '13 at 11:13
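
For reference, a small sketch of importing from a zip archive as mentioned in the last comment. The archive and module names are made up, and the archive is built on the fly only to keep the example self-contained. Note that this is a packaging convenience, not protection: the archive can be unpacked with any zip tool.

```python
import os
import sys
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "bundle.zip")  # hypothetical archive name
    with zipfile.ZipFile(archive, "w") as zf:
        # Hypothetical module stored inside the archive.
        zf.writestr("hello_zip.py", "def greet():\n    return 'hello from zip'\n")

    sys.path.insert(0, archive)  # a zip archive on sys.path is handled by zipimport
    try:
        import hello_zip
        message = hello_zip.greet()
    finally:
        sys.path.remove(archive)

print(message)
```
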
3

Every system that encrypts its code can be attacked, because the decryption key must be present somewhere.

As such, it is just a question of effort.
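
A toy sketch of the problem, using XOR as a stand-in for real encryption (it is not secure and is only here to keep the example short). The key, the `xor_bytes` and `load` helpers, and the sample source are all hypothetical. Whatever cipher is used, the loader has to carry the key, so anyone holding the shipped files can do exactly what the loader does.

```python
from itertools import cycle

KEY = b"not-so-secret"  # hypothetical key; it has to ship with the program

def xor_bytes(data, key):
    """Toy XOR 'cipher'; applying it twice with the same key restores the input."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

# What the vendor would ship: the encrypted blob...
source = b"def price(qty):\n    return qty * 3\n"
blob = xor_bytes(source, KEY)

# ...and the loader, which must decrypt before the interpreter can run anything.
def load(blob, key):
    namespace = {}
    exec(xor_bytes(blob, key).decode("utf-8"), namespace)
    return namespace

module = load(blob, KEY)
print(module["price"](2))

# An attacker with the same two pieces recovers the source just as easily.
recovered = xor_bytes(blob, KEY)
```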

glglgl
  • 2
    +1. You can make it hard for the attacker, but you can't make it impossible. – dmg Feb 26 '13 at 11:19
  • I just don't want my clients to open, read/change and CTRL+S the code. It is possible to ship only .pyc files, but they are very easily decompiled to the .py equivalent, even with comments. – Qwerty Feb 26 '13 at 13:55
  • Given _"As such, it is just a question of effort."_ can the effort be increased if you wrap the code in java and ship a .jar file? I'm curious? – sAguinaga Sep 23 '19 at 16:54
  • @sAguinaga Yes, the effort needed can be marginally increased by that, but I am not sure it is worth the added work. – glglgl Sep 23 '19 at 19:55