1

The reason I want to do this is that I want to use the tool pyobfuscate to obfuscate my Python code, but pyobfuscate can only obfuscate one file.

user299648
  • The big question here is: Why? `pyobfuscate` isn't a very good obfuscator. And obfuscation isn't a very useful thing to do with Python. You can already just ship the `.pyc` bytecode files. To read them usefully, someone has to decompile them into source, and that source looks very much like the result of running an obfuscator on the original source. – abarnert Aug 13 '13 at 02:18

4 Answers

1

There are definitely ways to turn a tree of modules into a single module. But it's not going to be trivial. The simplest thing I can think of is this:

First, you need a list of modules. This is easy to gather with the find command or a simple Python script that does an os.walk.
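As a rough sketch of that first step, the snippet below walks a directory with os.walk and turns each .py path into a dotted module name. The function name find_modules is made up for illustration, and it assumes every subdirectory containing .py files is meant to be a package.

```python
import os

def find_modules(root):
    """Return sorted dotted module names for every .py file under root.

    find_modules is a hypothetical helper, not part of any tool named here.
    """
    modules = []
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".py"):
                continue
            rel = os.path.relpath(os.path.join(dirpath, filename), root)
            parts = rel[:-3].split(os.sep)      # drop ".py", split on separators
            if parts[-1] == "__init__":         # pkg/__init__.py is module "pkg"
                parts = parts[:-1]
            if parts:                           # skip an __init__.py at the root
                modules.append(".".join(parts))
    return sorted(modules)
```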

Then you need to use grep or Python re to get all of the import statements in each file, and use that to topologically sort the modules. If you only do absolute flat import foo statements at the top level, this is a trivial regex. If you also do absolute package imports, or from foo import bar (or from foo import *), or import at other levels, it's not much trickier. Relative package imports are a bit harder, but not that big of a deal. Of course if you do any dynamic importing, use the imp module, install import hooks, etc., you're out of luck here, but hopefully you don't.
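One possible sketch of the sorting step follows. It swaps the regex for the standard ast module, which also catches imports below the top level, and uses graphlib.TopologicalSorter (Python 3.9+). It only handles absolute imports; relative and dynamic imports are ignored. The names imported_names and dependency_order are made up for illustration.

```python
import ast
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def imported_names(source):
    """Collect absolutely imported module names from one module's source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.add(node.module)              # level 0 means an absolute import
    return names

def dependency_order(sources):
    """sources maps module name -> source text; dependencies come first."""
    # Keep only edges that point at modules inside the project itself.
    graph = {name: imported_names(src) & set(sources)
             for name, src in sources.items()}
    return list(TopologicalSorter(graph).static_order())
```

If two modules import each other, static_order raises graphlib.CycleError, so circular imports would need to be handled separately.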

Next you need to replace the actual import statements. With the same assumptions as above, this can be done with a simple sed or re.sub, something like replacing `import\s+(\w+)` with `\1 = sys.modules['\1']`.
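Here is a minimal sketch of that substitution with re.sub, under the same flat `import foo` assumption. It adds one restriction not in the text above: only names in a caller-supplied set of project modules are rewritten, so standard-library imports are left alone. rewrite_imports and project_modules are hypothetical names.

```python
import re

# Matches a whole line of the form "import foo" at the top level only.
IMPORT_RE = re.compile(r"^import[ \t]+(\w+)[ \t]*$", re.MULTILINE)

def rewrite_imports(source, project_modules):
    """Turn 'import foo' into a sys.modules lookup for project modules."""
    def replace(match):
        name = match.group(1)
        if name in project_modules:
            return "{0} = sys.modules['{0}']".format(name)
        return match.group(0)                   # leave other imports untouched
    return IMPORT_RE.sub(replace, source)
```

The rewritten source needs `sys` to be in scope wherever it ends up being executed.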

Now, for the hard part: you need to transform each module into something that creates an equivalent module object dynamically. I think what you want to do is escape the entire module source so that it can be put into a triple-quoted string, then do this:

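import sys
import types

module_name = 'foo'  # placeholder: the name of the module being embedded
mod_globals = {}
# Run the module body, collecting its top-level names in mod_globals.
exec('''
# escaped version of original module source goes here
''', mod_globals)
# Build a module object with those names and register it for later lookups.
mod = types.ModuleType(module_name)
mod.__dict__.update(mod_globals)
sys.modules[module_name] = mod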

Now just concatenate all of those transformed modules together. The result will be almost equivalent to your original code, except that it's doing the equivalent of import foo; del foo for all of your modules (in dependency order) right at the start, so the startup time could be a little slower.
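To make the transform-and-concatenate step concrete, here is one way it could be generated automatically. This is a sketch with two deliberate substitutions: repr() does the escaping instead of a hand-escaped triple-quoted string, and the source is exec'ed directly in the new module's `__dict__` rather than in a separate dict. wrap_module and bundle are hypothetical names.

```python
import sys  # the generated stubs also import sys themselves

STUB = '''\
import sys, types
_mod = types.ModuleType({name!r})
sys.modules[{name!r}] = _mod
exec({source!r}, _mod.__dict__)
'''

def wrap_module(name, source):
    """Return code that recreates module `name` from its source at run time."""
    return STUB.format(name=name, source=source)

def bundle(ordered_sources):
    """ordered_sources: (name, source) pairs, already in dependency order."""
    return "\n".join(wrap_module(name, src) for name, src in ordered_sources)
```

The string returned by bundle can be written out as a single .py file. Each module's source is assumed to have had its project imports rewritten to sys.modules lookups beforehand.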

abarnert
1

I've answered your direct question separately, but let me offer a different solution to what I suspect you're actually trying to do:

Instead of shipping obfuscated source, just ship bytecode files. These are the .pyc files that get created, cached, and used automatically, but you can also create them manually by just using the compileall module in the standard library.
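A minimal sketch of the manual compile step is shown below, assuming Python 3. The legacy=True flag is there because Python 3 only imports a source-less .pyc that sits next to where the .py file used to be, not one inside `__pycache__`. compile_tree is a hypothetical helper name.

```python
import compileall

def compile_tree(root):
    """Byte-compile every .py file under root; returns a true value on success."""
    # legacy=True writes foo.pyc beside foo.py instead of into __pycache__,
    # which is the layout Python 3 needs for imports without the source file.
    return compileall.compile_dir(root, legacy=True, quiet=1)
```

After this runs, the .py files can be left out of what you ship. The .pyc files are tied to the Python version that produced them.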

A .pyc file with its .py file missing can be imported just fine. It's not human-readable as-is. It can of course be decompiled into Python source, but the result is… basically the same result you get from running an obfuscator on the original source. So, it's slightly better than what you're trying to do, and a whole lot easier.

You can't compile your top-level script this way, but that's easy to work around. Just write a one-liner wrapper script that does nothing but import the real top-level script. (If you have if __name__ == '__main__': code in there, you'll also need to move that to a function, and the wrapper becomes a two-liner that imports the module and calls the function… but that's as hard as it gets.) Alternatively, you could run pyobfuscate on just the top-level script, but really, there's no reason to do that.
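The wrapper could look roughly like this. The module name realmain and the function name main are placeholders for whatever your real top-level script and entry function are called, and run_entry_point is a made-up helper.

```python
import importlib

def run_entry_point(module_name, func_name="main"):
    """Import the compiled top-level module and call its entry function."""
    module = importlib.import_module(module_name)  # finds the .pyc if no .py exists
    return getattr(module, func_name)()

# In the shipped wrapper script, the only other line would be something like:
#     run_entry_point("realmain")   # "realmain" is a placeholder module name
```

Written out by hand, the same thing is just `import realmain` followed by `realmain.main()`.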

In fact, many of the packager tools can optionally do all of this work for you automatically, except for writing the trivial top-level wrapper. For example, a default py2app build will stick compiled versions of your own modules, along with stdlib and site-packages modules you depend on, into a pythonXY.zip file in the app bundle, and set up the embedded interpreter to use that zipfile as its stdlib.

abarnert
  • By using `decompyle2`, we can convert a `.pyc` back to the original source code, including original variable names, comments, docstrings @abarnert ! So keeping a `.pyc` is not an obfuscation method at all ! – Basj Jan 06 '14 at 21:27
  • @Basj: That's not true; comments aren't even stored in the `.pyc` file, so no decompiler could possibly restore them. I can't actually test `decompyle2` since it only works up to Python 2.2, but it's trivial to verify that none of the more modern ones do the impossible. More importantly, as I already explained, obfuscation is generally not a useful thing to do to Python code in the first place. If you're just trying to stop casual browsing of your source, shipping .pyc files is sufficient; if you're trying to slow down real reverse engineering, `pyobfuscate` is nowhere near sufficient. – abarnert Jan 06 '14 at 21:55
  • `decompyle2` worked for me with Python 2.7 @abarnert. And I recovered the original source code (including docstrings, original variable names, etc.) from a `.pyc`. `pyobfuscate` IS much better: the source code is much more difficult to read, because variable names are changed, etc. – Basj Jan 07 '14 at 13:00
  • 1
    @Basj: You really recovered the comments? I don't believe you. Give me some sample code to prove it. – abarnert Jan 07 '14 at 19:46
  • I don't understand how to compile all but the top-level script. I still get import errors? How do I import a .pyo file? – tarabyte Nov 05 '15 at 20:18
  • @abarnert Did you ever get a sample from @Basj? Do you have any tips or recommendations on what are the best ways to compile a project (with many files in many folders) to `.pyc`? – Myzel394 Mar 19 '22 at 13:27
1

You can make a tool that:

  • Reads through your source files and puts all identifiers in a set.
  • Subtracts from that set all identifiers found in recursively searched standard-library and third-party modules (modules, classes, functions, attributes, parameters).
  • Subtracts some explicitly excluded identifiers from that set as well, as they may be used in getattr/setattr/exec/eval.
  • Replaces the remaining identifiers by gibberish
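The steps above can be sketched with the standard tokenize module. This is a simplified illustration, not how any particular tool does it: it only subtracts built-in names and an explicit exclusion list (the recursive scan of standard-library and third-party modules is left out), and all function names are made up. Building one mapping for all files means a name gets the same replacement everywhere.

```python
import builtins
import io
import keyword
import tokenize

def collect_identifiers(source):
    """Return the set of non-keyword NAME tokens in one source string."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return {tok.string for tok in tokens
            if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string)}

def build_rename_map(sources, excluded=()):
    """One shared mapping for all files, so names stay consistent across them."""
    names = set()
    for source in sources:
        names |= collect_identifiers(source)
    names -= set(dir(builtins))                 # never rename built-ins
    names -= set(excluded)                      # names used via getattr, eval, ...
    return {name: "_o%d" % i for i, name in enumerate(sorted(names))}

def rename(source, mapping):
    """Rewrite identifiers; spacing in the result may differ from the input."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        text = mapping.get(tok.string, tok.string) if tok.type == tokenize.NAME else tok.string
        out.append((tok.type, text))
    return tokenize.untokenize(out)
```

Because the standard-library scan is omitted, attribute and module names such as `os.path` would need to go in `excluded` by hand in this sketch.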

Or you can use this tool I wrote that does exactly that.

To obfuscate multiple files, use it as follows:

  • For safety, back up your source code and valuable data to an off-line medium.
  • Put a copy of opy_config.txt in the top directory of your project.
  • Adapt it to your needs according to the remarks in opy_config.txt.
  • This file only contains plain Python and is exec’ed, so you can do anything clever in it.
  • Open a command window, go to the top directory of your project and run opy.py from there.
  • If the top directory of your project is e.g. ../work/project1 then the obfuscation result will be in ../work/project1_opy.
  • Further adapt opy_config.txt until you’re satisfied with the result.
  • Type ‘opy ?’ or ‘python opy.py ?’ (without the quotes) on the command line to display a help text.
Jacques de Hooge
  • If you wrote that tool, you should disclose your affiliation. – George Stocker Jun 17 '15 at 13:27
  • Hi George, that's just what I did in my previous answer. I got this reaction: "While this answer does answer the question, please reformulate it so it sounds less like an advertisement (avoiding "my", "free" etc.)". It never seems to be right, does it! – Jacques de Hooge Jun 17 '15 at 14:50
  • Oh, and by the way, this was my reaction: "Edited accordingly. But isn't it a bit dishonest to hide my involvement in the matter? – jacdeh". – Jacques de Hooge Jun 17 '15 at 14:51
  • @jacdeh I suggest you carefully read [George's answer](http://meta.stackoverflow.com/a/297151/1906307). Pay close attention to the 3 points that George took pains to highlight. You've not covered the last two points *at all* in your answer. – Louis Jun 17 '15 at 15:36
0

I think you can try using the find command with the -exec option.

You can execute all Python scripts in a directory with the following command:

find . -name "*.py" -exec python {} ';'

Hope this helps.

EDIT:

Oh, sorry, I overlooked that if you obfuscate files separately they may not run properly, because the same function gets renamed to a different name in each file.

shengy