137

I am looking for how to hide my Python source code.

print "Hello World!" 

How can I encode this example so that it isn't human-readable? I've been told to use base64 but I'm not sure how.

evandrix
str1k3r
  • 12
    Please be more clear. Do you just want to encode the file so that you can decode it later (by the command line, say)? Or do you want to make a file that you can still run by `python myfile.py` but have myfile.py be encoded? – Andrew Jaffe Jul 27 '10 at 13:32
  • 11
    What are you trying to hide? There is no effective way to obfuscate python such that it can not be trivially converted back to human readable. If you have code that valuable, convert it to C, or keep it on a server. – Stan Graves Jul 27 '10 at 14:15
  • 6
    semiuseless: I suspect 30 lines of code isn't going to be hidden that well in C, either. – Ken Jul 27 '10 at 14:43
  • @Ken, true...but 30 lines of Python is probably going to expand by some factor when converted into C. ;) – Stan Graves Jul 27 '10 at 21:21
  • To emphasize what @StanGraves meant by "keep it on a server": Consider not distributing your code at all and offer it as a service to your customers instead, perhaps through a website or REST API. This way your code stays on a server that you control, while clients can still use it. Of course this is not doable for every piece of software. – jlh Mar 10 '18 at 12:55
  • I will recommend you to use [pyobfuscate.tk](http://pyobfuscate.tk/uploads/obfuscate.php). – Nouman Aug 09 '18 at 07:44
  • This is (as has been posted previously) almost completely useless, but if you really want to, you can use alternate encoding, like say [ROT13](https://stackoverflow.com/questions/101268/hidden-features-of-python/1024693#1024693). – Wayne Werner Jul 27 '10 at 15:58
  • 3
    The reality is that Python is not the right language to use if you want to obfuscate the code. [This posting](https://stackoverflow.com/questions/261638/how-do-i-protect-python-code) has some excellent discussions about this very point. – Isaac Aug 09 '13 at 16:46
  • https://github.com/chris-rands/emojify – Chris_Rands Oct 26 '19 at 21:05
  • Does this answer your question? [How do I protect Python code?](https://stackoverflow.com/questions/261638/how-do-i-protect-python-code) – mkrieger1 Jul 10 '20 at 10:20

22 Answers

111

This is only a limited, first-level obfuscation solution, but it is built-in: Python has a compiler to byte-code:

python -OO -m py_compile <your program.py>

produces a .pyo file that contains byte-code, and where docstrings are removed, etc. (On Python 3.5 and later there is no .pyo extension; the output is a .pyc file under __pycache__.) You can rename the .pyo file with a .py extension, and python <your program.py> runs like your program but does not contain your source code.

PS: the "limited" level of obfuscation that you get is such that one can recover the code (with some of the variable names, but without comments and docstrings). See the first comment, for how to do it. However, in some cases, this level of obfuscation might be deemed sufficient.

PPS: If your program imports modules obfuscated like this, then you need to rename them with a .pyc suffix instead (I'm not sure this won't break one day), or you can work with the .pyo and run them with python -O ….pyo (the imports should work). This will allow Python to find your modules (otherwise, Python looks for .py modules).
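
On Python 3 the same compilation can be driven from a script with the standard-library py_compile module. The following is only a minimal sketch: the function name compile_stripped and the demo file names are made up for illustration, and optimize=2 is the programmatic counterpart of -OO.

```python
import os
import py_compile
import tempfile

def compile_stripped(source_path, output_path):
    """Byte-compile source_path with docstrings and asserts removed (like -OO)."""
    # optimize=2 matches `python -OO`; doraise=True raises PyCompileError
    # on a syntax error instead of only printing a message.
    return py_compile.compile(source_path, cfile=output_path,
                              optimize=2, doraise=True)

if __name__ == "__main__":
    # Hypothetical demo file; replace with your own program.
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "hello.py")
    with open(src, "w") as f:
        f.write('"""Secret docstring."""\nprint("Hello World!")\n')
    out = compile_stripped(src, os.path.join(workdir, "hello.pyc"))
    print(out)  # path of the bytecode file; run it with: python hello.pyc
```

The resulting file is tied to the Python version that produced it, and a decompiler can still recover most of the source.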

Eric O. Lebigot
  • 46
    unfortunately https://github.com/Mysterie/uncompyle2 is able to recover full source code – sherpya Oct 26 '16 at 17:00
  • 4
    @sherpya Yeah, even (at least some) variable names are recovered (with Python 2.7)… Docstrings and comments are lost, though. This is far from a perfect obfuscation scheme, anyway. – Eric O. Lebigot Oct 28 '16 at 11:54
  • 10
    This is no obfuscation. This is just compiling to byte code – Trect Dec 29 '19 at 18:08
  • 7
    @Tessaracter: It's still obfuscation. Not necessarily very effective obfuscation, but without a decompiler, it's largely gibberish bytes to a human. When it comes to interpreted scripting languages, your options are inherently limited; if the code is to run, it must also ship with the information necessary to return it to usable source or byte code. Sure, you can replace non-API names with gibberish names, but it has to be a one-to-one mapping, so all the associations remain. – ShadowRanger Feb 06 '20 at 00:07
  • 1
    There's no guarantee for `.pyc` files to be run across different Python versions! https://stackoverflow.com/questions/2705304/how-to-import-pyc-file-from-different-version-of-python – Reza Nooralizadeh Jul 05 '22 at 19:14
  • output file goes to `__pycache__` dir – pera Oct 24 '22 at 09:37
  • I'm not sure if it would be much better, but you could compile it to native code instead of bytecode (with Cython). – Brōtsyorfuzthrāx Oct 25 '22 at 01:56
44

so that it isn't human-readable?

i mean all the file is encoded !! when you open it you can't understand anything .. ! that what i want

At most, you can compile your sources into bytecode and then distribute only bytecode. But even this is reversible. Bytecode can be decompiled into semi-readable sources.

Base64 is trivial to decode for anyone, so it cannot serve as actual protection and will 'hide' sources only from complete PC illiterates. Moreover, if you plan to actually run that code by any means, you would have to include the decoder right in the script (or in another script in your distribution, which the legitimate user would need to run), and that would immediately give away your encoding/encryption.

Obfuscation techniques usually involve comments/docs stripping, name mangling, trash code insertion, and so on, so even if you decompile the bytecode, the sources you get are not very readable. But they will be Python sources nevertheless and Python is not good at becoming unreadable mess.

If you absolutely need to protect some functionality, I'd suggest going with compiled languages, like C or C++, compiling and distributing .so/.dll, and then using Python bindings to protected code.

Community
Daniel Kluev
  • 67
    the part _Python is not good at becoming unreadable mess_ does not really hold true.. there seems to be an inflow of people who manage this task well. – Evgeny Jun 02 '18 at 22:08
  • 3
    Compiled C code is equally reverse engineer-able as .pyc code, even if it takes a little more know-how. Put it this way: if the computer can read it, then a person can read it too, with enough time. – gbtimmon Jan 31 '19 at 15:09
  • 2
    _Bytecode can be decompiled into semi-readable sources._ Uncompyle recovers even comments. Source code hiding by distributing bytecode is pointless with python. – Viktor Joras Jul 09 '19 at 11:44
41

You could embed your code in C/C++ and compile it, as described in "Embedding Python in Another Application" in the Python documentation.

embedded.c

#include <Python.h>

int
main(int argc, char *argv[])
{
  Py_SetProgramName(argv[0]);  /* optional but recommended */
  Py_Initialize();
  PyRun_SimpleString("print('Hello world !')");
  Py_Finalize();
  return 0;
}

In Ubuntu/Debian

$ sudo apt-get install python-dev

In Centos/Redhat/Fedora

$ sudo yum install python-devel

compile with

$ gcc -o embedded -fPIC -I/usr/include/python2.7 embedded.c -lpython2.7

run with

$ chmod u+x ./embedded
$ time ./embedded
Hello world !

real  0m0.014s
user  0m0.008s
sys 0m0.004s

initial script: hello_world.py:

print('Hello World !')

run the script

$ time python hello_world.py
Hello World !

real  0m0.014s
user  0m0.008s
sys 0m0.004s

however some strings of the python code may be found in the compiled file

$ grep "Hello" ./embedded
Binary file ./embedded matches

$ grep "Hello World" ./embedded
$

In case you want an extra bit of obfuscation you could use base64

...
PyRun_SimpleString("import base64\n"
                  "base64_code = 'your python code in base64'\n"
                  "code = base64.b64decode(base64_code)\n"
                  "exec(code)");
...

e.g:

create the base 64 string of your code

$ base64 hello_world.py
cHJpbnQoJ0hlbGxvIFdvcmxkICEnKQoK

embedded_base64.c

#include <Python.h>

int
main(int argc, char *argv[])
{
  Py_SetProgramName(argv[0]);  /* optional but recommended */
  Py_Initialize();
  PyRun_SimpleString("import base64\n"
                    "base64_code = 'cHJpbnQoJ0hlbGxvIFdvcmxkICEnKQoK'\n"
                    "code = base64.b64decode(base64_code)\n"
                    "exec(code)\n");
  Py_Finalize();
  return 0;
}

all commands

$ gcc -o embedded_base64 -fPIC -I/usr/include/python2.7 ./embedded_base64.c -lpython2.7
$ chmod u+x ./embedded_base64

$ time ./embedded_base64
Hello World !

real  0m0.014s
user  0m0.008s
sys 0m0.004s

$ grep "Hello" ./embedded_base64
$

update:

this project (pyarmor) might also help:

https://pypi.org/project/pyarmor/

user9869932
  • 5
    Compiling a raw string into C will still be visible if disassembled. Is it possible to encrypt your Python in a file and have C decrypt it securely? –  Mar 02 '19 at 17:03
  • 2
    Not sure about that... wouldn't that way still be possible to decipher the code? wouldn't the key still be implicit? - don't quote me on that – user9869932 Mar 02 '19 at 19:43
39

You can use the base64 module to encode strings to stop shoulder surfing, but it's not going to stop someone finding your code if they have access to your files.

You can then use the compile() function and the eval() function to execute your code once you've decoded it.

>>> import base64
>>> mycode = "print 'Hello World!'"
>>> secret = base64.b64encode(mycode)
>>> secret
'cHJpbnQgJ0hlbGxvIFdvcmxkISc='
>>> mydecode = base64.b64decode(secret)
>>> eval(compile(mydecode,'<string>','exec'))
Hello World!

So if you have 30 lines of code you'll probably want to encode it by doing something like this:

>>> f = open('myscript.py')
>>> encoded = base64.b64encode(f.read())

You'd then need to write a second script that does the compile() and eval() which would probably include the encoded script as a string literal encased in triple quotes. So it would look something like this:

import base64
myscript = """IyBUaGlzIGlzIGEgc2FtcGxlIFB5d
              GhvbiBzY3JpcHQKcHJpbnQgIkhlbG
              xvIiwKcHJpbnQgIldvcmxkISIK"""
eval(compile(base64.b64decode(myscript),'<string>','exec'))
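
The session above is Python 2. On Python 3, base64.b64encode only accepts bytes, so the text has to be encoded first. A minimal sketch of the same idea for Python 3 (the helper names encode_source and run_encoded are made up for illustration):

```python
import base64

def encode_source(source: str) -> str:
    """Return the Base64 text of a piece of Python source."""
    # b64encode works on bytes, so encode the text first.
    return base64.b64encode(source.encode("utf-8")).decode("ascii")

def run_encoded(encoded: str) -> None:
    """Decode Base64 text back to source and execute it."""
    source = base64.b64decode(encoded).decode("utf-8")
    exec(compile(source, "<string>", "exec"))

secret = encode_source("print('Hello World!')")
print(secret)
run_encoded(secret)  # prints: Hello World!
```

As with the Python 2 version, anyone holding the script can reverse it with a single b64decode call.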
Eric O. Lebigot
David Webb
  • 2
    i want my python file [all the code lines is encode ] i mean all the file is encoded !! when you open it you can't understand anything .. ! that what i want – str1k3r Jul 27 '10 at 13:56
  • 127
    Perhaps you should use Perl then - where code looks the same before and after RSA encryption. SCNR. – Tim Pietzcker Jul 27 '10 at 14:09
  • 5
    Keep in mind that anyone who has the script can decode the text back into human readable python using the same base64.b64decode function. – Stan Graves Jul 27 '10 at 14:14
  • Thanks a lot. I got this error `TypeError: a bytes-like object is required, not 'str'` at line 4 `secret = base64.b64encode(mycode)` – YasserKhalil Apr 15 '22 at 20:35
  • 1
    This solved the problem `secret = base64.b64encode(mycode.encode('utf-8'))` – YasserKhalil Apr 15 '22 at 20:45
36

Cython

It seems that the go-to answer for this is Cython. I'm really surprised no one else has mentioned this yet. Here's the home page: https://cython.org

In a nutshell, this transforms your python into C and compiles it, thus making it as well protected as any "normal" compiled distributable C program.

There are limitations though. I haven't explored them in depth myself, because as I started to read about them, I dropped the idea for my own purposes. But it might still work for yours. Essentially, you can't use Python to the fullest, with the dynamic awesomeness it offers. One major issue that jumped out at me, was that keyword parameters are not usable :( You must write function calls using positional parameters only. I didn't confirm this, but I doubt you can use conditional imports, or evals. I'm not sure how polymorphism is handled...

Anyway, if you aren't trying to obfuscate a huge code base after the fact, or ideally if you have the use of Cython in mind to begin with, this is a very notable option.

evandrix
BuvinJ
  • 11
    Please leave a comment if you're going to down vote something like this! This is a valid answer. If you aren't a fan of Cython perhaps you can say why? Note that I pointed out that it has drawbacks. It is also one of the more secure routes to go, however, and is even recommended by some standard libraries e.g. the PyInstaller docs for dealing with this concern! – BuvinJ Sep 07 '19 at 00:56
  • 3
    More information about what is present in the compiled binary: [Are executables produced with Cython really free of the source code?](https://stackoverflow.com/questions/62388701/are-executables-produced-with-cython-really-free-of-the-source-code). – Basj Jun 15 '20 at 13:40
  • 1
    Good stuff, @Basj! Thanks – BuvinJ Jun 15 '20 at 13:56
  • 3
    Most of the limitations you mention aren't really correct - there's extra syntax that adds statically typing (mainly for speed) but most normal Python programs should run unaltered, including keyword parameters, conditional imports, `eval`, polymorphism. There's definitely bits of incompatibility (mostly to do with introspection) but all of the stuff you mentioned should work (and should have worked in 2018 too) – DavidW Dec 26 '20 at 22:12
  • 1
    Oh yeah? Cool! I will have to dig deeper into this. Can you recommend any notable sources for documentation, examples, etc? – BuvinJ Dec 27 '20 at 22:12
  • 1
    I'd like to step through some of these one-by-one. First the keyword arguments. The official documentation that I see on this is https://cython.readthedocs.io/en/latest/src/userguide/language_basics.html#optional-arguments. It appears that to use kw args a function definition must have "A single "*" without argument name can be used to terminate the list of positional arguments". If so, that means it's possible to use them, but only after making code revisions, right? – BuvinJ Dec 28 '20 at 15:21
  • 1
    (Belatedly, since I didn't see your initial reply) That bit of documentation only refers to `cdef/cpdef` functions - i.e. functions designed to be called quickly from C. Those are more limited, but are also a Cython-specific addition. A normal function defined with `def` should work substantially the same in Python and Cython. – DavidW Feb 16 '22 at 11:31
  • Aha! Well that clears up a lot! Thanks for getting back to me @DavidW, even all this time later! – BuvinJ Feb 16 '22 at 16:06
19

Check out these tools for obfuscation and minification of python code:

  • pyarmor, https://pypi.org/project/pyarmor/ - full obfuscation with hex-encoding; apparently doesn't allow partial obfuscation of variable/function names only
  • python-minifier, https://pypi.org/project/python-minifier/ - minifies the code and obfuscates function/variable names (although not as intensely as pyminifier below)
  • pyminifier, https://pypi.org/project/pyminifier/ - does a good job of obfuscating names of functions, variables, literals; can also perform hex-encoding (compression) similar to pyarmor. Problem: after obfuscation the code may contain syntax errors and not run.

Example .py output from pyminifier when run with --obfuscate and --gzip:

$ pyminifier --obfuscate --gzip /tmp/tumult.py

#!/usr/bin/env python3
import zlib, base64
exec(zlib.decompress(base64.b64decode('eJx1kcFOwzAMhu95ClMO66apu0/KAQEbE5eJC+IUpa27haVJ5Ljb+vakLYJx4JAoiT/7/+3c3626SKvSuBW6M4Sej96Jq9y1wRM/E3kSexnIOBZObrSNKI7Sl59YsWDq1wLMiEKNrenoYCqB1woDwzXF9nn2rskZd1jDh+9mhOD8DVvAQ8WdtrZfwg74aNwp7ZpnMXHUaltk878ybR/ZNKbSjP8JPWk6wdn72ntodQ8lQucIrdGlxaHgq3QgKqtjhCY/zlN6jQ0oZZxhpfKItlkuNB3icrE4XYbDwEBICRP6NjG1rri3YyzK356CtsGwZuNd/o0kYitvrBd18qgmj3kcwoTckYPtJPAyCVzSKPCMNErs85+rMINdp1tUSspMqVYbp1Q2DWKTJpcGURRDr9DIJs8wJFlKq+qzZRaQ4lAnVRuJgjFynj36Ol7SX/iQXr8ANfezCw==')))
# Created by pyminifier.py (https://github.com/liftoff/pyminifier)

This output corresponds to a 40-line original input script as shown here.

Marcin Wojnarski
  • pyarmor has a limitation of 32kB on python file size, otherwise you pay for the license. And pyminifier seems to no longer be maintained – Salem Nov 17 '21 at 09:01
18

Nuitka

I would really recommend Nuitka over Cython. Nuitka also compiles Python to native platform code, providing a level of obfuscation similar to that of compiled C code.

python -m pip install nuitka
python -m nuitka --follow-imports --include-package urllib3.util.ssl_ myprogram.py
./myprogram.bin
  • --follow-imports does a great job of including all imported modules.
  • --include-package if some imports are hidden and are missing when starting the compiled program, it can be helpful to pass additional packages.

If this works, add the flag --onefile or --standalone to get a package for distribution.

I also used pyarmor referenced here, but the pytransform.so or pytransform.dll shared object which is the core of pyarmor is closed source, which was a blocker in my project.

k_o_
  • 1
    Why do you recommend it over Cython? – ahmed abdelmalek Oct 27 '21 at 14:45
  • 1
    Cython is a low level solution. You could compile a single file or manually assemble them somehow and get a shared object file. Nuitka is high level, searching dependencies and including them in a single distribution is done by Nuitka in a very easy way. – k_o_ Oct 27 '21 at 15:03
  • 1
    Sorry to bring up an old thread. Are these (non open source) files still there at the latest pyarmor? Trying to figure out if pyarmor is safe to use – Nitay Feb 22 '22 at 11:39
  • 1
    Answering my own question here: There are binary files installed with the package. These are closed source files stored in: https://github.com/dashingsoft/pyarmor-core – Nitay Feb 22 '22 at 11:44
17

If you want semi-obfuscated code, you can write something like this:

import base64
import zlib
def run(code): exec(zlib.decompress(base64.b16decode(code)))
def enc(code): return base64.b16encode(zlib.compress(code))

and make a file like this (using the above code):

f = open('something.py','w')
f.write("code = " + repr(enc("""
print("test program")
print(raw_input("> "))"""))
f.close()

file "something.py":

code = '789CE352008282A2CCBC120DA592D4E212203B3FBD28315749930B215394581E9F9957500A5463A7A0A4A90900ADFB0FF9'

just import "something.py" and run run(something.code) to run the code in the file.
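
The code above is written for Python 2 (raw_input, byte strings). A minimal sketch of the same enc/run pair for Python 3 follows; the payload text and the module_text variable are only for illustration.

```python
import base64
import zlib

def enc(source: str) -> str:
    """Compress the source and return it as a Base16 (uppercase hex) string."""
    packed = zlib.compress(source.encode("utf-8"))
    return base64.b16encode(packed).decode("ascii")

def run(code: str) -> None:
    """Reverse enc() and execute the recovered source."""
    exec(zlib.decompress(base64.b16decode(code)).decode("utf-8"))

payload = enc('print("test program")')
# Text of a module that holds only the payload, e.g. for something.py
module_text = "code = %r\n" % payload
run(payload)  # prints: test program
```

Writing module_text to something.py gives a file that can be imported and passed to run(something.code), as described above.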

One trick is to make the code hard to read by design: never document anything, and if you must, describe only the output of a function, not how it works. Make variable names very broad, movie references, or opposites. For example: btmnsfavclr = 16777215, where "btmnsfavclr" means "Batman's Favorite Color" and the value 16777215 is the decimal form of "ffffff", or white. Remember to mix different naming styles to keep those pesky people out of your code. Also, use the tips on this site: Top 11 Tips to Develop Unmaintainable Code.

evandrix
Cold Diamondz
  • 1
    You've provided more than one answer. The second part regarding "make the code hard to read by design" should be split off as an entire separate answer to vote / comment on. – BuvinJ Oct 15 '18 at 14:31
  • 3
    Regarding that second part, how do you propose that you, as the original developer, maintain your code if you purposefully made it terrible to read and deal with? This plan only makes sense for tiny little scripts that you never have to go back and revise yourself. In which case, why do you even care about obfuscating? Just for added security I guess? If the point was "copy protection", I don't think it's worth the trouble for some "afternoon project". – BuvinJ Oct 15 '18 at 14:37
17

Maybe you can try pyconcrete; it's my open-source project.

It encrypts .pyc files to .pye and decrypts them on import.

Encryption and decryption use the OpenAES library.

Usage

Full encrypted

  • convert all of your .py to *.pye

      $ pyconcrete-admin.py compile --source={your py script}  --pye
      $ pyconcrete-admin.py compile --source={your py module dir} --pye
    
  • remove *.py *.pyc or copy *.pye to other folder

  • main.py encrypted as main.pye, it can't be executed by normal python. You must use pyconcrete to process the main.pye script. pyconcrete(exe) will be installed in your system path (ex: /usr/local/bin)

      pyconcrete main.pye
      src/*.pye  # your libs
    

Partial encrypted (pyconcrete as lib)

  • download pyconcrete source and install by setup.py

      $ python setup.py install \
        --install-lib={your project path} \
        --install-scripts={where you want to execute pyconcrete-admin.py and pyconcrete(exe)}
    
  • import pyconcrete in your main script

  • recommended project layout

      main.py       # import pyconcrete and your lib
      pyconcrete/*  # put pyconcrete lib in project root, keep it as original files
      src/*.pye     # your libs
    
Falldog Hsieh
11

There are multiple ways to obfuscate code. Here's just one example:

(lambda _, __, ___, ____, _____, ______, _______, ________:
    getattr(
        __import__(True.__class__.__name__[_] + [].__class__.__name__[__]),
        ().__class__.__eq__.__class__.__name__[:__] +
        ().__iter__().__class__.__name__[_____:________]
    )(
        _, (lambda _, __, ___: _(_, __, ___))(
            lambda _, __, ___:
                chr(___ % __) + _(_, __, ___ // __) if ___ else
                (lambda: _).func_code.co_lnotab,
            _ << ________,
            (((_____ << ____) + _) << ((___ << _____) - ___)) + (((((___ << __)
            - _) << ___) + _) << ((_____ << ____) + (_ << _))) + (((_______ <<
            __) - _) << (((((_ << ___) + _)) << ___) + (_ << _))) + (((_______
            << ___) + _) << ((_ << ______) + _)) + (((_______ << ____) - _) <<
            ((_______ << ___))) + (((_ << ____) - _) << ((((___ << __) + _) <<
            __) - _)) - (_______ << ((((___ << __) - _) << __) + _)) + (_______
            << (((((_ << ___) + _)) << __))) - ((((((_ << ___) + _)) << __) +
            _) << ((((___ << __) + _) << _))) + (((_______ << __) - _) <<
            (((((_ << ___) + _)) << _))) + (((___ << ___) + _) << ((_____ <<
            _))) + (_____ << ______) + (_ << ___)
        )
    )
)(
    *(lambda _, __, ___: _(_, __, ___))(
        (lambda _, __, ___:
            [__(___[(lambda: _).func_code.co_nlocals])] +
            _(_, __, ___[(lambda _: _).func_code.co_nlocals:]) if ___ else []
        ),
        lambda _: _.func_code.co_argcount,
        (
            lambda _: _,
            lambda _, __: _,
            lambda _, __, ___: _,
            lambda _, __, ___, ____: _,
            lambda _, __, ___, ____, _____: _,
            lambda _, __, ___, ____, _____, ______: _,
            lambda _, __, ___, ____, _____, ______, _______: _,
            lambda _, __, ___, ____, _____, ______, _______, ________: _
        )
    )
)
UnsignedByte
9

I know it is an old question. Just want to add my funny obfuscated "Hello world!" in Python 3 and some tips ;)

#//'written in c++'

#include <iostream.h>
#define true false
import os
n = int(input())
_STACK_CALS=  [ ];
_i_CountCals__= (0x00)
while os.urandom(0x00 >> 0x01) or (1 & True):
  _i_CountCals__+= 0o0;break;# call shell command echo "hello world" > text.txt
""#print'hello'
__cal__= getattr( __builtins__  ,'c_DATATYPE_hFILE_radnom'[ 0x00 ]+'.h'[-1]+'getRndint'[3].lower() )
_o0wiXSysRdrct   =eval (  __cal__(0x63) + __cal__(104) + 'r_RUN_CALLER'[0] );
_i1CLS_NATIVE=  getattr (__builtins__ ,__cal__(101)+__cal__(118  )+_o0wiXSysRdrct ( 0b1100001 )+'LINE 2'[0].lower( ))#line 2 kernel call
__executeMAIN_0x07453320abef  =_i1CLS_NATIVE ( 'map');
def _Main():
    raise 0x06;return 0 # exit program with exit code 0
def _0o7af():_i1CLS_NATIVE('_int'.replace('_', 'programMain'[:2]))(''.join(  __executeMAIN_0x07453320abef( _o0wiXSysRdrct ,_STACK_CALS)));return;_Main()
for _INCREAMENT in [0]*1024:
    _STACK_CALS= [0x000 >> 0x001 ,True&False&True&False ,'c++', 'h', 'e', 'l', 'o',' ', 'w', 'o', 'r', 'l', 'd']
   
#if
for _INCREAMENT in [0]*1024:
    _STACK_CALS= [40, 111, 41, 46, 46] * n
    
""""""#print'word'
while True:
    break;
_0o7af();
while os.urandom(0x00 >> 0xfa) or (1 & True): # print "Hello, world!"
  _i_CountCals__-= 0o0;break;
  while os.urandom(0x00 >> 0x01) or (1 & True):
      _i_CountCals__ += 0o0;
      break;

It is possible to do this manually; my tips are:

  • use eval and/or exec with encrypted strings

  • use [ord(i) for i in s] / ''.join(map(chr, [list of chars goes here])) as simple encryption/decryption

  • use obscure variable names

  • make it unreadable

  • Don't write just 1 or True, write 1&True&0x00000001 ;)

  • use different number systems

  • add confusing comments like "line 2" on line 10 or "it returns 0" on while loop.

  • use __builtins__

  • use getattr and setattr
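
The ord/chr tip from the list can be sketched as follows. This is only an illustration (the helper names to_codes and from_codes are made up), and it is an encoding rather than real encryption:

```python
def to_codes(text):
    """Turn a string into a list of integer code points."""
    return [ord(ch) for ch in text]

def from_codes(codes):
    """Rebuild the string from its code points."""
    return "".join(map(chr, codes))

codes = to_codes("print('Hello world!')")
exec(from_codes(codes))  # prints: Hello world!
```

Only the list of numbers would be shipped in the script, next to the from_codes/exec line.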

USERNAME GOES HERE
8

I would mask the code like this:

def MakeSC():
    text = raw_input(" Encode: ")
    # Two hex digits per character, so every \x escape is valid
    sc = "\\x" + "\\x".join("{0:02x}".format(ord(ch)) for ch in text)
    print "\n shellcode =('" + sc + "'); exec(shellcode)"

MakeSC()

Cleartext:

import os; os.system("whoami")

Encoded:

Payload = ('\x69\x6d\x70\x6f\x72\x74\x20\x6f\x73\x3b\x20\x6f\x73\x2e\x73\x79\x73\x74\x65\x6d\x28\x22\x77\x68\x6f\x61\x6d\x69\x22\x29'); exec(Payload);
Delimitry
GuestHello
8

The best way to do this is to first generate a .c file, and then compile it with tcc to a .pyd file
Note: Windows-only

Requirements

  1. tcc
  2. pyobfuscate
  3. Cython

Install:

sudo pip install -U cython

To obfuscate your .py file:

pyobfuscate.py myfile.py >obfuscated.py

To generate a .c file,

  1. Add an init<filename>() function to your .py file (optional)

  2. cython --embed file.py

  3. cp Python.h tcc\include

  4. tcc file.c -o file.pyd -shared -I\path\to\Python\include -L\path\to\Python\lib

  5. import .pyd file into app.exe

evandrix
Anonymous
  • 1
    What do you mean by "import .pyd file into app.exe". You didn't mention app.exe anywhere in your explanations. Could you tell me more please? – Gauthier Buttez Apr 18 '20 at 11:37
  • Simply import the pyd file into your main app (app.exe refers to the packaged version of the app) – Anonymous Apr 18 '20 at 11:43
  • Thank you so much. My issue is to do it with a main.py which call several other mymodule1.py, module2.py. Maybe you'll be able to find the correct answer to my question here: https://stackoverflow.com/posts/comments/108403128?noredirect=1 – Gauthier Buttez Apr 18 '20 at 11:47
  • 1
    Information about what is present in the compiled binary: [Are executables produced with Cython really free of the source code?](https://stackoverflow.com/questions/62388701/are-executables-produced-with-cython-really-free-of-the-source-code). – Basj Jun 15 '20 at 13:41
  • Pyobfuscate is just as an added layer, after compiling with cython and gcc it's very unlikely that someone will be able to decompile – Anonymous Oct 02 '20 at 14:18
7

Maybe you should look into using something simple like a truecrypt volume for source code storage as that seems to be a concern of yours. You can create an encrypted file on a usb key or just encrypt the whole volume (provided the code will fit) so you can simply take the key with you at the end of the day.

To compile, you could then use something like PyInstaller or py2exe in order to create a stand-alone executable. If you really wanted to go the extra mile, look into a packer or compression utility in order to add more obfuscation. If none of these are an option, you could at least compile the script into bytecode so it isn't immediately readable. Keep in mind that these methods will merely slow someone trying to debug or decompile your program.

krs1
6

I recently stumbled across this blogpost: Python Source Obfuscation using ASTs where the author talks about python source file obfuscation using the builtin AST module. The compiled binary was to be used for the HitB CTF and as such had strict obfuscation requirements.

Since you gain access to individual AST nodes, using this approach allows you to perform arbitrary modifications to the source file. Depending on what transformations you carry out, resulting binary might/might not behave exactly as the non-obfuscated source.
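
To give a feel for what such an AST pass looks like, here is a minimal sketch. It is not the code from the blog post: the class name Obfuscator, the obfuscate helper, and the caller-supplied rename mapping are all made up for illustration. It strips docstrings and renames the identifiers listed in the mapping. Comments disappear on their own because they are not part of the AST, and ast.unparse needs Python 3.9 or later.

```python
import ast

class Obfuscator(ast.NodeTransformer):
    """Strip docstrings and rename the identifiers listed in `mapping`."""

    def __init__(self, mapping):
        self.mapping = mapping

    def _strip_docstring(self, node):
        body = node.body
        if (body and isinstance(body[0], ast.Expr)
                and isinstance(body[0].value, ast.Constant)
                and isinstance(body[0].value.value, str)):
            # Keep the body non-empty so the result is still valid Python.
            node.body = body[1:] or [ast.Pass()]
        self.generic_visit(node)
        return node

    visit_Module = visit_FunctionDef = visit_ClassDef = _strip_docstring

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        self.generic_visit(node)
        return node

def obfuscate(source, mapping):
    """Return obfuscated source text for `source`."""
    tree = Obfuscator(mapping).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

SOURCE = '''
def area(width, height):
    """Return the area of a rectangle."""
    result = width * height  # multiply the sides
    return result
'''
print(obfuscate(SOURCE, {"width": "_a", "height": "_b", "result": "_c"}))
```

The caller has to pick names that are safe to rename: this sketch does not touch attributes, keyword arguments at call sites, or global/nonlocal declarations, so renaming a parameter that callers pass by keyword would break the program.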

7h3rAm
6

Try this python obfuscator:

pyob.oxyry.com

__all__ = ['foo']

a = 'a'
_b = 'b'

def foo():
    print(a)

def bar():
    print(_b)

def _baz():
    print(a + _b)

foo()
bar()
_baz()

will be translated to

__all__ =['foo']#line:1
OO00OO0OO0O00O0OO ='a'#line:3
_O00OO0000OO0O0O0O ='b'#line:4
def foo ():#line:6
    print (OO00OO0OO0O00O0OO )#line:7
def O0000000OOOO00OO0 ():#line:9
    print (_O00OO0000OO0O0O0O )#line:10
def _OOO00000O000O0OOO ():#line:12
    print (OO00OO0OO0O00O0OO +_O00OO0000OO0O0O0O )#line:13
foo ()#line:15
O0000000OOOO00OO0 ()#line:16
_OOO00000O000O0OOO ()#line:17
Weijar Z
5

Opy

https://github.com/QQuick/Opy

Opy will obfuscate your extensive, real world, multi module Python source code for free! And YOU choose per project what to obfuscate and what not, by editing the config file:

You can recursively exclude all identifiers of certain modules from obfuscation.
You can exclude human readable configuration files containing Python code.
You can use getattr, setattr, exec and eval by excluding the identifiers they use.
You can even obfuscate module file names and string literals.
You can run your obfuscated code from any platform.

Unlike some of the other options posted, this works for both Python 2 and 3! It is also free / opensource, and it is not an online only tool (unless you pay) like some of the others out there.

I am admittedly still evaluating this myself, but all of my initial tests of it worked perfectly. It appears this is exactly what I was looking for!

The official version runs as a standalone utility, with the original intended design being that you drop a script into the root of the directory you want to obfuscate, along with a config file to define the details/options you want to employ. I wasn't in love with that plan, so I created a fork of the project that allows you to import and use the tool as a library instead. That way, you can roll this directly into a more encompassing packaging script. (You could of course wrap multiple py scripts in bash/batch, but I think a pure python solution is ideal). I requested my fork be merged into the original work, but in case that never happens, here's the url to my revised version:

https://github.com/BuvinJT/Opy

BuvinJ
4

There are 2 ways to obfuscate python scripts

  • Obfuscate byte code of each code object
  • Obfuscate whole code object of python module

Obfuscate Python Scripts

  • Compile python source file to code object

    char * filename = "xxx.py";
    char * source = read_file( filename );
    PyObject *co = Py_CompileString( source, filename, Py_file_input );
    
  • Iterate code object, wrap bytecode of each code object as the following format

    0   JUMP_ABSOLUTE            n = 3 + len(bytecode)    
    3
    ...
    ... Here it's obfuscated bytecode
    ...
    
    n   LOAD_GLOBAL              ? (__armor__)
    n+3 CALL_FUNCTION            0
    n+6 POP_TOP
    n+7 JUMP_ABSOLUTE            0
    
  • Serialize code object and obfuscate it

    char *original_code = marshal.dumps( co );
    char *obfuscated_code = obfuscate_algorithm( original_code  );
    
  • Create wrapper script "xxx.py", ${obfuscated_code} stands for string constant generated in previous step.

    __pyarmor__(__name__, b'${obfuscated_code}')
    

Run or Import Obfuscated Python Scripts

When this wrapper script is imported or run, its first statement calls a CFunction:

/* Pseudocode: marshal.loads stands for the C-level equivalent. */
int __pyarmor__(char *name, unsigned char *obfuscated_code) 
{
  char *original_code = restore_obfuscated_code( obfuscated_code );
  PyObject *co = marshal.loads( original_code );
  PyObject *mod = PyImport_ExecCodeModule( name, co );
  return mod != NULL ? 0 : -1;
}

This function accepts two parameters, the module name and the obfuscated code, and then does the following:

  • Restore the obfuscated code
  • Create a code object from the original code
  • Import original module (this will result in a duplicated frame in Traceback)
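
The serialize, obfuscate, restore and import round trip described above can be sketched in pure Python. This is only an illustration under stated assumptions: the single-byte XOR and the names `KEY`, `xor_bytes`, `obfuscate_source` and `run_obfuscated` are hypothetical stand-ins, not Pyarmor's actual algorithm or API, and the restore step here is plain Python rather than a C function.

```python
import marshal
import types

KEY = 0x5A  # hypothetical single-byte key; a stand-in, not Pyarmor's algorithm


def xor_bytes(data: bytes, key: int = KEY) -> bytes:
    """Toy reversible transform: XOR every byte with the key."""
    return bytes(b ^ key for b in data)


def obfuscate_source(source: str, filename: str = "<obfuscated>") -> bytes:
    """Compile source to a code object, serialize it, then scramble the bytes."""
    code = compile(source, filename, "exec")
    return xor_bytes(marshal.dumps(code))


def run_obfuscated(name: str, blob: bytes) -> types.ModuleType:
    """Restore the code object and execute it in a fresh module namespace."""
    code = marshal.loads(xor_bytes(blob))
    module = types.ModuleType(name)
    exec(code, module.__dict__)
    return module


blob = obfuscate_source("greeting = 'Hello World!'\n")
demo = run_obfuscated("demo", blob)
print(demo.greeting)
```

Note that marshal data is specific to the Python version that produced it, so a blob built this way should be loaded by the same interpreter version.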

Run or Import Obfuscated Bytecode

After the module is imported, the first time any code object in it is called, the wrapped bytecode described in the section above behaves as follows:

  • The first op, JUMP_ABSOLUTE, jumps to offset n

  • At offset n, the instruction calls a PyCFunction. This function restores the obfuscated bytecode between offset 3 and n, and places the original bytecode at offset 0

  • After the function call, the last instruction jumps back to offset 0. The real bytecode is now executed

Refer to Pyarmor

evandrix
  • 6,041
  • 4
  • 27
  • 38
Jondy Zhao
  • 121
  • 1
  • 4
3

As other answers have stated, there really just isn't a way that's any good. Base64 can be decoded. Bytecode can be decompiled. Python was initially just interpreted, and most interpreted languages try to speed up machine interpretation more than make it difficult for human interpretation.
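
To make the Base64 point concrete, here is a minimal sketch using the question's example, written in Python 3 syntax. The variable names are just for illustration. It shows that the encoded form runs fine, and that the same standard-library call gives the source straight back to anyone who has the file.

```python
import base64

source = 'print("Hello World!")'

# Encode the source; the result is ASCII text that merely looks opaque.
encoded = base64.b64encode(source.encode("utf-8"))

# A "hidden" script would ship a line equivalent to this one.
exec(base64.b64decode(encoded).decode("utf-8"))

# Anyone holding the file can reverse it with the same standard-library call.
recovered = base64.b64decode(encoded).decode("utf-8")
print(recovered == source)
```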

Python was made to be readable and shareable, not obfuscated. The language decisions about how code has to be formatted were to promote readability across different authors.

Obfuscating python code just doesn't really mesh with the language. Re-evaluate your reasons for obfuscating the code.

Broam
  • 4,602
  • 1
  • 23
  • 38
1

Here's my very noob approach for something I'm doing in CircuitPython. It's currently partially tested. I've posted in this state because I thought it might be useful.

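There are three arguments:

  • the source directory (the script concatenates it directly with each file name, so include the trailing path separator)
  • a comma-separated list of input file names, without the .py extension
  • a comma-separated list of output file names, also without the .py extension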

Here's what it does:

  1. Find all the variable names by looking at method signatures and the left-hand-sides of assignments, and all the import aliases.
  2. Quarantine some things that I don't want to modify.
  3. Replace the remaining names with meaningless tokens.
  4. Un-quarantine the things I quarantined.

It turns code like this

degreeIncrement = 90
durationIncrement = 0.25
def GetEditGlyphParams(self, waveform, editIndex):
    segments = waveform.leftSegments
    waveformFunctionCount =  len(self.waveformFunctions)
    totalParameterCount = 0
    segmentIndex = 0
    while segmentIndex < len(segments):
        segment = segments[segmentIndex]
        segmentParameterCount = len(self.sineFunctions)
        if segment.type == "line":
            segmentParameterCount = len(self.lineFunctions)

...into code like this:

a6 = 90 # degreeIncrement = 90
a7 = 0.25 # durationIncrement = 0.25
def a8(a9, a10, a11): # def GetEditGlyphParams(self, waveform, editIndex):
    a12 = a10.leftSegments # segments = waveform.leftSegments
    a13 =  len(a9.a5) # waveformFunctionCount =  len(self.waveformFunctions)
    a14 = 0 # totalParameterCount = 0
    a15 = 0 # segmentIndex = 0
    while a15 < len(a12): # while segmentIndex < len(segments):
        a16 = a12[a15] # segment = segments[segmentIndex]
        a17 = len(a9.a3) # segmentParameterCount = len(self.sineFunctions)
        if a16.a332 == "line": # if segment.type == "line":
            a17 = len(a9.a4) # segmentParameterCount = len(self.lineFunctions)

The comments can be omitted if necessary.

Here's the code that does it:

import sys, re
sourceDirectory = sys.argv[1]
print("sourceDirectory", sourceDirectory)
sourceFiles = sys.argv[2].split(",")
targetFiles = sys.argv[3].split(",")

if len(sourceFiles) != len(targetFiles):
    raise Exception("Source file count must match target file count. Use comma to separate.")

print("uglify", sys.argv[1])

names = []
translations = []

class Analyser:
    def AnalyseLines(self, lines):
        for line in lines:
            self._AnalyseLine(line)
    def _AnalyseLine(self, line):
        parts = self._GetParts(line)
        if len(parts) > 1 and parts[0] == "import":
            self._AnalyseImport(parts)
        if len(parts) > 1 and parts[0] == "class":
            self._AnalyseClass(parts)
        if len(parts) > 1 and parts[1] == "=":
            self._AnalyseAssignment(parts)
        if len(parts) > 1 and parts[0] == "def":
            self._AnalyseDef(parts)
    def _GetParts(self, line):
        minusTabs = line.strip().replace("\t", " ")
        minusOpenSquare = minusTabs.replace("[", " ")
        minusCloseSquare = minusOpenSquare.replace("]", " ")
        minusDoubleSpace = minusCloseSquare.replace("  ", " ")
        parts = minusDoubleSpace.split(" ")
        while "#" in parts:
            del parts[-1]
        while len(parts) > 0 and parts[0] == "":
            del parts[0]
        nonEmptyParts = []
        for part in parts:
            if len(part) > 0:
                nonEmptyParts.append(part)
        return nonEmptyParts
    def _AddName(self, name, elementType):
        nameToAppend = name # + " " + elementType
        if nameToAppend in names:
            return
        if nameToAppend == "sin" or nameToAppend == "value":
            print("--> adding", nameToAppend, "as", elementType)
        names.append(nameToAppend)
        translation = "a" + str(len(names))
        translations.append((name, translation))
    def _AnalyseImport(self, parts):
        if len(parts) == 4 and parts[0] == "import" and parts[2] == "as":
            self._AddName(parts[3], "import")
    def _AnalyseClass(self, parts):
        p1 = parts[1].split(":")
        p2 = p1[0].split("(")
        self._AddName(p2[0], "class")
    def _AnalyseAssignment(self, parts):
        mutableName = parts[0].split(".")[0]
        self._AddName(mutableName, "assignment")
    def _AnalyseDef(self, parts):
        methodNameParts = parts[1].split("(")
        if methodNameParts[0] == "__init__":
            return
        self._AddName(methodNameParts[0], "method")
        if len(methodNameParts) > 1:
            self._AddName(methodNameParts[1].replace(",", "").replace("):", ""), "param1")
        for part in parts[2:]:
            params = part.split(",")
            for param in params:
                if param != "":
                    if param.replace(":", "").replace(")", "") == "value":
                        print("found value amongst", parts)
                    self._AddName(param.replace(":", "").replace(")", ""), "paramN")

class Translator:
    def TranslateLines(self, content):
        oldLines = content.split("\n")
        content = content.replace('"', "_QUOTE_").replace("\\", "_BACKSLASH_")
        for (oldWord, newWord) in translations:
            content = re.sub(r"\b%s\b" % oldWord, newWord, content)
        content = content.replace("_QUOTE_", '"').replace("_BACKSLASH_", "\\")
        newLines = content.split("\n")
        for i in range(len(newLines) - 1):
            if newLines[i] != "":
                newLines[i] += " # " + oldLines[i].strip()
        return "\n".join(newLines)
    def TranslateLines2(self, content):
        oldLines = content.split("\n")
        newLines = []
        for lineNumber, oldLine in enumerate(oldLines):
            # print("translating line of length", len(oldLine), ":", oldLine)
            content = oldLine.split(" # ")[0]
            if len(content.strip(" \t")) > 0:
                content = content.replace('"', "_QUOTE_").replace("\\", "_BACKSLASH_")
                for (oldWord, newWord) in translations:
                    try:
                        # Escape the name so regex metacharacters in it are matched literally.
                        content = re.sub(r"\b%s\b" % re.escape(oldWord), newWord, content)
                    except re.error:
                        print("problem translating", oldWord, "into", newWord)
                        raise Exception("error in translation")
                content = content.replace("_QUOTE_", '"').replace("_BACKSLASH_", "\\")
                newLines.append(content + " # " + oldLine.strip())
        return "\n".join(newLines)

lines = []

for i, sourceFileName in enumerate(sourceFiles):
    names.append(sourceFileName)
    targetFileName = targetFiles[i]
    translations.append((sourceFileName, targetFileName))

for sourceFileName in sourceFiles:
    fullFileName = sourceDirectory + sourceFileName + ".py"
    # Use a context manager so the file is closed after reading.
    with open(fullFileName, 'r') as sourceFile:
        content = sourceFile.read()
    fileLines = content.split("\n")
    lines.extend(fileLines)
    print("found", len(fileLines), "lines in", sourceFileName)

print("----------------")
print("found a total of", len(lines), "lines")
print("----------------")
analyser = Analyser()
analyser.AnalyseLines(lines)

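# Drop zero-length names and their translations. Build new lists rather than
# removing items while iterating, which can skip entries.
keptNames = []
keptTranslations = []
for i, name in enumerate(names):
    if len(name) < 1:
        print("deleting name", i, "because it is zero length")
        continue
    keptNames.append(name)
    keptTranslations.append(translations[i])
names[:] = keptNames
translations[:] = keptTranslations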

# print(names)

# raise Exception("Not implemented beyond here.")
translator = Translator()

for i, sourceFileName in enumerate(sourceFiles):
    print("translating", sourceFileName, "into", targetFiles[i])
    fullFileName = sourceDirectory + sourceFileName + ".py"
    targetFileName = sourceDirectory + targetFiles[i] + ".py"
    with open(fullFileName, 'r') as sourceFile:
        content = sourceFile.read()
    newContent = translator.TranslateLines2(content)
    # Open the target only after translation succeeds, so a failure
    # does not leave an empty output file behind.
    with open(targetFileName, 'w') as targetFile:
        targetFile.write(newContent)

# print(len(lines), "lines, starting with", lines[0])
# print(names)
# print(translations)
OutstandingBill
  • 2,614
  • 26
  • 38
  • Hi. Has this facility been completed? Thx – Henry Thornton Feb 05 '22 at 11:47
  • @HenryThornton, hi, I just updated the answer with the latest code I'm using. It now reads variable names from multiple files and obfuscates them. I have put it through some more testing since the original post. I found and fixed a few bugs. I wouldn't say it's bug-free though : ) – OutstandingBill Feb 07 '22 at 04:16
0

Try pasting your hello world python code to the following site:

http://enscryption.com/encrypt-and-obfuscate-scripts.html

It will produce a complex encrypted and obfuscated, but fully functional script for you. See if you can crack the script and reveal the actual code. Or see if the level of complexity it provides satisfies your need for peace of mind.

The encrypted script that is produced for you through this site should work on any Unix system that has python installed.

If you would like to encrypt another way, I strongly suggest you write your own encryption/obfuscation algorithm (if security is that important to you). That way, no one can figure out how it works but you. But, for this to really work, you have to spend a tremendous amount of time on it to ensure there aren't any loopholes that someone who has a lot of time on their hands can exploit. And make sure you use tools that are already natural to the Unix system... i.e. openssl or base64. That way, your encrypted script is more portable.

Delimitry
  • 2,987
  • 4
  • 30
  • 39
RoyMWell
  • 199
  • 1
  • 9
-2

I'll write my answer in a didactic manner...

First type into your Python interpreter:

import this

Then go and take a look at the file this.py in the Lib directory of your Python distribution and try to understand what it does.

After that, take a look at the eval function in the documentation:

help(eval)

Now you should have found a funny way to protect your code. But beware, because that only works for people that are less intelligent than you! (and I'm not trying to be offensive, anyone smart enough to understand what you did could reverse it).
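
For readers who want to check their conclusion, here is a minimal sketch of where the hints lead. It assumes the standard library's `rot_13` text codec instead of the lookup table that this.py builds, and it uses `exec` rather than `eval` so that statements work as well as expressions. The variable names are illustrative only.

```python
import codecs

source = 'print("Hello World!")'

# ROT13 shifts each ASCII letter by 13 places; applying it twice restores the text.
scrambled = codecs.encode(source, "rot_13")
print(scrambled)

# The shipped script would decode and run the scrambled string when it starts.
exec(codecs.decode(scrambled, "rot_13"))
```

The scrambled string is all a casual reader would see in the file, and a single `codecs.decode` call undoes it, which is exactly the weakness the answer warns about.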

fortran
  • 74,053
  • 25
  • 135
  • 175