Dropbox actually did this very well: they managed to secure their desktop application, which is written in Python. I have researched this a lot but found no solution better than obfuscation, which is not a very secure way to go; you will likely end up seeing your code uploaded somewhere.

I listened to a talk by Giovanni Bajo (the PyInstaller founder), in which he said Dropbox does this:

  1. Scramble the bytecode by recompiling the CPython interpreter, so that the standard CPython interpreter cannot run it; only the recompiled interpreter can.
  2. All you need to do is shuffle the numbers below the `define loadup 8`.

I've never gone through Python's source code, so I won't claim that I fully understand the above.

I need to hear from experts: how can this be done? And after recompiling, will I still be able to package my application with the available tools, such as PyInstaller?

Update:

I did some research into how Dropbox does this type of obfuscation/mutation, and I found this:

According to Hagen Fritsch, they do it in two stages:

  1. They use the TEA cipher along with an RNG seeded by some values in the code object of each Python module. They adjusted the interpreter accordingly so that it

    a) decrypts the modules and

    b) prevents access to the decrypted code objects.

    Otherwise, the straightforward path would have been to just let Dropbox decrypt everything and dump the modules using the built-in marshaller.

  2. Another trick used is the manual scrambling of the opcodes. Unfortunately, this could only be fixed semi-automatically, so their monoalphabetic substitution cipher proved quite effective in terms of winning some time.
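For reference, TEA itself is a small public block cipher, so the cipher part of stage 1 can be sketched in a few lines of Python. This is only the textbook algorithm (64-bit block, 128-bit key, 32 rounds); the big-endian byte order, the function names, and the demo key are my assumptions, and nothing here shows how Dropbox seeds its RNG or applies the cipher to code objects.

```python
import struct

DELTA = 0x9E3779B9   # TEA's key-schedule constant
MASK = 0xFFFFFFFF    # keep arithmetic within 32 bits


def tea_encrypt_block(block: bytes, key: bytes) -> bytes:
    """Encrypt one 8-byte block with a 16-byte key (32 rounds)."""
    v0, v1 = struct.unpack(">2I", block)  # byte order is an assumption
    k0, k1, k2, k3 = struct.unpack(">4I", key)
    total = 0
    for _ in range(32):
        total = (total + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
    return struct.pack(">2I", v0, v1)


def tea_decrypt_block(block: bytes, key: bytes) -> bytes:
    """Invert tea_encrypt_block by running the rounds backwards."""
    v0, v1 = struct.unpack(">2I", block)
    k0, k1, k2, k3 = struct.unpack(">4I", key)
    total = (DELTA * 32) & MASK
    for _ in range(32):
        v1 = (v1 - (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK
        v0 = (v0 - (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK
        total = (total - DELTA) & MASK
    return struct.pack(">2I", v0, v1)


demo_key = bytes(range(16))          # hypothetical key, for illustration only
plaintext = b"bytecode"              # exactly 8 bytes
ciphertext = tea_encrypt_block(plaintext, demo_key)
roundtrip = tea_decrypt_block(ciphertext, demo_key)
```

Whatever the real scheme looks like, the key has to be available to the interpreter at run time, which is why the second stage (hiding and scrambling) matters at all.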

I still want more insight into how this could be done. Moreover, I don't know how the decryption happens in this process. I'd like to hear from all the experts here... come on, guys, where are you?

securecurve
  • Similar, more recent question: [Undecompilable Python](http://stackoverflow.com/q/15087339/222914) – Janne Karila Feb 26 '13 at 11:31
  • Thanks, Janne. Very similar to what you said, plus the removal of introspection modules from the newly created/shuffled interpreter. – securecurve Feb 26 '13 at 12:35
  • At the end of the day, the bytecode can be extracted from memory, and the attacker can then compare the shuffled bytecode with the standard one to work out how it was shuffled. But let's be fair: couldn't the same attacker decompile a program written in C to get the source code? Nothing is secure against reversing; it's a trade-off. That's how I see things; maybe I'm wrong. – securecurve Feb 26 '13 at 12:40

1 Answer

I suppose this is about shuffling the numbers in `Include/opcode.h`. I don't see a `#define loadup` there, though; maybe that refers to some old Python version. I have not tried this.
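To make the idea concrete, here is a rough sketch that produces a shuffled opcode table in the style of `#define` lines, using the standard `opcode.opmap` dictionary as the starting point. The function names and the seed are made up for illustration, and this does not patch CPython: a real interpreter has further constraints on the numbering (for example, the `HAVE_ARGUMENT` boundary that separates opcodes with and without an argument), which a blind shuffle like this ignores.

```python
import random
from opcode import opmap  # standard mapping: opcode name -> number


def shuffled_opcode_table(seed):
    """Return a name -> number mapping with the same numbers, permuted."""
    names = sorted(opmap, key=opmap.get)
    numbers = [opmap[name] for name in names]
    random.Random(seed).shuffle(numbers)  # deterministic for a given seed
    return dict(zip(names, numbers))


def as_defines(table):
    """Render the table as C preprocessor lines, ordered by opcode number."""
    ordered = sorted(table.items(), key=lambda item: item[1])
    return "\n".join(f"#define {name:<28} {number}" for name, number in ordered)


table = shuffled_opcode_table(seed=1234)  # hypothetical seed
header_text = as_defines(table)
```

Anything compiled by the modified interpreter would then use the new numbers, so a stock interpreter or decompiler would misread every instruction.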

This will obfuscate your .pyc files so that they cannot be inspected by any tools that recognize normal .pyc files. This may help you hide some security measures inside your program. However, an attacker might, for example, extract your custom Python interpreter from your app bundle and use it to inspect the files: just launch the interactive interpreter, import a module, and call `dir` on it.

Note also that your package will surely contain some modules from the Python standard library. If an attacker guesses that you have shuffled the opcodes, they could do a byte-for-byte comparison between your version and the normal version of a standard module and discover your opcodes that way. To prevent this simple attack, one can protect the modules with proper encryption and try to hide the decryption step in the interpreter, as mentioned in the updated question. This forces the attacker to use machine code debugging to look for the decryption code.
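The comparison attack is easy to simulate. The sketch below assumes the two-byte "wordcode" layout used by CPython 3.6 and later (opcode in the even byte, argument in the odd byte); older versions use variable-length instructions and would need a real instruction walker. The scrambling is faked in memory with a random substitution table rather than a rebuilt interpreter, and `scramble`, `recover_mapping` and `sample` are names I made up.

```python
import random


def scramble(code_bytes, mapping):
    """Simulate a shuffled interpreter: substitute every opcode byte."""
    out = bytearray(code_bytes)
    for i in range(0, len(out), 2):   # even bytes hold the opcodes
        out[i] = mapping[out[i]]
    return bytes(out)


def recover_mapping(standard, scrambled):
    """Compare byte for byte and return scrambled opcode -> standard opcode."""
    if len(standard) != len(scrambled):
        raise ValueError("code objects differ in length")
    recovered = {}
    for i in range(0, len(standard), 2):
        std_op, scr_op = standard[i], scrambled[i]
        if recovered.setdefault(scr_op, std_op) != std_op:
            raise ValueError("not a simple one-to-one substitution")
    return recovered


def sample(a, b):
    """Stand-in for a standard library function both sides have."""
    total = 0
    for x in range(a):
        if x % 2:
            total += x * b
    return total


values = list(range(256))
shuffled = values[:]
random.Random(7).shuffle(shuffled)
secret = dict(zip(values, shuffled))          # the "vendor's" secret table

standard = sample.__code__.co_code
scrambled = scramble(standard, secret)
recovered = recover_mapping(standard, scrambled)
```

One function only reveals the opcodes it happens to use; repeating this over the whole standard library would fill in most of the table.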


> I don't know how the decryption happens in this process...

You would modify the part of the interpreter that imports modules and insert your decryption C code there.
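The real change would be in the interpreter's C import code, which I can't show here. As a Python-level analogy only, the same shape can be sketched with an import hook that decrypts a module's source right before executing it. Everything named below (`KEY`, `ENCRYPTED_MODULES`, `secret_mod`, `DecryptingFinder`) is hypothetical, and the XOR routine is a placeholder, not real encryption.

```python
import importlib.abc
import importlib.util
import sys

KEY = b"demo-key"  # hypothetical key; a real scheme would hide this in C


def xor_bytes(data, key=KEY):
    """Placeholder 'cipher': XOR with a repeating key (its own inverse)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Hypothetical store of encrypted module sources, keyed by module name.
ENCRYPTED_MODULES = {"secret_mod": xor_bytes(b"ANSWER = 6 * 7\n")}


class DecryptingFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finds modules in ENCRYPTED_MODULES and decrypts them on import."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname in ENCRYPTED_MODULES:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # let the normal import machinery handle the rest

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        source = xor_bytes(ENCRYPTED_MODULES[module.__name__])
        code = compile(source, f"<encrypted {module.__name__}>", "exec")
        exec(code, module.__dict__)


sys.meta_path.insert(0, DecryptingFinder())

import secret_mod  # resolved and decrypted by the hook above

print(secret_mod.ANSWER)
```

A hook written in Python like this is trivially inspectable, which is exactly why the decryption would go into the compiled interpreter instead.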

Janne Karila
  • Well, is that all? We only change the numbers in `opcode.h`, recompile the Python interpreter, package the app, and ship it to the customer? Will that produce a Dropbox-like app with the same level of security? – securecurve Feb 21 '13 at 09:32
  • @securecurve Expanded my answer. I don't know what level of security the Dropbox app has. – Janne Karila Feb 21 '13 at 09:52
  • At this point, extracting those opcodes will not be straightforward; the attacker will have to use disassemblers/decompilers and debugging tools to watch those opcodes in memory ... Am I right, or would this happen differently? – securecurve Feb 21 '13 at 11:04
  • +1 for updating the answer 2 times for improvement, yet I won't check-mark it now for more answers ... – securecurve Feb 21 '13 at 14:59
  • @securecurve Yeah, I'm also interested if anyone has more to say on the subject. – Janne Karila Feb 22 '13 at 07:30
  • Hey dude, check my answer and let me know what you think – securecurve Feb 23 '13 at 16:04