
I need to debug a compiled Python script (a .pyc file). The original source is not available.

Is there a way to debug the bytecode directly?

The closest thing I can find is to build Python with LLTRACE enabled. The downside of this technique is that I have no control over the execution. It is not true debugging, since single stepping and breakpoints are not available.

Another possible technique is this, but again, this isn't suitable for the same reasons.

Note that decompiling the pyc and debugging the resultant is not possible as the bytecode is obfuscated.

Extreme Coders
  • This is not possible. If you don't have the source, what are you going to "step" through? – Jake Griffin Sep 09 '15 at 17:56
  • @JakeGriffin step through the bytecode. – Andrey Sep 09 '15 at 17:57
  • "Note that decompiling the pyc and debugging the resultant is not possible as the code is obfuscated." - If he doesn't want to step through obfuscated decompiled code, I assume byte code is even worse. – Jake Griffin Sep 09 '15 at 17:59
  • @JakeGriffin Decompiling would fail. This is because the bytecode has invalid opcodes in `co_code`. It would run under Python, but a decompiler would choke on those invalid opcodes. – Extreme Coders Sep 09 '15 at 18:02
  • Ah, okay, so you weren't saying that it wouldn't work because the resulting code would be obfuscated (after decompiling), you were saying that decompiling itself would not work. – Jake Griffin Sep 09 '15 at 18:06
  • @ExtremeCoders you are pursuing a questionable endeavour. You are trying to reverse engineer obfuscated code (there are reasons why it is obfuscated), which is not trivial. I would guess the easiest approach would be to tweak the decompiler source code so that it does not choke on invalid instructions. – Andrey Sep 09 '15 at 18:09
  • @Andrey Obfuscated code is routinely encountered while reversing malware. Anyways thanks for your suggestion, I think I need to develop some sort of tool, which would try to remove those junk opcodes, and then try decompiling. – Extreme Coders Sep 09 '15 at 18:12
  • @ExtremeCoders why not tweak the decompiler, for example by commenting out the exceptions raised on bad opcodes? I am not sure that cleaning is an easy task, since some offsets may shift. Another option is to replace them with NOPs. – Andrey Sep 09 '15 at 18:14
  • @Andrey Replacing with NOPs is a good idea. I will check. – Extreme Coders Sep 09 '15 at 18:19
  • Possible duplicate of [Undecompilable Python](http://stackoverflow.com/questions/15087339/undecompilable-python) – Paul Sweatte Aug 31 '16 at 19:18

2 Answers


Yes, it is possible to debug Python pyc files when there is no source code.

The debugger I wrote does this. See https://rocky.github.io/pycon2018.co/#/18 and the surrounding slides.

Obfuscation is a separate concern, and the question is vague as to what the obfuscation is.

If it is just of the variety where a variable name such as `foo` in a code object's `co_names` table is replaced with `; os.system("rm -fr")`, then that's easy to deal with, since `; os.system("rm -fr")` isn't a valid identifier name.

And that's actually easier to deal with than the decompilation process. See https://github.com/rocky/python-xdis/issues/58 for that aspect.
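The `co_names` check described above can be sketched with the standard library alone. This is a minimal illustration, not the approach used by xdis or the debugger: the helper name `suspicious_names` is hypothetical, the obfuscation is simulated with `code.replace()` (which assumes Python 3.8 or later), and only `co_names` is inspected.

```python
import types


def suspicious_names(code):
    """Yield (code object name, bad name) for each co_names entry that is
    not a valid identifier, recursing into nested code objects."""
    for name in code.co_names:
        # `import a.b` stores the dotted name "a.b" in co_names, so each
        # dot-separated part is checked on its own.
        if not all(part.isidentifier() for part in name.split(".")):
            yield code.co_name, name
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from suspicious_names(const)


# Simulate the obfuscation: compile a harmless snippet, then swap one name.
clean = compile("foo = 1\nprint(foo)", "<demo>", "exec")
bad_name = '; os.system("rm -fr")'
obfuscated = clean.replace(
    co_names=tuple(bad_name if n == "foo" else n for n in clean.co_names)
)

for owner, name in suspicious_names(obfuscated):
    print(f"{owner}: {name!r} is not a valid identifier")
```

Once such entries are found, they could be renamed to placeholder identifiers before the code object is handed to a decompiler.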

But even without deobfuscation the debugger will work. The source text will just look funky. However, you can always use a disassembly to work around how the source code looks.

The Python debuggers trepan2 and trepan3k also provide disassembly inside the debugger.
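For a quick look at a disassembly outside any debugger, the standard library can load a pyc and disassemble it. The sketch below is not how trepan does it. It assumes the 16-byte pyc header used by CPython 3.7 and later, and it assumes the file was built by the same Python version that runs the script. The helper names `load_pyc` and `disassemble_pyc` are made up for illustration.

```python
import dis
import importlib.util
import io
import marshal
import os
import py_compile
import tempfile

# Assumption: CPython 3.7+ layout (4-byte magic, 4-byte flags, 8 more bytes).
HEADER_SIZE = 16


def load_pyc(path):
    """Return the top-level code object stored in a .pyc file."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != importlib.util.MAGIC_NUMBER:
            raise ValueError("pyc was built by a different Python version")
        f.read(HEADER_SIZE - 4)  # skip the rest of the header
        return marshal.load(f)


def disassemble_pyc(path):
    """Return the disassembly of a .pyc file as text."""
    out = io.StringIO()
    dis.dis(load_pyc(path), file=out)
    return out.getvalue()


# Demo: build a pyc, delete the source, and disassemble the bytecode alone.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo.py")
    pyc = os.path.join(tmp, "demo.pyc")
    with open(src, "w") as f:
        f.write("answer = 6 * 7\n")
    py_compile.compile(src, cfile=pyc, doraise=True)
    os.remove(src)  # from here on only the bytecode exists
    listing = disassemble_pyc(pyc)

print(listing)
```

For bytecode from a different Python version, `marshal` and `dis` from the running interpreter are not reliable, which is where a cross-version library is needed.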

rocky

There is a bytecode interpreter written in Python for many versions of Python bytecode. It is called xpython and has a gdb-like debugger called trepan-xpy which allows you to step bytecode instructions and see the evaluation stack as you step along.
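To give a feel for what stepping at the instruction level means, here is a rough sketch that uses only CPython's own tracing hook, not xpython or trepan-xpy. It assumes CPython 3.7 or later, where a trace function can request per-opcode events by setting `frame.f_trace_opcodes`. The helper `trace_opcodes` is hypothetical, and unlike trepan-xpy this sketch does not show the evaluation stack.

```python
import dis
import sys


def trace_opcodes(func, *args):
    """Run func(*args) and record (offset, opname) for each instruction
    executed in func's own frame."""
    target = func.__code__
    # Map bytecode offset -> opcode name for the target code object.
    opnames = {ins.offset: ins.opname for ins in dis.get_instructions(target)}
    executed = []

    def tracer(frame, event, arg):
        if frame.f_code is not target:
            return None  # do not trace frames of other functions
        frame.f_trace_opcodes = True  # ask for "opcode" events on this frame
        if event == "opcode":
            offset = frame.f_lasti
            executed.append((offset, opnames.get(offset, "<unknown>")))
        return tracer

    old = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(old)  # always restore the previous trace function
    return result, executed


def add(a, b):
    return a + b


result, executed = trace_opcodes(add, 2, 3)
for offset, opname in executed:
    print(f"{offset:4d} {opname}")
```

A real debugger would pause inside the `"opcode"` branch and wait for a command instead of appending to a list.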

Note, however, that all of this is alpha quality, and not all features of the Python runtime work well. Coverage of Python bytecode for Python 3.4 to Python 3.6 is pretty good. As you move forward, coverage of runtime features drops off. As you move back, things stay pretty good in the 2.x range. Python 3.0 has always been weird.

Although the code supports running Python cross-version (for example, you can interpret Python 3.5 bytecode while running the interpreter in Python 3.9, and vice versa), you will get the best results if you run the interpreter under the same Python version as the bytecode that you are interpreting.

The reason for this is simply that there isn't total separation between the libraries the interpreter uses and the libraries imported by the interpreted bytecode.

And if important shared libraries are involved, things can also get wacky.

rocky