I'm trying to learn how to deobfuscate some code that is unnecessarily complicated. For example, I would like to be able to rewrite this line of code:

return ('d' + chr(101) + chr(97) + chr(200 - 100)) # returns 'dead'

to:

return 'dead'

So basically, I need to evaluate all literals within the py file, including complicated expressions that evaluate to simple integers. How do I go about writing this reader / is there something that exists that can do this? Thanks!

1 Answer

What you want is a program transformation system (PTS).

This is a tool for parsing source code to an AST, transforming the tree, and then regenerating valid source code from the tree. See my SO answer on rewriting Python text for some background.
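
For a feel of the parse / transform / regenerate cycle without a full PTS, Python's standard-library `ast` module can stand in as a very small one. The following is only a minimal sketch, assuming Python 3.9+ (for `ast.unparse`); the helper name `round_trip` is made up for illustration and the transformation step is left empty.

```python
import ast


def round_trip(source: str) -> str:
    """Parse source text to an AST, then regenerate source text from it."""
    tree = ast.parse(source)   # text -> tree
    # A tree transformation would go here; this sketch leaves the tree alone.
    return ast.unparse(tree)   # tree -> text (requires Python 3.9+)


if __name__ == "__main__":
    original = "x = ('d' + chr(101) + chr(97) + chr(200 - 100))  # obfuscated\n"
    # Show the expression tree that a transformation would operate on.
    print(ast.dump(ast.parse(original).body[0].value, indent=2))
    # Show the regenerated source text.
    print(round_trip(original))
```

Note that `ast` does not record comments or the original layout, so the regenerated text loses the comment and the redundant parentheses.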

With a PTS like (my company's) DMS Software Reengineering Toolkit, you can write rules to do constant folding, which means essentially doing compile-time arithmetic.

The following rules handle the example in the question:

     rule fold_subtract_naturals(n:NATURAL,m:NATURAL): sum->sum =
        " \n - \m " ->  " \subtract_naturals\(\n\,\m\) ";

     rule convert_chr_to_string(c:NATURAL): term->term =
       " chr(\c) " -> make_string_from_natural(c) ;

     rule convert_character_to_string(c:CHARACTER): term->term =
       " \c " -> make_string_from_character(c) ;

     rule fold_concatenate_strings(s1:STRING, s2:STRING): sum->sum =
        " \s1 + \s2 " ->  " \concatenate_strings\(\s1\,\s2\) ";

     ruleset fold_strings = { 
          fold_subtract_naturals,
          convert_chr_to_string,
          convert_character_to_string,
          fold_concatenate_strings };

Each of the individual rules matches corresponding syntax/trees. They are written in such a way that they only apply to literal constants.

fold_subtract_naturals finds pairs of NATURAL constants joined by a subtract operation and replaces them with their difference, using a built-in function that subtracts two values and produces a literal value node containing the result.

convert_chr_to_string converts chr(c) to the corresponding string literal.

convert_character_to_string converts 'C' to the corresponding string "C".

fold_concatenate_strings combines two literal strings separated by an add operator. It works analogously to fold_subtract_naturals.

subtract_naturals and concatenate_strings are built into DMS. make_string_from_natural and make_string_from_character need to be custom-coded in DMS's metaprogramming language, PARLANSE, but these routines are pretty simple (maybe 10 lines).
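
The same folding ideas can be approximated in plain Python with `ast.NodeTransformer`. This is a rough sketch, not DMS and not a complete deobfuscator: the names `ConstantFolder` and `fold_constants` are hypothetical, it assumes Python 3.9+, and it assumes the name `chr` still refers to the builtin (it does not check for shadowing). Python has no separate character literal, so there is no counterpart to the character-to-string rule.

```python
import ast


def _is_natural(node: ast.AST) -> bool:
    """True for a non-negative integer literal (bool is excluded)."""
    return (isinstance(node, ast.Constant)
            and isinstance(node.value, int)
            and not isinstance(node.value, bool)
            and node.value >= 0)


def _is_string(node: ast.AST) -> bool:
    """True for a string literal."""
    return isinstance(node, ast.Constant) and isinstance(node.value, str)


class ConstantFolder(ast.NodeTransformer):
    """Folds a few literal-only patterns, visiting children first."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # fold the operands before this node
        left, right = node.left, node.right
        # n - m on naturals; only when the result is still a natural.
        if (isinstance(node.op, ast.Sub) and _is_natural(left)
                and _is_natural(right) and left.value >= right.value):
            folded = ast.Constant(left.value - right.value)
            return ast.copy_location(folded, node)
        # s1 + s2 on string literals.
        if isinstance(node.op, ast.Add) and _is_string(left) and _is_string(right):
            folded = ast.Constant(left.value + right.value)
            return ast.copy_location(folded, node)
        return node

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # fold the arguments first
        # chr(c) with a literal code point. Assumes `chr` is the builtin.
        if (isinstance(node.func, ast.Name) and node.func.id == "chr"
                and len(node.args) == 1 and not node.keywords
                and _is_natural(node.args[0])
                and node.args[0].value < 0x110000):
            folded = ast.Constant(chr(node.args[0].value))
            return ast.copy_location(folded, node)
        return node


def fold_constants(source: str) -> str:
    """Parse, fold, and regenerate source text (comments are not kept)."""
    tree = ConstantFolder().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


if __name__ == "__main__":
    print(fold_constants("x = ('d' + chr(101) + chr(97) + chr(200 - 100))"))
```

Because each visitor folds its children before looking at the node itself, a single bottom-up pass is enough for this small set of patterns. Anything involving a non-literal, such as `chr(n)`, is left untouched.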

The ruleset packages up the set of rules so they can all be applied. Not shown is the basic code to open a file, call the parser, invoke the ruleset transformer (which applies rules until no rule applies). The last step is to call the prettyprinter to reprint the modified AST.
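
The "apply rules until no rule applies" driver can be pictured roughly as follows. This is a hedged Python sketch built on the standard `ast` module (Python 3.9+), not how DMS is implemented; `Rule`, `fold_subtract`, `OnePass` and `rewrite_until_stable` are hypothetical names. Each pass is deliberately naive (top-down, and it does not revisit a node it has just rewritten), so nested expressions need more than one pass and the outer loop matters.

```python
import ast
from typing import Callable, Optional, Sequence, Tuple

# A rule inspects one node and returns a replacement, or None if it
# does not apply. (Hypothetical convention for this sketch.)
Rule = Callable[[ast.AST], Optional[ast.AST]]


def _is_natural(node: ast.AST) -> bool:
    """True for a non-negative integer literal (bool is excluded)."""
    return (isinstance(node, ast.Constant)
            and isinstance(node.value, int)
            and not isinstance(node.value, bool)
            and node.value >= 0)


def fold_subtract(node: ast.AST) -> Optional[ast.AST]:
    """Example rule: fold `n - m` for natural literals with n >= m."""
    if (isinstance(node, ast.BinOp) and isinstance(node.op, ast.Sub)
            and _is_natural(node.left) and _is_natural(node.right)
            and node.left.value >= node.right.value):
        return ast.Constant(node.left.value - node.right.value)
    return None


class OnePass(ast.NodeTransformer):
    """One top-down sweep that tries every rule at every node."""

    def __init__(self, rules: Sequence[Rule]) -> None:
        self.rules = rules
        self.fired = 0  # number of rewrites performed in this sweep

    def visit(self, node: ast.AST) -> ast.AST:
        for rule in self.rules:
            replacement = rule(node)
            if replacement is not None:
                self.fired += 1
                return ast.copy_location(replacement, node)
        return self.generic_visit(node)  # no rule applied: descend


def rewrite_until_stable(source: str, rules: Sequence[Rule]) -> Tuple[str, int]:
    """Parse, sweep until no rule fires, then regenerate source text."""
    tree = ast.parse(source)
    passes = 0
    while True:
        sweep = OnePass(rules)
        tree = sweep.visit(tree)
        passes += 1
        if sweep.fired == 0:  # fixed point reached
            break
    return ast.unparse(ast.fix_missing_locations(tree)), passes


if __name__ == "__main__":
    print(rewrite_until_stable("x = (10 - 3) - 2", [fold_subtract]))
```

With the input `x = (10 - 3) - 2`, the first sweep can only fold the inner subtraction, the second folds the outer one, and a third sweep confirms that nothing is left to do.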

Many other PTS offer similar facilities.

Ira Baxter