
I am trying to puzzle out a way to de-obfuscate javascript that looks like this:

https://jsfiddle.net/douglasg14b/4951br9f/2/

var testString = 'Test | String'

var wf6 = {
 fq4: 'su',
 k8d: 'bs',
 l8z: 'tri',
 cy1: 'ng',
 t5j: 'te',
 ol: 'stS',
 x3q: 'tri',
 l9x: 'ng',
 gh: 'xO'
};


//Obfuscated
let test1 = testString[wf6.fq4 + wf6.k8d + wf6.l8z + wf6.cy1](4,11);

//Normal
let test2 = testString.substring(4,11);
let test3;

//More complex obfuscation
(function moreComplex(){
 let h = "i",
        w = "nde",
        T0 = "f",
        hj = '|',
        a = eval(wf6.t5j + wf6.ol + wf6.x3q + wf6.l9x).length;
    //Obfuscated
    test3 = testString[wf6.fq4 + wf6.k8d + wf6.l8z + wf6.cy1](testString[h + w + wf6.gh + T0](hj), a);
    
    //Normal
    let test4 = testString.substring(testString.indexOf('|'), testString.length);
        
})();

$('.span1').text(test1);
$('.span2').text(test3);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<span class="span1"></span><br>
<span class="span2"></span>

This is a small example; the file I'm working with is ~60k lines long and is full of this kind of obfuscation. Everywhere a string can be used as a property name, this kind of obfuscation is used.

The only approach I can think of is to evaluate all the string concatenations so that they are turned into a readable equivalent. However, I am not sure how to do that while ignoring all the other working code that sits between the concatenations.

Thoughts?

Bonus question: Is there a commonly used name for this kind of obfuscation that might make searches a bit easier?

Edit: Added a more complex example.

Douglas Gaskell

1 Answer


You have the basic idea right: you have to partially-evaluate the program and precompute all the constant computations. In your case, the constant computations of main interest are the concatenation steps over values which don't change.

To do this, you need a program transformation system (PTS). This is a tool that reads and parses source code for a specified language, builds an abstract syntax tree (AST), lets you specify transformations and analyses over the AST, runs them, and then emits the modified AST as source code again.

In your case, you obviously want a PTS that is wired to know JavaScript out of the box (rare) or is willing to accept a description of JavaScript and then read JavaScript (more typical) with the hope that you can build or get a JavaScript description easily. [I build a PTS that has JavaScript descriptions available, see my bio].

With that in hand, you need to:

  • Code an analyzer that inspects each variable found in an expression to see whether that variable is constant (e.g., "wf6"). To demonstrate that it is constant, you have to find the variable's definition and check that all the values used in that definition are themselves constants. If there is more than one definition, you might have to check that all definitions produce the same value. You need to check for side effects on the variable (e.g., that there are no function calls "foo(...,wf6,...)" which would allow the variable's value to be modified). You also need to worry about whether an eval call exists that accomplishes such a side effect [this is virtually impossible to determine, so you often have to just ignore evals and assume they do not do such things]. Many PTSes have a way to let you build such analyzers; some are easier than others.
  • For every constant-valued variable, substitute the value of that variable in the code.
  • For every constant-valued sub-expression after such substitutions, "fold" (calculate) the result of that expression, substitute that value for the sub-expression, and repeat until no more folding is possible. Obviously you want to do this for at least all "+" operators. [OP just modified his example; he'll want to do it for "eval" operators too when all their operands are constant].
  • You may have to iterate this process, as folding an expression may make it obvious that a variable now has a constant value.

The above process is called "constant propagation" in the compiler literature and is a feature of many compilers.
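
As a rough illustration of the substitute-then-fold steps above, here is a minimal sketch in JavaScript. It is not how any particular PTS does it. It works on a hand-built, ESTree-style AST instead of parsed source, it folds only string "+", and it assumes the constancy analysis has already been done: the `constants` table is simply taken as given. The names `constants`, `lit`, `id`, `member`, `plus`, and `fold` are hypothetical helpers invented for this example.

```javascript
// Objects assumed to have already passed the constancy analysis (an assumption).
const constants = new Map([
  ['wf6', new Map(Object.entries({ fq4: 'su', k8d: 'bs', l8z: 'tri', cy1: 'ng' }))],
]);

// Tiny constructors for ESTree-style nodes (hypothetical helpers).
const lit = (value) => ({ type: 'Literal', value });
const id = (name) => ({ type: 'Identifier', name });
const member = (objectName, propertyName) => ({
  type: 'MemberExpression',
  object: id(objectName),
  property: id(propertyName),
  computed: false,
});
const plus = (left, right) => ({ type: 'BinaryExpression', operator: '+', left, right });

// Returns a simplified copy of the node, folding whatever is known to be constant.
function fold(node) {
  switch (node.type) {
    case 'MemberExpression': {
      // Substitution step: wf6.fq4 becomes the literal 'su'.
      if (!node.computed && node.object.type === 'Identifier') {
        const known = constants.get(node.object.name);
        if (known && known.has(node.property.name)) {
          return lit(known.get(node.property.name));
        }
      }
      return node;
    }
    case 'BinaryExpression': {
      // Fold the operands first, then the operator itself if both are string literals.
      const left = fold(node.left);
      const right = fold(node.right);
      if (node.operator === '+' &&
          left.type === 'Literal' && typeof left.value === 'string' &&
          right.type === 'Literal' && typeof right.value === 'string') {
        return lit(left.value + right.value);
      }
      return { ...node, left, right };
    }
    default:
      return node;
  }
}

// wf6.fq4 + wf6.k8d + wf6.l8z + wf6.cy1, left-associated as a parser would build it.
const expr = plus(
  plus(plus(member('wf6', 'fq4'), member('wf6', 'k8d')), member('wf6', 'l8z')),
  member('wf6', 'cy1')
);
const folded = fold(expr);
console.log(folded.value); // substring
```

A real tool would get the AST from a JavaScript parser and would populate the constants table from the analysis step; the recursion itself stays the same shape.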

In your case, you could restrict the constant folding to just string concatenations. However, once you have adequate machinery to do constant value propagation, handling all or most operators on constants isn't that hard. You may need this to undo other obfuscations involving constants, since that seems to be the obfuscation style used on the code you are working on.

You'll need a special rule that transforms

var['string'](args)

into

var.string(args)

as a final step.
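
A minimal sketch of that final rule, again on hand-built ESTree-style nodes and with hypothetical helper names (`toDotAccess`, `print`). It rewrites a computed access only when the property is a string literal that looks like a plain identifier; the identifier test is deliberately conservative (ASCII only), which is an assumption made to keep the example short.

```javascript
// Identifier-shaped names only. Reserved words are legal after a dot since ES5,
// so no keyword check is made here.
const IDENTIFIER = /^[A-Za-z_$][A-Za-z0-9_$]*$/;

// Rewrites obj['name'] into obj.name on an ESTree-style MemberExpression node.
function toDotAccess(node) {
  if (node.type === 'MemberExpression' && node.computed &&
      node.property.type === 'Literal' &&
      typeof node.property.value === 'string' &&
      IDENTIFIER.test(node.property.value)) {
    return {
      ...node,
      computed: false,
      property: { type: 'Identifier', name: node.property.value },
    };
  }
  return node; // anything else is left untouched
}

// Minimal printer for the few node types used here (hypothetical helper).
function print(node) {
  switch (node.type) {
    case 'Identifier':
      return node.name;
    case 'Literal':
      return JSON.stringify(node.value);
    case 'MemberExpression':
      return node.computed
        ? `${print(node.object)}[${print(node.property)}]`
        : `${print(node.object)}.${print(node.property)}`;
    case 'CallExpression':
      return `${print(node.callee)}(${node.arguments.map((arg) => print(arg)).join(', ')})`;
    default:
      throw new Error(`unsupported node type: ${node.type}`);
  }
}

// testString['substring'](4, 11), as it would look after constant folding.
const call = {
  type: 'CallExpression',
  callee: {
    type: 'MemberExpression',
    computed: true,
    object: { type: 'Identifier', name: 'testString' },
    property: { type: 'Literal', value: 'substring' },
  },
  arguments: [{ type: 'Literal', value: 4 }, { type: 'Literal', value: 11 }],
};

const rewritten = { ...call, callee: toDotAccess(call.callee) };
console.log(print(call));      // testString["substring"](4, 11)
console.log(print(rewritten)); // testString.substring(4, 11)
```

Properties such as `'not valid'` or `'2x'` fail the identifier test and stay in bracket form, which keeps the output syntactically valid.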

You have another complication: that is knowing that you have all the JavaScript relevant to producing constant-valued variables. A single web page may have many included chunks of JavaScript; you will need all of them to demonstrate there are no side effects on a variable. I assume in your case you are sure you have it all.

With respect to producing known-constant values, you may have to worry about a tricky case: an expression that produces constant values from non-constant operands. Imagine the obfuscated expression was:

   x = Math.random(); // produces a value between 0 and 1
   one = x + (1 - x); // not constant by constant propagation, but constant by algebraic relations
   testString['st'[one - 1] + 'vu'[one] + 'bz'[one - 1] + ...](4, 11)

You can see it always computes 'substring' as a property. You can add a transformation rule that understands the trick used to compute "one", e.g., a rule for each algebraic trick used to compute known constants. Unfortunately for you, there's an infinite number of algebra theorems one can use to manufacture constants; how many are really used in your example bit of code? [Welcome to the problem of reverse engineering with a smart adversary].
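
One such rule, for the `x + (1 - x)` trick only, might be sketched as follows on ESTree-style nodes. The function name `simplifyOnePattern` is hypothetical. The rewrite is an identity over the real numbers, not over all floating-point values (for example, `1e300 + (1 - 1e300)` evaluates to 0), so the sketch assumes the operand is already known to be a small number such as a `Math.random()` result. It does not check that.

```javascript
// Rule: e + (1 - e)  ==>  1, where e is a plain identifier.
// Assumption (not checked): e holds a number for which the identity
// actually holds, such as a Math.random() result.
function simplifyOnePattern(node) {
  if (node.type === 'BinaryExpression' && node.operator === '+' &&
      node.left.type === 'Identifier' &&
      node.right.type === 'BinaryExpression' && node.right.operator === '-' &&
      node.right.left.type === 'Literal' && node.right.left.value === 1 &&
      node.right.right.type === 'Identifier' &&
      node.right.right.name === node.left.name) {
    return { type: 'Literal', value: 1 };
  }
  return node; // pattern did not match; leave the expression alone
}

const x = { type: 'Identifier', name: 'x' };
const y = { type: 'Identifier', name: 'y' };
const one = { type: 'Literal', value: 1 };
const minus = (left, right) => ({ type: 'BinaryExpression', operator: '-', left, right });
const plus = (left, right) => ({ type: 'BinaryExpression', operator: '+', left, right });

// one = x + (1 - x)
const matched = simplifyOnePattern(plus(x, minus(one, x)));
console.log(matched.value); // 1

// x + (1 - y) is not the pattern, so it comes back unchanged.
const unmatched = simplifyOnePattern(plus(x, minus(one, y)));
console.log(unmatched.type); // BinaryExpression
```

Once "one" is rewritten to the literal 1, ordinary constant propagation and folding can take over for the string indexing. Each further trick the obfuscator uses needs its own rule of this kind.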

Nope, none of this is "easy". Presumably that's why this obfuscation method was chosen.

Ira Baxter
  • Can you link the tool you've written? Can it do the things asked here? – Bergi Oct 08 '16 at 19:09
  • If you can do the constness analysis by hand (which is usually simple, as the obfuscators mostly use only constants), you can often do the "evaluation" by manually [doing regex replacements on the code](http://stackoverflow.com/q/30879056/1048572) – Bergi Oct 08 '16 at 19:16
  • @Bergi: The tool is called the "DMS Software Reengineering Toolkit". See http://www.semanticdesigns.com/Products/DMS/DMSToolkit.html – Ira Baxter Oct 08 '16 at 20:49
  • @Bergi: Regex applied to code generally doesn't work because you are applying it to text with nested structures () [] {}, and regexes can't keep track of nesting. See http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 How would you solve OP's problem this way? Give a concrete example and remember he has 60K lines of code. – Ira Baxter Oct 08 '16 at 20:52
  • Wow! This is one heck of an answer, and is some pretty advanced and heavy stuff. This is not something I can do quickly or even understand quickly. Definitely something that's going to take me a while to fully grasp and utilize. Thank you. – Douglas Gaskell Oct 09 '16 at 02:15
  • @IraBaxter I would take the substitution object and then do `code.replace(/wf6\.(\w+)/g, (_, p) => JSON.stringify(wf6[p])).replace(/"\s+\+\s+"/g, "").replace(/(\w+)\["(\w+)"\]/g, "$1.$2")`. Of course this does not work for the general case, it's a quick'n'dirty solution you can use when you don't have a PTS at hand. Often the obfuscation tools are not sophisticated enough to do multiple passes with different substitutions, so one regex replacement as above usually leads to pretty readable code already. – Bergi Oct 09 '16 at 10:11
  • @Bergi: Yes, if you knew the pairs in advance you might do that, and it might work relatively well with the specific example provided (might be wf1, wf2, ... wf6, ... wf9721 to make it really annoying even this way). Dumb obfuscators might be OP's savior. A smart one would throw in several different obfuscation techniques which compose (that makes undoing almost impossible with regex). My personal opinion is that OP might be better off finding something more constructive to do than de-obfuscation, but if he insists he'll have to try different methods. – Ira Baxter Oct 09 '16 at 15:43
  • @Bergi: the bad part about using regexes like this is the manual step. If OP is going to decode only *one* program then this might be the way to go. If he has to decode a *second* version of the same program in which the names have been scrambled randomly, this will be pretty daunting. If he wants to do a lot of this, the manual work will likely be insurmountable. So his immediate goals matter. – Ira Baxter Oct 09 '16 at 15:59
  • @IraBaxter completely agreed. – Bergi Oct 09 '16 at 16:33