You have the basic idea right: you have to partially-evaluate the program and precompute all the constant computations. In your case, the constant computations of main interest are the concatenation steps over values which don't change.
To do this, you need a program transformation system (PTS). This is a tool that will read/parse source code for a specified language and build an abstract syntax tree, allow you specify transformations and analyses over the AST, and run those, and then spit out the modified AST as source code again.
In your case, you obviously want a PTS that is wired to know JavaScript out of the box (rare) or is willing to accept a description of JavaScript and then read JavaScript (more typical) with the hope that you can build or get a JavaScript description easily. [I build a PTS that has JavaScript descriptions available, see my bio].
With that in hand, you need to:
- code an analyzer that inspects each variable found in an expression to see if that expression is constant (e.g., "wf6"). To demonstrate it is constant, you will have to find the variable definition, and check that all the values used in the variable definition are themselves constants. If there is more than one variable definition, you might have to check that all definitions produce the same value. You need to check for side-effects on the variable (e.g, there are no function calls "foo(...,wf6,...)" which would allow the variable's value to be modified). You need to worry about whether an eval command to accomplish such a side effect exists [this is virtually impossible to do, so you often have to just ignore evals and assume they do not do such things]. Many PTSes will have a way to allow you to build such analyzers; some are easier than others.
- For every constant valued variable, substitute the value of that variable in the code
- For every constant-valued sub-expression after such substitutions, "fold" (calculate) the result of that expression and substitute that value for that subexpression and repeat until no more folding is possible. Obviously you want to do this for at least all "+" operators. [OP just modified his example; he'll want to do it for "eval" operators too when all its operands are constant].
- You may have to iterate this process, as folding an expression may make it obvious that a variable now has a constant value
The above process is called "constant propagation" in the compiler literature and is a feature of many compilers.
In your case, you could restrict the constant folding to just string concatenates. However, once you have adequate machinery to do constant value propagation, doing all or most operators on constants isn't that hard. You may need this to undo other obfuscations involving constants since that
seems to be the obfuscation style used on the code you are working on.
You'll need a special rule that transforms
var['string'](args)
into
var.string(args)
as a final step.
You have another complication: that is knowing that you have all the JavaScript relevant to producing constant-valued variables. A single web page may have many included chunks of JavaScript; you will need all of them to demonstrate there are no side effects on a variable. I assume in your case you are sure you have it all.
With respect to producing known-constant values, you may have worry about a tricky case: an expression that produces constant values from non-constant operands. Imagine the obfuscated expression was:
x=random(); // produce a value between 0 and 1
one=x+(1-x); // not constant by constant propagation, but constant by algebraic relations
teststring['st'[one]+'vu'[one+1]+'bz'[one]+...](4,11)
You can see it always computes 'substring' as a property. You can add a transformation rule that understands the trick used to compute "one", e.g., a rule for each algebraic trick used to compute known constants. Unfortunately for you, there's an infinite number of algebra theorems one can use to manufacture constants; how many are really used in your example bit of code? [Welcome to the problem of reverse engineering with a smart adversary].
Nope, none of this "easy". Presumably that's why the obfuscation method
used was chosen.