4

I'm trying to diff obfuscated code, where the names of variables and functions can differ even though the code is the same. Here's an example:
Code A:

function A(a) {
  var b = a * 4;
  console.log(b);
}

Code B:

function B(b) {
  var c = b * 4 + 2;
  console.log(c);
}

So the only difference I want to find between these two pieces of code is the + 2 part, not the differences in function and variable names.

As @blex mentioned in a comment, variable and function names don't matter, but a change in logic does. So here's another example:
Code A:

function A(a,b,c) {
  return a * b + c;
}

Code B:

function B(x,y,z) {
  return y * z + x
}

In this example I would like to see the difference: a * b + c => y * z + x.

Is this possible to do somehow?

WhiteAngel
  • 4
    Hard to be reliable. For example, what if the function took 2 params `(x, y)`, and did `var z = x * 2 + y;`, and in the second version, did `var z = y * 2 + x;` (`x` & `y` have been inverted - not taking into account a variable name change)? That change would be hard to spot if you don't care about variable names. To spot it, you would need to parse the code, and analyze it. Not just "diff" it – blex Jan 05 '21 at 11:13
  • 2
    One option would be to work with the code as an [AST](https://stackoverflow.com/questions/16127985/what-is-javascript-ast-how-to-play-with-it) and compare the resulting trees. This is far from trivial and more of a theoretical possibility, but you can play around with the idea using [AST Explorer](https://astexplorer.net/). One possible implementation of interest could be [GumTree](https://github.com/GumTreeDiff/gumtree), but I'm not familiar with it myself. – Etheryte Jan 05 '21 at 11:14
  • 1
    @blex, you are right. My question is not fully correct. I will try to rephrase it. I don't care about names of variables but I do care about the logic (order). So, if logic has changed I would like to know it. – WhiteAngel Jan 05 '21 at 11:17
  • 1
    @Etheryte Thank you very much! I will check it. – WhiteAngel Jan 05 '21 at 11:17
  • The more I think about it, the more I _think_ it's not possible to be 100% reliable. Using AST as suggested by @Etheryte or Guerric P's solution seems clever and will work in basic use-cases, but in more common ones, you won't be able to reliably map functions to each other (if they were moved in the code, added, altered...). Out of curiosity, could you explain the "why" you want to do this? Maybe there's another approach – blex Jan 05 '21 at 11:59
  • Yes, unfortunately I see many possible issues as well. There's a library that is updated from time to time, and we need to stay compliant with its changes (to output the same data for integration purposes). Every time I need to see what changed, the diff isn't just the 10-20 lines that actually changed; it covers almost the whole file, because variable and function names changed even though the logic is 99% the same. – WhiteAngel Jan 05 '21 at 12:02
  • Automated testing is usually the way to do this, i.e. you create automated tests where, for a given input, you expect a certain output (and do that for many edge cases), and you check that your tests still pass after an update. These kinds of tests don't care about the internal implementation, only about the interfaces of the lib (input/output or side effects). **Edit**: important to note, your tests should check that your application still works as intended while it's using the lib. Don't test the lib all by itself; that would be counterproductive if you only use 20% of the lib. – blex Jan 05 '21 at 12:22
  • @blex, it's not a problem to know when lib has changed (I can see this from changed version), the problem is to see what exactly was changed. – WhiteAngel Jan 05 '21 at 17:22
  • I think I get it, but don't you find it strange that almost nobody in the world has ever had the need you have? _(according to quick Google searches)_. At work, we deal with tons of libraries, and we do update them when needed _(when we take the time to)_, and to check that our code still works with these new versions, we run the tests we already have in place for our code. We don't try to diff the libs. They can be entirely refactored, we don't really care as long as the methods we use still have the same inputs/outputs. That being said, maybe you have a really specific use-case, I don't know – blex Jan 05 '21 at 18:09
  • I'm used to solving issues that have never been asked about before (or at least I couldn't find them). This is challenging and interesting. Anyway, thank you for your help and advice, @blex! – WhiteAngel Jan 05 '21 at 18:50
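
To make the idea from the comments above concrete (names don't matter, order and logic do), here is a minimal sketch that renames identifiers to canonical names in order of first appearance. It is only an illustration under loud assumptions: `normalizeIdentifiers` and the `KEEP` set are hypothetical names, the tokenizer is a naive regex that ignores strings, comments and scopes, and a real tool would walk an AST as suggested by @Etheryte.

```javascript
// Keywords and a few well-known globals that are left untouched (assumed list).
const KEEP = new Set([
  "function", "var", "let", "const", "return", "if", "else", "for", "while",
  "do", "new", "typeof", "this", "true", "false", "null", "undefined", "console",
]);

// Rename every other identifier to id0, id1, ... in order of first appearance.
// Assumption: the input contains no strings, comments, or regex/template literals.
function normalizeIdentifiers(source) {
  const tokens = source.match(/[A-Za-z_$][\w$]*|\d+(?:\.\d+)?|[^\s\w$]/g) || [];
  const names = new Map();
  return tokens
    .map((token, i) => {
      const isIdentifier = /^[A-Za-z_$]/.test(token);
      const isProperty = tokens[i - 1] === "."; // e.g. the `log` in console.log
      if (!isIdentifier || KEEP.has(token) || isProperty) return token;
      if (!names.has(token)) names.set(token, `id${names.size}`);
      return names.get(token);
    })
    .join(" ");
}

console.log(normalizeIdentifiers("function A(a,b,c) { return a * b + c; }"));
// function id0 ( id1 , id2 , id3 ) { return id1 * id2 + id3 ; }
console.log(normalizeIdentifiers("function B(x,y,z) { return y * z + x; }"));
// function id0 ( id1 , id2 , id3 ) { return id2 * id3 + id1 ; }
```

After normalization, a pure rename produces identical text, while the reordered operands from the second example still show up as a difference.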

2 Answers

3

You could use a minification library like Terser to produce mangled output, in which equivalent code should end up with the same function and variable names:

// `Terser` is the global exposed by the browser bundle loaded in the <script> tag below.
const { minify } = Terser;

const A = `function A(a) {
  var b = a * 4;
  console.log(b);
}`;

const B = `function B(b) {
  var c = b * 4 + 2;
  console.log(c);
}`;

// Mangle both snippets, including top-level names, so that renames alone no longer show up.
Promise.all([A, B].map(x => minify(x, { mangle: { toplevel: true } })))
  .then(([A, B]) => console.log(`Calculate difference between ${A.code} and ${B.code}`));
<script src="https://cdn.jsdelivr.net/npm/terser/dist/bundle.min.js"></script>

I didn't write the string diffing part because it's a really big subject and there are whole libraries for that.
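
For small inputs, the diffing step can be sketched without a library. The following is a minimal token-level diff based on the longest common subsequence, not what a dedicated diff library does internally. `diffTokens` is a hypothetical helper, and the two input strings are hand-written stand-ins for already normalized code, not actual Terser output.

```javascript
// Token-level diff via the longest common subsequence (LCS).
// Assumption: both inputs are already normalized and whitespace-separated.
function diffTokens(before, after) {
  const a = before.split(/\s+/).filter(Boolean);
  const b = after.split(/\s+/).filter(Boolean);

  // lcs[i][j] = length of the LCS of a[i..] and b[j..]
  const lcs = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk the table and record only the tokens that were removed or added.
  const changes = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) { i++; j++; }
    else if (lcs[i + 1][j] >= lcs[i][j + 1]) changes.push({ op: "removed", token: a[i++] });
    else changes.push({ op: "added", token: b[j++] });
  }
  while (i < a.length) changes.push({ op: "removed", token: a[i++] });
  while (j < b.length) changes.push({ op: "added", token: b[j++] });
  return changes;
}

const oldCode = "function f ( x ) { var y = x * 4 ; console . log ( y ) ; }";
const newCode = "function f ( x ) { var y = x * 4 + 2 ; console . log ( y ) ; }";
console.log(diffTokens(oldCode, newCode)); // reports "+" and "2" as added
```

The table is quadratic in the number of tokens, so for a large library a dedicated diff tool is the more realistic choice.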

Guerric P
  • I thought of this but won't it create incorrect names of functions if, e.g., there's a new function inserted in the middle of the code? – WhiteAngel Jan 05 '21 at 11:44
  • Yes it may not work in that case, I thought the only input differences were static in your use case – Guerric P Jan 05 '21 at 11:45
  • Unfortunately, it's a big lib that has hundreds of functions and IIFEs. But I will try. Thank you! – WhiteAngel Jan 05 '21 at 11:46
  • 1
    Then you won't be able to map. How would you determine if something is new or renamed? – Guerric P Jan 05 '21 at 11:48
  • My idea was this: if there's a function with the same contents but a different name, it's the same function. Pretty much the way Git does it. – WhiteAngel Jan 05 '21 at 11:58
  • I guess you should not try to implement the Git file diffing in JavaScript, just reuse it instead. – Guerric P Jan 05 '21 at 12:45
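
The "same contents, different name" idea from this thread could be sketched as follows. This is a rough illustration, not how Git works: Git's rename detection uses a similarity threshold, while this sketch only pairs exact matches. `fingerprint`, `matchFunctions` and the name-to-source input shape are all assumptions made for the example, and the tokenizer ignores strings, comments and scopes.

```javascript
// Keywords and globals left untouched by the fingerprint (assumed list).
const KEYWORDS = new Set(["function", "var", "let", "const", "return", "if", "else", "console"]);

// Name-independent fingerprint: identifiers become id0, id1, ... by first appearance.
function fingerprint(source) {
  const names = new Map();
  return (source.match(/[A-Za-z_$][\w$]*|\d+|[^\s\w$]/g) || [])
    .map((token, i, all) => {
      if (!/^[A-Za-z_$]/.test(token) || KEYWORDS.has(token) || all[i - 1] === ".") return token;
      if (!names.has(token)) names.set(token, `id${names.size}`);
      return names.get(token);
    })
    .join(" ");
}

// Pair each new function with an old one that has the same fingerprint.
// Limitation: old functions with identical fingerprints overwrite each other.
function matchFunctions(oldFns, newFns) {
  const byFingerprint = new Map();
  for (const [name, src] of Object.entries(oldFns)) byFingerprint.set(fingerprint(src), name);

  const result = { same: [], unmatched: [] };
  for (const [name, src] of Object.entries(newFns)) {
    const oldName = byFingerprint.get(fingerprint(src));
    if (oldName !== undefined) result.same.push([oldName, name]);
    else result.unmatched.push(name);
  }
  return result;
}

const oldFns = {
  A: "function A(a) { var b = a * 4; console.log(b); }",
  sum: "function sum(a, b) { return a + b; }",
};
const newFns = {
  q: "function q(x, y) { return x + y; }",
  B: "function B(b) { var c = b * 4 + 2; console.log(c); }",
};
console.log(matchFunctions(oldFns, newFns));
// same: [["sum", "q"]], unmatched: ["B"]
```

Functions that end up in `unmatched` are either new or changed, which is exactly the ambiguity raised above: exact matching alone cannot tell the two apart.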
-2

How about this:

function A(b){
  var c = b * 4 + 2;
  return(c)
}
function B(b) {
  var c = b * 4;
  return(c);
}

console.log(B(10)-A(4))

This will solve your problem, I guess.

  • Unfortunately, it won't help me. I need to find a diff of the source code. – WhiteAngel Jan 05 '21 at 11:23
  • source code can come from anywhere. In question, there are 2 examples of source code. – WhiteAngel Jan 05 '21 at 11:27
  • `function A(b){ var c = b * 4 + 2; return(c) } function B(b) { var c = b * 4; return(c); } console.log(B(10)-A(4))` – Dhanush Raja Jan 05 '21 at 11:36
  • 1
    You didn't understand my question. I need to find the textual difference in the source code. If a new line was added, I would like to see that difference; if a function was removed, I need to know that as well. If a branch was added to an if statement, I need to know that too. The question is not tied to any specific numbers or values. – WhiteAngel Jan 05 '21 at 11:38