7

I need to split a JavaScript file into single instructions. For example:

a = 2;
foo()
function bar() {
    b = 5;
    print("spam");
}

has to be separated into three instructions (an assignment, a function call and a function definition).

Basically, I need to instrument the code by injecting checks between these instructions. Splitting on ";" obviously wouldn't work, because instructions can also end at a newline, and I may not want to instrument code inside function and class definitions (I don't know yet). I took a course on grammars with flex/Bison, but in this case the semantic action for the rule would be "print all the descendants in the parse tree and put my code at the end", which I don't think can be done with basic Bison. How do I do this? I also need to split the code because I have to interface with Python through python-spidermonkey. Or is there already a library that would save me from reinventing the wheel? It doesn't have to be in Python.

Peter Mortensen
BruceBerry

5 Answers

4

Why not use a JavaScript parser? There are lots, including a Python API for ANTLR and a Python wrapper around SpiderMonkey.

Hank Gay
  • I looked into ANTLR, but it seemed really complicated :-( I'm already planning to use python-spidermonkey, but I need to split the code correctly first: execute("function foo () {") gives an error. I just thought there would be another way... If I feed Python objects into the JS context, I could place the callbacks in Python code there. But it seems rather complicated; I'm pretty new to this language interfacing (and I'm new to JS too). – BruceBerry May 09 '09 at 16:35
  • Tools like ANTLR are "really complicated" because they are dealing with really complicated problems. Lots of people try some kind of string hack to manipulate code; it almost always ends badly, because string hacking can't reliably handle the complications. – Ira Baxter Feb 28 '12 at 18:03
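As the first comment notes, the engine rejects an incomplete chunk such as `function foo () {`. That behaviour can be turned into a rough splitting heuristic: keep accumulating lines until the engine's own parser accepts the chunk. The sketch below is only an illustration and was not proposed in this thread. It assumes an environment where `new Function` is available (it compiles the code without running it), and `compiles` and `splitStatements` are hypothetical names. It is not a substitute for a real parser: a line holding several statements stays one chunk, and a statement continued on the next line (`a = b` followed by `+ c`) is split too early.

```javascript
// Returns true if `code` parses as a sequence of complete statements.
// `new Function` only compiles the body; nothing is executed here.
function compiles(code) {
  try {
    new Function(code);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected
  }
}

// Heuristic splitter: grow a chunk line by line until it parses.
function splitStatements(source) {
  const chunks = [];
  let pending = [];
  for (const line of source.split("\n")) {
    // Skip blank lines between chunks.
    if (pending.length === 0 && line.trim() === "") continue;
    pending.push(line);
    const candidate = pending.join("\n");
    if (compiles(candidate)) {
      chunks.push(candidate);
      pending = [];
    }
  }
  if (pending.length > 0) {
    throw new SyntaxError("trailing code does not parse:\n" + pending.join("\n"));
  }
  return chunks;
}

// The example from the question.
const example = 'a = 2;\nfoo()\nfunction bar() {\n    b = 5;\n    print("spam");\n}\n';
const chunks = splitStatements(example);
console.log(chunks.length); // 3
```

Each returned chunk could then be passed to something like python-spidermonkey's execute one at a time, with the checks run in between.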
2

JavaScript is tricky to parse; you need a full JavaScript parser. The DMS Software Reengineering Toolkit can parse full JavaScript and build a corresponding AST. AST operators can then be used to walk over the tree to "split it". Even easier, however, is to apply source-to-source transformations that look for one surface syntax (JavaScript) pattern and replace it with another. You can use such transformations to insert the instrumentation into the code, rather than splitting the code to make holes in which to do the insertions. After the transformations are complete, DMS can regenerate valid JavaScript code (complete with the original comments if unaffected).
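To make the "walk the tree and insert instrumentation" idea concrete, here is a minimal sketch in plain JavaScript. It does not use DMS. It assumes some parser has already produced an ESTree-style `Program` node whose top-level `body` entries carry `start`/`end` character offsets (acorn, for example, emits nodes in this shape). The tree below is hand-written through a hypothetical `fakeNode` helper, and `__check` is a placeholder name for whatever check gets injected.

```javascript
// Source to instrument (the example from the question).
const source = [
  "a = 2;",
  "foo()",
  "function bar() {",
  "    b = 5;",
  '    print("spam");',
  "}",
  "",
].join("\n");

// Hypothetical helper: fakes a parser node by locating `text` in `source`.
// A real parser would compute the start/end offsets itself.
function fakeNode(type, text) {
  const start = source.indexOf(text);
  return { type, start, end: start + text.length };
}

// Hand-written stand-in for a parser's output (assumption, not real parser output).
const program = {
  type: "Program",
  body: [
    fakeNode("ExpressionStatement", "a = 2;"),
    fakeNode("ExpressionStatement", "foo()"),
    fakeNode("FunctionDeclaration", 'function bar() {\n    b = 5;\n    print("spam");\n}'),
  ],
};

// Only the top-level body is visited, so statements nested inside
// function bodies are left alone.
function splitTopLevel(source, program) {
  return program.body.map((stmt) => source.slice(stmt.start, stmt.end));
}

// Appends `hook` after every top-level statement. The leading ";" is
// defensive: it terminates a statement that relied on a newline (like foo()).
function instrument(source, program, hook) {
  return splitTopLevel(source, program)
    .map((text) => text + "\n;" + hook)
    .join("\n");
}

const statements = splitTopLevel(source, program);
const instrumented = instrument(source, program, "__check();");
console.log(statements.length); // 3
console.log(instrumented);
```

Slicing the original text by node offsets keeps the author's formatting and comments intact, which avoids having to regenerate code from the tree.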

Peter Mortensen
Ira Baxter
0

Forget my parser. https://bitbucket.org/mvantellingen/pyjsparser is a great and complete parser. I've fixed a couple of its bugs here: https://bitbucket.org/nullie/pyjsparser

Ilia Novoselov
0

Why not use an existing JavaScript interpreter like Rhino (Java) or python-spidermonkey (not sure whether this one is still alive)? It will parse the JS and then you can examine the resulting parse tree. I'm not sure how easy it will be to recreate the original code but that mostly depends on how readable the instrumented code must be. If no one ever looks at it, just generate a really compact form.

pyjamas might also be of interest; this is a Python to JavaScript transpiler.

[EDIT] While this doesn't solve your problem at first glance, you might use it for a different approach: Instead of instrumenting JavaScript, write your code in Python instead (which can be easily instrumented; all the tools are already there) and then convert the result to JavaScript.

Lastly, if you want to solve your problem in Python but can't find a parser: use a Java engine to add comments to the code, which you can then search for in Python to instrument the code.

Aaron Digulla
  • You are the second one to say that I could parse the code with python-spidermonkey... Did I get it wrong? It doesn't seem to have any parsing functionality. The code obviously has to be parsed somewhere, but that happens deep inside the SpiderMonkey engine, and the Python interface doesn't provide hooks into it. I only see "execute", "add_global", "rem_global" and "gc" exposed to Python programmers. Am I missing something? – BruceBerry May 09 '09 at 16:39
  • Unfortunately, it is part of a project to analyze redirection in pages. I don't get to write the JavaScript code :-) And malicious websites go to great lengths to obfuscate their code. – BruceBerry May 09 '09 at 18:05
0

Why not try a JavaScript beautifier?

For example http://jsbeautifier.org/

Or see Command line JavaScript code beautifier that works on Windows and Linux

Community
Dipstick