
I want to parse a JavaScript file and find all the variable declarations, assignments, and calls to functions from a specific library.

What would be the best approach: regular expressions, a lexer, or an existing tool that already does this (does one exist?)?

What I actually want is to make sure, through static analysis, that an object's namespace and methods are not modified.

Eduard Florinescu

2 Answers


You cannot do it with regexes, and you probably do not want to write your own implementation of the ECMA-262 standard either (that would be total overkill).
Personally, I like Google's V8 JavaScript engine, or more precisely PyV8, its Python wrapper. I suggest you use it.
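To illustrate why a regex is not enough, here is a minimal sketch using only Python's standard `re` module. The pattern and the sample JavaScript are illustrative assumptions, not taken from the question. The naive pattern cannot tell a real declaration from one that appears inside a comment or a string literal:

```python
import re

# Naive pattern for "var <name>" declarations (an illustrative assumption).
VAR_DECL = re.compile(r"\bvar\s+([A-Za-z_$][\w$]*)")

# Hypothetical JavaScript input: one plain declaration, one inside a comment,
# and one whose string value itself contains declaration-like text.
JS_SOURCE = '''
var real = 1;
// var commented = 2;
var text = "var quoted = 3;";
'''

def naive_declarations(source):
    """Return every name the regex takes for a declaration."""
    return VAR_DECL.findall(source)

found = naive_declarations(JS_SOURCE)
# 'commented' and 'quoted' are false positives: neither is ever declared.
```

Only `real` and `text` are actual declarations here. Telling them apart from the other two matches requires knowing whether the match sits inside a comment or a string, which is the job of a real tokenizer or parser.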

If you have problems installing it, here are the commands I used (the pip installation failed on my x64 system, so I built from source):

apt-get install subversion scons libboost-python-dev
svn checkout http://v8.googlecode.com/svn/trunk/ v8
svn checkout http://pyv8.googlecode.com/svn/trunk/ pyv8
cd v8
export PyV8=`pwd`
cd ../pyv8
sudo python setup.py build
sudo python setup.py install

As far as I remember, these commands ran without errors for me. (I copy-pasted them, and they worked.)

Answer to the question itself:
A more complex hello world example that lists some variables of the global object:

import PyV8

class Global(PyV8.JSClass):      # define a compatible javascript class
    def hello(self):               # define a method
        print "Hello World"

    def alert(self, message): # my own alert function
        print type(message), '  ', message

    @property
    def GObject(self): return self

    def __setattr__(self, key, value):
        super(Global, self).__setattr__(key, value)
        print key, '=', value

G = Global()
ctxt = PyV8.JSContext(G)
ctxt.enter()
ctxt.eval("var a=hello; GObject.b=1.0; a();")
list_all_cmd = '''for (myKey in GObject){
alert(GObject[myKey]);
}'''
ctxt.eval(list_all_cmd)
ctxt.leave()

(In browsers the global object is called window.)
This code will output:

b = 1
Hello World
<class '__main__.Global'>    <__main__.Global object at 0x7f202c9159d0>
<class '_PyV8.JSFunction'>    function () { [native code] }
<type 'int'>    1
<class '_PyV8.JSFunction'>    function () { [native code] }
<class '_PyV8.JSFunction'>    function () { [native code] }
<class '_PyV8.JSFunction'>    function () { [native code] }
<class '_PyV8.JSFunction'>    function () { [native code] }
<class '_PyV8.JSFunction'>    function () { [native code] }
<class '_PyV8.JSFunction'>    function () { [native code] }
<class '_PyV8.JSFunction'>    function () { [native code] }
<class '_PyV8.JSFunction'>    function () { [native code] }
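The `b = 1` line in the output comes from the `__setattr__` override, and the same hook can be used to notice when a protected name is overwritten. Below is a minimal pure-Python sketch of that idea, without PyV8. The names `WriteRecorder` and `modified_names` are hypothetical helpers introduced for illustration. Note that this is a runtime check rather than static analysis: it only sees writes that actually happen and that go through the Python object.

```python
class WriteRecorder(object):
    """Hypothetical stand-in for the Global class: records attribute writes."""

    def __init__(self):
        # Bypass our own hook so that creating the log is not recorded.
        object.__setattr__(self, "writes", [])

    def __setattr__(self, key, value):
        # Log the write, then perform it as usual.
        self.writes.append((key, value))
        object.__setattr__(self, key, value)


def modified_names(recorder, protected):
    """Return the protected names that were written, in first-write order."""
    seen = []
    for key, _ in recorder.writes:
        if key in protected and key not in seen:
            seen.append(key)
    return seen


g = WriteRecorder()
g.b = 1.0                  # harmless new attribute
g.hello = "overwritten"    # write to a name we want to protect
touched = modified_names(g, {"hello", "alert"})
```

Here `touched` contains only `hello`, because `b` is not in the protected set and `alert` was never written.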
Sergey

You can use Rhino from Mozilla. It is a JavaScript implementation written in Java. Releases from 1.7R3 onwards have a new AST API. The classes are available in the org.mozilla.javascript.ast package.

If you want to do this in JavaScript, please see this discussion: JavaScript parser in JavaScript

Hope it helps.

krishnakumarp
  • I am already considering pynoceros (a Python Rhino port), pynarcissus, and PyV8, so thanks for confirming my approach :). Thanks also for the suggestion of the JSLint parser; maybe I can find a JSLint port for Python and look at its regular expressions. I see that it builds from code objects like `` any idea how I can trace a variable through that? I don't know how to work effectively with it. – Eduard Florinescu Aug 09 '12 at 08:33