8

I intend to implement a js code coverage directly in the v8 code. My initial target is to add a simple print for every statement in the abstract syntax tree. I saw that there is an AstVisitor class , which allows you to traverse the AST. so my question is how can i add a statement to the AST after the statement the visitor is currently visiting?

  • Basic blocks are a construct for control-flow graphs, not for ASTs. Do you intend to create a CFG from the AST? –  Apr 03 '13 at 11:04
  • i may be mixing the two, but i thought the nodes of the ast are basic blocks as well? – user2240085 Apr 03 '13 at 11:12
  • *Which* nodes? In any case, I'm not aware of any common AST nodes that match basic blocks (though it's certainly possible to have a data structure that also maintains CFG-ish information and call that an "AST"). For example, a loop is usually an AST node, but many loops consist of several BBs. The loop node may contain a list of statement nodes, but some of those statements correspond to *part* of a BB (e.g. simple assignment), while others expand into *several* BBs (e.g. any inline condition, or nested loops). Perhaps you're misusing the term "basic block"? –  Apr 03 '13 at 11:17
  • you are correct, i have mixed the two , the ast will not help me to add the commands. does the v8 uses CFG while parsing? – user2240085 Apr 03 '13 at 11:18
  • I don't know, in fact I have never peeked much at V8's internals. –  Apr 03 '13 at 11:47
  • 1
    You can inspect/add statements and expressions by implementing your own ASTVisitor. This requires to modify V8. I summarized where you can intercept it here: http://stackoverflow.com/questions/9451067/access-the-abstract-syntax-tree-of-v8-engine I also achieved to modify the AST, e.g. insert and replace arbitrary statements and expressions. However, this was a) not compatible with Crankshaft and b) lead to problems when I changed nodes that drive control-flow, which I could not solve in the limited time I had available. I have a guess how this could be solved, but don't know for sure. – Jonas Apr 03 '13 at 20:01
  • @Jonas hi, i have seen your quoted post. but i am too struggling with understanding how can i add the "print commands" without messing with the flow. furthermore, i still don't see how i can add the command after the current node i am visiting. – user2240085 Apr 08 '13 at 06:34

1 Answers1

7

Ok, I'll summarize my experiments. First, what I write applies to V8 as it was used in Chromium version r157275, so things may not work any more - but I'll nevertheless link to the places in the current version.

As said, you need your own AST visitor, say MyAstVisior, which inherits from AstVisitor and must implement a bunch of VisitXYZ methods from there. The only one needed to instrument/inspect executed code is VisitFunctionLiteral. Executed code is either a function or a set of loose statements in a source (file) which V8 wraps in a function which is then executed.

Then, just before a parsed AST is converted to code, here (compilation of the function made form the loose statements) and there (compilation during runtime, when a predefined function is executed for the first time), you pass your visitor to the function literal, which will call VisitFunctionLiteral on the visitor:

MyAstVisitor myAV(info);
info->function()->Accept(&myAV);
// next line is the V8 compile call
if (!MakeCode(info)) {

I passed the CompilationInfo pointer info to the custom visitor because one needs that to modify the AST. The constructor looks like this:

MyAstVisitor(CompilationInfo* compInfo) :
    _ci(compInfo), _nf(compInfo->isolate(), compInfo->zone()), _z(compInfo->zone()){};

_ci, _nf and _z are pointers to CompilationInfo, AstNodeFactory<AstNullVisitor> and Zone.

Now in VisitFunctionLiteral you can iterate through the function body and also insert statements if you like.

void MyAstVisitor::VisitFunctionLiteral(FunctionLiteral* funLit){
    // fetch the function body
    ZoneList<Statement*>* body = funLit->body();
    // create a statement list used to collect the instrumented statements
    ZoneList<Statement*>* _stmts = new (_z) ZoneList<Statement*>(body->length(), _z);
    // iterate over the function body and rewrite each statement
    for (int i = 0; i < body->length(); i++) {
       // the rewritten statements are put into the collector
       rewriteStatement(body->at(i), _stmts);
    }
    // replace the original function body with the instrumented one
    body->Clear();
    body->AddAll(_stmts->ToVector(), _z);
}

In the rewriteStatement method you now can inspect the statement. The _stmts pointer holds a list of statements which in the end will replace the original function body. So to add a print statement after each statement you first add the original statement and then add your own print statement:

void MyAstVisitor::rewriteStatement(Statement* stmt, ZoneList<Statement*>* collector){
    // add original statement
    collector->Add(stmt, _z);

    // create and add print statement, assuming you define print somewhere in JS:

    // 1) create handle (VariableProxy) for print function
    Vector<const char> fName("print", 5);
    Handle<String> fNameStr = Isolate::Current()->factory()->NewStringFromAscii(fName, TENURED);
    fNameStr = Isolate::Current()->factory()->SymbolFromString(fNameStr);
    // create the proxy - (it is vital to use _ci->function()->scope(), _ci->scope() crashes)
    VariableProxy* _printVP = _ci->function()->scope()->NewUnresolved(&_nf, fNameStr, Interface::NewUnknown(_z), 0);

    // 2) create message
    Vector<const char> tmp("Hello World!", 12);
    Handle<String> v8String = Isolate::Current()->factory()->NewStringFromAscii(tmp, TENURED);
    Literal* msg = _nf.NewLiteral(v8String);

    // 3) create argument list, call expression, expression statement and add the latter to the collector
    ZoneList<Expression*>* args = new (_z) ZoneList<Expression*>(1, _z);
    args->Add(msg);
    Call* printCall = _nf.NewCall(_printVP, args, 0);
    ExpressionStatement* printStmt = _nf.NewExpressionStatement(printCall);
    collector->Add(printStmt, _z);   
}

The last parameter of NewCall and NewUnresolved is a number specifying the position in the script. I assume this is used for debug/error messages to tell where an error happened. I at least never encountered problems with setting it to 0 (there is also a constant somewhere kNoPosition).

Some final words: This will not actually add a print statement after each statement, because Blocks (e.g. loop bodies) are statements that represent a list of statements and loops are statements that have a condition expression and a body block. So you would need to inspect what kind of statement currently is handled and recursively look into it. Rewriting blocks is pretty much the same as rewriting a function body.

But you will run into problems when you start to replace or modify existing statements, because the AST also carries information about branching. So if you replace a jump target for some condition you break your code. I guess this could be covered if one directly adds rewriting capabilities to the single expression and statement types instead of creating new ones to replace them.

So far, I hope it helps.

Jonas
  • 515
  • 4
  • 13
  • Gad. As an alternative, consider my approach outlined in "Branch Coverage for Arbitrary Languages Made Easy" http://www.semdesigns.com/Company/Publications/TestCoverage.pdf – Ira Baxter Apr 11 '13 at 22:20