0

I am new to clang-query and I need help to understand how to build ASTMatcher expressions for a code profiling tool I am trying to create. Eventually I would like to combine these AST matchers with a libTooling application to rewrite the source code.

Here is where I am havinve trouble. As a good starting point, I would like to break apart conditional compound statements into their individual sub-expressions, but I do not know how to do this and the AST matcher reference overwhelming for a newbie to clang.

Here is an example of a block of 'C' code I would like to analyze:

#include <stdbool.h>

// example code
int foo(const int* pa, int *pb) {
    if (pa[0] > 1) {
        pb[0] = 1;
    } else if ((pa[0] == 1) && (pa[1] == 1)) {
        pb[0] = 2;
    } else if ((pa[0] == 0) && (pa[1] == 1)) {
        pb[0] = 3;
    } else if ((pa[0] == 1) && (pa[1] == 1) || (pb[0] == 2)) {
        pb[0] = 4;
    }

    pb[0] = 2;
    switch (pa[0]) {
    case 0:
    case 1:
        break;
    case 2:
        break;

    default:
        break;
    }
    return 3;
}

int bar (int arg1, bool arg2) {
    return arg1 + (arg2 ? 1 : 0);
}

int main(int argc, const char **argv) {
    const int a[3] = { 0, 1, 2};
    int b[3] = {-1,-1,-1};
    foo(a, b);
    bar (3, true);
}

Using clang-query, I would like to identify all the compound conditional statements in the above code but I am having a tough time doing so. From what I understand, I should be able to use the AST dump to build AST matcher expressions.

As a very simple starting point, I can identify all the if statement conditions as follows:

clang-query> match ifStmt()

Match #1:

C:\Users\johnc\main\rewriter\.\input04.c:5:5: note: "root" binds here
    if (pa[0] > 1) {
    ^~~~~~~~~~~~~~~~

Match #2:

C:\Users\johnc\main\rewriter\.\input04.c:7:12: note: "root" binds here
    } else if ((pa[0] == 1) && (pa[1] == 1)) {
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Match #3:

C:\Users\johnc\main\rewriter\.\input04.c:9:12: note: "root" binds here
    } else if ((pa[0] == 0) && (pa[1] == 1)) {
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Match #4:

C:\Users\johnc\main\rewriter\.\input04.c:11:12: note: "root" binds here
    } else if ((pa[0] == 1) && (pa[1] == 1) || (pb[0] == 2)) {
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4 matches.
clang-query>

however I would also like to identify the individual sub expressions. I turned on the ast tree dump & reran the matcher as follows (I trimmed the output to just show the 4th if statement match):

This match encloses these sub expressions

((pa[0] == 1) && (pa[1] == 1) || (pb[0] == 2))

Whare are the steps I need to follow (presumably looking at AST nodes to identify:

((pa[0] == 1) && (pa[1] == 1) || (pb[0] == 2))
(pa[0] == 1)
(pa[1] == 1)

.

clang-query> enable output dump
clang-query> match ifStmt()

Match #1:
. . .

Match #2:
. . .

Match #3:
. . .
    
Match #4:

C:\Users\johnc\main\rewriter\.\input04.c:11:12: note: "root" binds here
    } else if ((pa[0] == 1) && (pa[1] == 1) || (pb[0] == 2)) {
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Binding for "root":
IfStmt 0x13e55687480 <C:\Users\johnc\main\rewriter\.\input04.c:11:12, line:13:5>
|-BinaryOperator 0x13e55687380 <line:11:16, col:59> 'int' '||'
| |-BinaryOperator 0x13e55687260 <col:16, col:43> 'int' '&&'
| | |-ParenExpr 0x13e55687140 <col:16, col:27> 'int'
| | | `-BinaryOperator 0x13e55687120 <col:17, col:26> 'int' '=='
| | |   |-ImplicitCastExpr 0x13e55687108 <col:17, col:21> 'int' <LValueToRValue>
| | |   | `-ArraySubscriptExpr 0x13e556870c0 <col:17, col:21> 'const int' lvalue
| | |   |   |-ImplicitCastExpr 0x13e556870a8 <col:17> 'const int *' <LValueToRValue>
| | |   |   | `-DeclRefExpr 0x13e55687060 <col:17> 'const int *' lvalue ParmVar 0x13e55545a30 'pa' 'const int *'
| | |   |   `-IntegerLiteral 0x13e55687080 <col:20> 'int' 0
| | |   `-IntegerLiteral 0x13e556870e0 <col:26> 'int' 1
| | `-ParenExpr 0x13e55687240 <col:32, col:43> 'int'
| |   `-BinaryOperator 0x13e55687220 <col:33, col:42> 'int' '=='
| |     |-ImplicitCastExpr 0x13e55687208 <col:33, col:37> 'int' <LValueToRValue>
| |     | `-ArraySubscriptExpr 0x13e556871c0 <col:33, col:37> 'const int' lvalue
| |     |   |-ImplicitCastExpr 0x13e556871a8 <col:33> 'const int *' <LValueToRValue>
| |     |   | `-DeclRefExpr 0x13e55687160 <col:33> 'const int *' lvalue ParmVar 0x13e55545a30 'pa' 'const int *'
| |     |   `-IntegerLiteral 0x13e55687180 <col:36> 'int' 1
| |     `-IntegerLiteral 0x13e556871e0 <col:42> 'int' 1
| `-ParenExpr 0x13e55687360 <col:48, col:59> 'int'
|   `-BinaryOperator 0x13e55687340 <col:49, col:58> 'int' '=='
|     |-ImplicitCastExpr 0x13e55687328 <col:49, col:53> 'int' <LValueToRValue>
|     | `-ArraySubscriptExpr 0x13e556872e0 <col:49, col:53> 'int' lvalue
|     |   |-ImplicitCastExpr 0x13e556872c8 <col:49> 'int *' <LValueToRValue>
|     |   | `-DeclRefExpr 0x13e55687280 <col:49> 'int *' lvalue ParmVar 0x13e55545ae0 'pb' 'int *'
|     |   `-IntegerLiteral 0x13e556872a0 <col:52> 'int' 0
|     `-IntegerLiteral 0x13e55687300 <col:58> 'int' 2
`-CompoundStmt 0x13e55687468 <col:62, line:13:5>
  `-BinaryOperator 0x13e55687448 <line:12:9, col:17> 'int' '='
    |-ArraySubscriptExpr 0x13e55687400 <col:9, col:13> 'int' lvalue
    | |-ImplicitCastExpr 0x13e556873e8 <col:9> 'int *' <LValueToRValue>
    | | `-DeclRefExpr 0x13e556873a0 <col:9> 'int *' lvalue ParmVar 0x13e55545ae0 'pb' 'int *'
    | `-IntegerLiteral 0x13e556873c0 <col:12> 'int' 0
    `-IntegerLiteral 0x13e55687420 <col:17> 'int' 4

4 matches.
clang-query>
johnco3
  • 2,401
  • 4
  • 35
  • 67

1 Answers1

1

The question title mentions the AST tree dump. The dump is mainly helpful for finding the relevant "node matcher". For example, in the AST dump, we see both IfStmt and BinaryOperator, so we know we may want to use the ifStmt and binaryOperator matchers.

However, the AST dump does not tell us how to combine these, nor how to use the "traversal" and "narrowing" matchers. That comes from reading the examples in the AST Matcher Reference (that you also linked), plus trial and error (and asking questions on SO). It's not very easy or intuitive, unfortunately.

The specific task you've described is, as I understand it, to report binary operator expressions that are subexpressions of the condition of an if statement. Here is a clang-query command to do that:

clang-query \
  -c='m
    binaryOperator(            # We are looking for a binary operator expression
      hasType(booleanType()),  #   that has (C++) boolean type,

      anyOf(                   #   and that either
        hasParent(             #     has as its immediate parent
          ifStmt()             #       an 'if' statement,
        ),

        hasAncestor(           #     or has any ancestor
          expr(                #       that is an expression
            hasParent(         #         with immediate parent that is
              ifStmt()         #           an 'if' statement.
            )
          )
        )
      )
    )
  ' \
  test.cc --

(The comments in the above command are recognized and discarded by clang-query, not the shell.)

When given the following input as test.cc:

// test.cc
// Report subexpressions of 'if' conditions.

void g(int *pa, int *pb)
{
  if ((pa[0] == 1) && (pa[1] == 1) || (pb[0] == 2)) {
    pb[0] = 4;
  }
}

// EOF

it reports:

Match #1:

<path>/test.cc:6:7: note: "root" binds here
  if ((pa[0] == 1) && (pa[1] == 1) || (pb[0] == 2)) {
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Match #2:

<path>/test.cc:6:7: note: "root" binds here
  if ((pa[0] == 1) && (pa[1] == 1) || (pb[0] == 2)) {
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~

Match #3:

<path>/test.cc:6:8: note: "root" binds here
  if ((pa[0] == 1) && (pa[1] == 1) || (pb[0] == 2)) {
       ^~~~~~~~~~

Match #4:

<path>/test.cc:6:24: note: "root" binds here
  if ((pa[0] == 1) && (pa[1] == 1) || (pb[0] == 2)) {
                       ^~~~~~~~~~

Match #5:

<path>/test.cc:6:40: note: "root" binds here
  if ((pa[0] == 1) && (pa[1] == 1) || (pb[0] == 2)) {
                                       ^~~~~~~~~~
5 matches.

This isn't perfect, though, because it will also report subexpressions of the "then" or "else" clause if no braces are used to enclose them. For example, if given:

  if (pa)
    x = (pa[0] == 1)? 1 : 2;

it will report:

    x = (pa[0] == 1)? 1 : 2;
         ^~~~~~~~~~

I'm not sure if that can be fixed with the AST matcher language alone, nor whether it is a problem for your intended use.

Additional notes regarding this query:

  • Using hasType(booleanType()) means this matcher only works with C++, not C, since there is no built-in boolean type in C. If working with C, the closest equivalent would be hasType(asString("int")).

  • The use of binaryOperator means this only matches binary operator expressions. If we want to match unary ! (NOT) expressions, use unaryOperator or just expr. (The query above specifically used binaryOperator since BinaryOperator appears in the AST dump and the question seemed focused on mapping the dump to a matcher.)

  • The binary ^ (XOR) operator yields int (or another promoted arithmetic type, depending on the operand types) since it is a bitwise rather than logical operator. Consequently the booleanType filter will cause it to not be reported directly. However, in an expression like !(c ^ d), there is an implicit conversion to bool associated with the parenthesis expression in the clang AST, so the parenthesis expression will be reported so long as IgnoreUnlessSpelledInSource is not also set. (The reason the query above uses a type filter is because the question appears interested in dissecting the logical components of the if condition, rather than every constituent subexpression, such as all of the identifier expressions.)

Scott McPeak
  • 8,803
  • 2
  • 40
  • 79
  • this is very useful information, thanks for the really good example, one other thing I would like to pick your brains on is with regard to unintuitive nature of traversal and narrowing matchers. I believe that there is some form of autocompletion (using double TAB) available on some platforms - I'm using clang-query on windows. This does not have autocomple support, which would be a great help if it offered context sensitive parameter options while creating matchers interactively. Have you any experience with clang-query auto completion and did you find it useful if so. – johnco3 May 26 '23 at 07:35
  • @johnco3 I don't have any experience with `clang-query` autocompletion, but a new SO question about how to get that working on Windows would seem to be on topic. – Scott McPeak May 26 '23 at 16:33
  • thank you, I got it to install via the WSL in Windows 11 - works like a charm. After making a minimal ubuntu install and then installing the clang-tools package, I was able to access autocomplete functionality. BTW I really liked your commented script example above, do you know of a decent site showing lots of similarly documented clang-query scripting examples, the comments really helped me and examples seem to be the best way to learn the AST Matcher syntax (noto sure if the syntax changes depending on AsIf or IgnoreUnlessSpelledInSource mode of operation. – johnco3 May 27 '23 at 16:26
  • @johnco3 Glad to hear it's working! I've posted a few other similarly commented examples as answers to questions with clang-related tags on SO (which you can find by going to my profile), but I haven't seen that anywhere else. The "Asif" vs. "IgnoreUnless" aspect does not change the syntax, although it does of course change the results. – Scott McPeak May 27 '23 at 17:34
  • Do you have any idea why it doesn't work if I change the file extension to '.c' as opposed to '.cpp' or '.cc'? I am not sure if there is something c++ specific with the matcher expression, I get 0 matches if the file extn is 'c'. – johnco3 May 29 '23 at 22:31
  • Also, if I make intermediate binds of the 2 variants of if statements to 'if1' and 'if2', how does one handle such a multi binding case (binding both a BinaryOperation and 2 IfStmts) to a single MatchCallback::run(MatchResult){} fn. I bound your expression to "binaryOp" and the parental expressions to 'if1' & 'if2'. I need to insert code before each of the individual binaryOperations and and slightly different code in the surrounding if statements. – johnco3 May 29 '23 at 22:32
  • 1
    @johnco3 Regarding file extension, there is no built-in boolean type in C, so the `booleanType` matcher does not match. I just added a paragraph about that to the answer. Regarding bindings in the different cases, I don't know how to do that off the top of my head. You might ask a new question. – Scott McPeak May 30 '23 at 09:42
  • Nice! Thanks for looking into that, I have no idea how I would change the matcher to also handle XOR (binary) & ! (NOT) unary cases. Could you tweak the answer to hadle that case also. I am haging a tough time understanding (due to lack of good examples (except your well documented clang-query scriptlet) is how to create matchres using the AST dump in the vicinity of the code I need to change. I thought it might be easier than it is. Here is an expression where I would like (comment for now) at various (unary & binary operation points). "if (a && b || !(c ^ d)) {…}" – johnco3 May 30 '23 at 16:27
  • 1
    @johnco3 I added notes regarding matching NOT and XOR. – Scott McPeak May 30 '23 at 21:10
  • many thanks! you have been a great help to getting me started along the right direction. – johnco3 May 31 '23 at 03:48