I am trying to find all the references to a specific function declaration (with their line and column positions) when parsing a C++ source file via libclang in Python.

For example:

#include <iostream>
using namespace std;

int addition (int a, int b)
{
  int r;
  r=a+b;
  return r;
}

int main ()
{
  int z, q;
  z = addition (5,3);
  q = addition (5,5);
  cout << "The first result is " << z;
  cout << "The second result is " << q;
}

So, for the source file above, given the function declaration for addition in line 5, I would like find_all_function_decl_references (see below) to return the references to addition at lines 15 and 16.

I have tried this (adapted from here)

import clang.cindex
import ccsyspath

index = clang.cindex.Index.create()
translation_unit = index.parse(filename, args=args)

for node in translation_unit.cursor.walk_preorder():
    node_definition = node.get_definition()

    if node.location.file is None:
        continue
    if node.location.file.name != sourcefile:
        continue
    if node_definition is None:
        pass
    if node.kind.name == 'FUNCTION_DECL':
        if node.kind.is_reference():
          find_all_function_decl_references(node_definition.displayname)  # TODO

Another approach could be to store all the function declarations found on a list and run the find_all_function_decl_references method on each.

Does anyone have any idea of how to approach this? What would this find_all_function_decl_references method look like? (I am very new to libclang and Python.)

I have seen this, where the function find_typerefs finds all references to some type, but I am not sure how to adapt it to my needs.

Ideally, I would like to be able to fetch all references for any declaration; not only functions but variable declarations, parameter declarations (e.g. the a and b in the example above in line 7), class declarations etc.

EDIT Following Andrew's comment, here are some details regarding my setup specifications:

  • LLVM 3.8.0-win64
  • libclang-py3 3.8.1
  • Python3.5.1 (in Windows, I assume CPython)
  • For the args, I tried both the ones suggested in the answer here and the ones from another answer.

*Please note: given my limited programming experience, I would appreciate an answer with a brief explanation of how it works.

  • Not for clang/python, but see http://stackoverflow.com/a/37149988/120163 for another variation on how to "find all references to a specific symbol/operator" – Ira Baxter May 12 '16 at 22:08
  • @IraBaxter Thank you for the link but I am interested solely on achieving this via `libclang`. –  May 12 '16 at 22:19

1 Answer

The thing that really makes this problem challenging is the complexity of C++.

Consider what is callable in C++: functions, lambdas, the function call operator, member functions, template functions and member template functions. So in the case of just matching call expressions, you'd need to be able to disambiguate these cases.

Furthermore, libclang doesn't offer a perfect view of the clang AST (some nodes don't get exposed completely, particularly some nodes related to templates). Consequently, it's possible (even likely) that an arbitrary code fragment would contain some construct where libclang's view of the AST is insufficient to associate the call expression with a declaration.

However, if you're prepared to restrict yourself to a subset of the language it may be possible to make some headway - for example, the following sample tries to associate call sites with function declarations. It does this by doing a single pass over all the nodes in the AST matching function declarations with call expressions.

from clang.cindex import *

def is_function_call(funcdecl, c):
    """ Determine whether a call-expression cursor refers to a particular function declaration
    """
    defn = c.get_definition()
    return (defn is not None) and (defn == funcdecl)

def fully_qualified(c):
    """ Retrieve a fully qualified function name (with namespaces)
    """
    res = c.spelling
    c = c.semantic_parent
    while c.kind != CursorKind.TRANSLATION_UNIT:
        res = c.spelling + '::' + res
        c = c.semantic_parent
    return res

def find_funcs_and_calls(tu):
    """ Retrieve lists of function declarations and call expressions in a translation unit
    """
    filename = tu.cursor.spelling
    calls = []
    funcs = []
    for c in tu.cursor.walk_preorder():
        if c.location.file is None:
            pass
        elif c.location.file.name != filename:
            pass
        elif c.kind == CursorKind.CALL_EXPR:
            calls.append(c)
        elif c.kind == CursorKind.FUNCTION_DECL:
            funcs.append(c)
    return funcs, calls

idx = Index.create()
args =  '-x c++ --std=c++11'.split()
tu = idx.parse('tmp.cpp', args=args)
funcs, calls = find_funcs_and_calls(tu)
for f in funcs:
    print(fully_qualified(f), f.location)
    for c in calls:
        if is_function_call(f, c):
            print('-', c.location)
    print()
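
Since fully_qualified raised questions in the comments, here is a minimal sketch of what its loop does, using plain Python stand-ins instead of real libclang cursors. The names make_cursor and TRANSLATION_UNIT are hypothetical and exist only for this illustration; a real cursor exposes the same spelling, kind and semantic_parent attributes, and the real root kind is CursorKind.TRANSLATION_UNIT.

```python
from types import SimpleNamespace

# Hypothetical stand-in for CursorKind.TRANSLATION_UNIT (illustration only).
TRANSLATION_UNIT = 'TRANSLATION_UNIT'

def make_cursor(spelling, kind, semantic_parent=None):
    # Hypothetical helper: models only the three attributes the loop reads.
    return SimpleNamespace(spelling=spelling, kind=kind,
                           semantic_parent=semantic_parent)

def fully_qualified(c):
    """Same loop as above, written against the stand-in kind."""
    res = c.spelling
    c = c.semantic_parent
    # Climb outwards one enclosing scope at a time. The translation unit is
    # the root of the tree, so reaching it means no scopes are left to add.
    while c.kind != TRANSLATION_UNIT:
        res = c.spelling + '::' + res
        c = c.semantic_parent
    return res

tu = make_cursor('tmp.cpp', TRANSLATION_UNIT)
impl = make_cursor('impl', 'NAMESPACE', tu)
nested = make_cursor('addition', 'FUNCTION_DECL', impl)
top_level = make_cursor('addition', 'FUNCTION_DECL', tu)

print(fully_qualified(nested))     # impl::addition
print(fully_qualified(top_level))  # addition
```

Each semantic_parent step moves to the enclosing scope, so a function inside namespace impl picks up the impl:: prefix, while a top-level function reaches the translation unit immediately and keeps its bare name.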

To show how well the script works, you need a slightly more challenging example to parse:

// tmp.cpp
#include <iostream>
using namespace std;

namespace impl {
    int addition(int x, int y) {
        return x + y;
    }

    void f() {
        addition(2, 3);
    }
}

int addition (int a, int b) {
  int r;
  r=a+b;
  return r;
}

int main () {
  int z, q;
  z = addition (5,3);
  q = addition (5,5);
  cout << "The first result is " << z;
  cout << "The second result is " << q;
}

And I get the output:

impl::addition
- <SourceLocation file 'tmp.cpp', line 10, column 9>

impl::f

addition
- <SourceLocation file 'tmp.cpp', line 22, column 7>
- <SourceLocation file 'tmp.cpp', line 23, column 7>

main

Scaling this up to consider more types of declarations would (IMO) be non-trivial and an interesting project in its own right.
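
As a possible starting point for that, one option is the referenced property of a cursor, which libclang fills in for many reference-like nodes (calls, variable uses, type names). The following is a minimal sketch, not a complete solution. collect_references is a hypothetical helper name, and it is written against cursor-like objects, so it only assumes the attributes location, referenced and get_usr(). I'd expect it to miss references that libclang does not resolve (some template code, for instance).

```python
from collections import defaultdict

def collect_references(cursors, filename):
    """Group reference positions by the USR of the declaration referred to.

    cursors:  any iterable of cursor-like objects, for example the result
              of tu.cursor.walk_preorder()
    filename: only cursors located in this file are considered
    Returns a dict {usr: sorted list of (line, column)}.
    """
    found = defaultdict(set)
    for c in cursors:
        loc = c.location
        if loc.file is None or loc.file.name != filename:
            continue
        target = c.referenced
        # Declarations report themselves as their own referent; skip those
        # so that only uses (calls, variable reads, type names) remain.
        if target is None or target == c:
            continue
        usr = target.get_usr()
        if usr:
            # A set removes duplicates: a call expression and the name
            # inside it can point at the same declaration from one position.
            found[usr].add((loc.line, loc.column))
    return {usr: sorted(positions) for usr, positions in found.items()}
```

With the tu from the script above, calling collect_references(tu.cursor.walk_preorder(), tu.cursor.spelling) and looking up f.get_usr() for a declaration f should give that declaration's reference positions. Keying on the USR rather than comparing against get_definition(), as is_function_call does, means a declaration without a definition in the translation unit can still be matched.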

Addressing comments

Given that there are some questions about whether the code in this answer produces the results I've provided, I've added a gist of the code (that reproduces the content of this question) and a very minimal vagrant machine image that you can use to experiment with. Once the machine is booted you can clone the gist, and reproduce the answer with the commands:

git clone https://gist.github.com/AndrewWalker/daa2af23f34fe9a6acc2de579ec45535 find-func-decl-refs
cd find-func-decl-refs
export LD_LIBRARY_PATH=/usr/lib/llvm-3.8/lib/ && python3 main.py
Andrew Walker
    Thank you so much for your detailed answer. Strangely, I am getting a different output compared to yours: `impl::addition - impl::f addition - - main `. Note the lack of any results for lines 22 and 23. Any ideas?? –  May 16 '16 at 15:44
  • Can you add a description of which version of clang and libclang you're using, the set of args and options you passed to parse and your target platform to the question. If you're on Windows, can you also add which Python distribution (cpython, anaconda) and version information type 'import sys; print(sys.version)' in the interpreter – Andrew Walker May 16 '16 at 21:02
    LLVM 3.8.0-win64, libclang-py3 3.8.1, Python3.5.1 in Windows. No idea how to find out the Python distribution; I assume its CPython. For the args, I tried both the ones you suggest in your answer here and the ones from another of your [answers](http://stackoverflow.com/questions/37098725/parsing-with-libclang-unable-to-parse-certain-tokens-python-in-windows/37100397#37100397). –  May 16 '16 at 21:09
    @AndrewWalker I also do not get the output you posted in your answer. Could you please explain the methods (mainly, `fully_qualified` and `is_function_call`) so I can better understand it? –  May 18 '16 at 13:32
    @Karim - I'm having a really hard time reproducing the environment you described - 64 bit clang and 64 bit python on Windows don't seem to play nicely together - in the past, I've had some success if I used 32 bit versions of everything. So instead of you guessing about what the differences are, I've made an effort to supply a reproducible environment that you could use on windows (through vagrant / virtualbox) – Andrew Walker May 22 '16 at 04:41
  • @nk-fford - I've updated my answer and attempted to clarify what those functions were doing in the docstrings. – Andrew Walker May 22 '16 at 04:41
  • @AndrewWalker Your edit with the reproducible environment is much appreciated. I will give it a go as soon as possible. In the meantime, I only ask if you could provide a small explanation of the `is_function_call` and the `fully_qualified` methods, especially explaining the `while c.kind != CursorKind.TRANSLATION_UNIT` part. –  May 24 '16 at 17:17