
I want to get access to the AST within cppyy before the python bindings are created. I'd like to use this to generate other kinds of bindings.

I have seen cppyy-generator, but it requires a separate installation of clang on the machine. Since cppyy can do JIT compilation without a separate installation of clang, I have to believe the AST is available from the underlying cling interpreter. Is there a way to get this AST info from cppyy?

example:

import cppyy

cppyy.cppdef("""
namespace foo
{
class Bar
{
public:
    void DoSomething() {}
};
}

""")

cppyy can (amazingly) generate cppyy.gbl.foo.Bar for me. That means it must have used Cling to compile, get an AST, and generate python. How can I see the AST data?

Thanks!

Edit:

I can see that much of the information I need is in the cppyy-backend capi and cpp_cppyy files. However, my CPython foo is not strong enough to figure out how these get called and how I might access them from a python script.

Edit2:

Currently we're using a combination of castxml and pygccxml to generate a python data structure representing the AST. I see a lot of overlap with what cppyy does, and I wish to reduce our dependencies to cppyy only, since we're already using it for other things and it is nicely self-contained.

We use the AST data for multiple things. An important one is code generation. So we'd like to iterate the AST much like you can with pygccxml.

Ken

1 Answer


There are a couple of ambiguities here because the same names apply to different steps and in different places. Let me explain the structure (and history), which may even answer your question.

cppyy-generator makes use of the Clang Python bindings. Thus, the AST it accesses is the C++ one, and it is available in its full (ugly) glory. You don't need any part of cppyy to use the Clang Python bindings. cppyy-generator serves a specific use case where you want all local C++ entities pre-loaded into a Python module. Since cppyy utilizes lazy everything and auto-loading, for performance reasons, the concept of "all C++ entities" (local or otherwise) does not have a well-defined meaning. Hence libclang was utilized, where the concept is clear.

The cppyy-backend capi (or C-API), is an API that was developed in reductio to serve the PyPy implementation of cppyy. It is a C-style API to bootstrap cppyy/C++. It is reduced to its essentials to write Python-C++ bindings, hiding many irrelevant details of the Clang AST (e.g. the 15 or so ways that a template can exist in the Clang AST are reduced to "IsTemplate" etc.). The backend C-API does not depend on, or use, Python in any way at all.
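To make the "reduction" concrete, here is an illustrative pure-Python sketch (not the real cppyy-backend API; the names and kinds below are made up for illustration) of the kind of flattening described: many distinct Clang-level notions of "template" collapse into a single C-API style yes/no answer.

```python
# Hypothetical illustration only, not the actual cppyy-backend code:
# the C-API collapses the many ways a template can appear in the Clang
# AST into a single boolean answer.

# A few of the Clang cursor kinds that all mean "this is a template"
# (illustrative names, not the actual libclang enum values).
TEMPLATE_KINDS = {
    "ClassTemplate",
    "FunctionTemplate",
    "ClassTemplatePartialSpecialization",
    "TypeAliasTemplateDecl",
}

def is_template(cursor_kind: str) -> bool:
    """Reduce many Clang-level distinctions to one C-API style boolean."""
    return cursor_kind in TEMPLATE_KINDS

print(is_template("ClassTemplate"))   # True
print(is_template("CXXMethod"))       # False
```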

The implementation of the backend C-API is rather non-pretty. In part because of historical reasons (a bad thing), in part to hide all of Cling and thus Clang, to prevent clashes with other parts of an application that may be using Clang or LLVM (a good thing; the version of Clang in use by Cling is customized and may not work for e.g. Numba). Again, all this is completely independent of anything to do with Python.

Then, its use in Python. There are two different implementations: CPyCppyy for CPython, which is implemented in C, and the PyPy _cppyy module, which is implemented in RPython. Both perform the incantations to cross from Python into C++ through the C-API. Neither generates or uses the Python AST: both generate and manipulate Python entities directly. This happens lazily. Think the steps through: the Python user will, in your example above, type something like cppyy.gbl.foo.Bar().DoSomething(). In cppyy, Python's __getattr__ is used to intercept the names, and then it simply goes through the backend to Cling to ask whether it knows what foo, Bar, etc. are. For example, the C-API GetScope("foo") will return a valid identifier, so CPyCppyy/_cppyy knows to generate a Python class to represent the namespace. At no point, however, does it scan the global (or even foo) namespace in the AST in full to generate the bindings a priori. Based on your description, there is nothing in CPyCppyy/_cppyy that would be of use to you.
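The lazy-lookup pattern described above can be sketched in pure Python (illustrative only; CPyCppyy does this in C against the real backend C-API, and FakeBackend here is a made-up stand-in for Cling):

```python
# Rough pure-Python sketch of cppyy's lazy name lookup.

class FakeBackend:
    """Stand-in for Cling: knows which C++ names exist."""
    _scopes = {"foo": 1, "foo::Bar": 2}

    def get_scope(self, name):
        # Returns a valid id if the name is known, else 0 (like GetScope).
        return self._scopes.get(name, 0)

class Namespace:
    def __init__(self, backend, name=""):
        self._backend = backend
        self._name = name

    def __getattr__(self, attr):
        # Only called when normal lookup fails, i.e. on first access.
        full = f"{self._name}::{attr}" if self._name else attr
        if self._backend.get_scope(full):
            # Lazily create a proxy and cache it for subsequent lookups.
            proxy = Namespace(self._backend, full)
            setattr(self, attr, proxy)
            return proxy
        raise AttributeError(full)

gbl = Namespace(FakeBackend())
print(gbl.foo.Bar._name)   # prints "foo::Bar", created on first access
```

Nothing is scanned up front: each proxy comes into existence only when its name is first touched, which mirrors why "all C++ entities" has no well-defined meaning in cppyy.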

To come back to your first statement: you want to generate other types of bindings. You don't state what type of bindings, but the main reason for going for the C-API would be that it sits on top of Cling, rather than on Clang, as the Clang AST directly from C++ or through its Python bindings would. Cling offers easy access to the JIT, but you could also program that directly from Clang (its libraries, not the AST). As an example of such easy access, in the backend C-API you can just dump a string of C++ to be JITted into the compile function (which does the exact same thing as cppdef in your example). There are plans by the Cling folk to provide a better interface for dynamic languages from Cling directly, but this is a work in progress and not (AFAIK) available yet.

Finally, do note that Cling contains Clang, so if you install Cling, you still get Clang (and LLVM), too, which can be a heavy dependency.

EDIT: Fundamentally, the point remains that, contrary to those other tools, cppyy offers neither a list of starting points (e.g. "all classes") nor the full/true AST. You can copy over the cpp_cppyy.h header from the backend (it is not otherwise part of the installation), simply include it, and use it (all symbols are exported already), but you need to know the list of classes a priori. Example:

import cppyy

cppyy.cppdef('#define RPY_EXPORTED extern')
cppyy.include('cpp_cppyy.h')

cppyy.cppdef("""
namespace foo {
class Bar {
public:
    void DoSomething() {}
};
}""")

cpp = cppyy.gbl
capi = cpp.Cppyy

scope_id = capi.GetScope(cpp.foo.Bar.__cpp_name__)  # need to know of its existence a priori
for i in range(capi.GetNumMethods(scope_id)):
    m = capi.GetMethod(scope_id, i)
    print(capi.GetMethodName(m))

But as you can see, it does not offer a one-to-one result with the original code. For example, all the compiler-generated constructors and the destructor are listed as methods.

There also isn't really anything in the backend API like run_functions = unittests.member_functions('run') as in the pygccxml documentation that you link. The reason is that such a query never made sense in the context of cppyy. E.g. what if another header is loaded with more run functions? What if it is a templated function and more instantiations pop up? What if a using namespace ... appears in later code, introducing more run overloads?

cppyy does have a GetAllCppNames C-API function, but it's not guaranteed to be exhaustive. It exists for the benefit of tab-completion in code editors (it is called in customized __dir__ functions of bound scopes). In fact, it is precisely because it wasn't complete that cppyy-generator uses libclang.

You mention gInterpreter in the comments, but that is part of the history that I mentioned earlier: it's an ill-fated intermediate between the full AST as offered by libclang, and the minimalistic one needed for Python (such as the backend C-API). Yes, you can use it directly (it is, in fact, still used underneath the backend C-API), but it's a lot more clunky for little benefit.

For example, to handle that "getting all 'run' methods" example, you could do:

import cppyy

cppyy.cppdef("""
namespace foo {
void run(int) {}
void run(double) {}
}""")

cpp = cppyy.gbl

# using the ROOT/meta interface
cls = cpp.CppyyLegacy.TClass.GetClass(cpp.foo.__cpp_name__)
print('num "run" overloads:', cls.GetListOfMethodOverloads('run').GetSize())

# directly through gInterpreter
gInterp = cpp.gInterpreter

cls = gInterp.ClassInfo_Factory(cpp.foo.__cpp_name__)
v = cpp.std.vector['const void*']()
gInterp.GetFunctionOverloads(cls, 'run', v)
gInterp.ClassInfo_Delete(cls)

print('num "run" overloads:', len(v))

But the former interface (through CppyyLegacy.TClass) may not stay around and the gInterpreter one is really ugly as you can see.

I'm pretty sure you're not going to be happy trying to make cppyy replace the use of pygccxml, and if I were you, I'd use the Clang Python bindings instead.

Wim Lavrijsen
  • Thanks for the history. cppyy contains cling which contains clang. I was surprised that to use cppyy-generator I had to separately import the clang package and install LLVM/clang to point to a libclang. I think you're saying that libclang lives somewhere in the guts of python because of the cppyy-backend installation, but that makes it even more confusing that cppyy-generator would not be able to find its own libclang dependency. – Ken Apr 17 '20 at 14:13
  • See Edit2 for motivations. I want to use the capi to get AST data from cling through cppyy. I can call various funcs on `cppyy.gbl.gInterpreter` but I'm struggling to figure out how to call to the capi in a python script. Can you give me an example of how I might get information similar to what cppyy-generator spits out for a cppyy class declaration, like my example? Thanks! – Ken Apr 17 '20 at 14:29
  • To your first comment: libclang was used precisely because Clang is "too hidden" inside Cling and therefore almost impossible to use for other purposes. Note that `cppyy-generator` was contributed. I'd love to change it to use Cling's internal clang, but I haven't had the time to do so yet. I've updated the answer with some code examples, but I don't think using cppyy for the purpose of replacing pygccxml is going to make your life easy. – Wim Lavrijsen Apr 17 '20 at 19:35
  • Thanks again! Although Clang Python bindings sound like a good idea, it doesn't provide a clear benefit over our current pygccxml as far as the number of dependencies we bring in. I'll experiment with cppyy. In a lot of cases, we know the name of the type we want to interrogate, so it might be workable. – Ken Apr 20 '20 at 15:44
  • @WimLavrijsen I just started looking into cppyy and found this question while debugging an error message with it. Have you ever compared the running speed of "cling code" and "clang code"? Is it comparable, say, to unoptimized clang? – Sebastien Diot Jun 11 '20 at 19:02
  • Do you mean speed of generated (JITed) code? That's pretty much the same if used from cppyy (vanilla cling does pointer checking to prevent nullptr dereferencing which is both expensive and kills many optimizations). You can also set custom options. Or do you mean of compilation itself? There, Cling has explicitly removed certain optimization passes that are expensive in use yet give little run-time benefit. Cling also has support for pre-compiled modules these days, but cppyy isn't using that yet as it's only supported on Linux. Those speed up compilation by an order of magnitude. – Wim Lavrijsen Jun 11 '20 at 19:55
  • I meant, the speed of the JITed code, not the compile time itself. I could find no info about how fast cling JITed code ran, after being compiled. If it's comparable to Python rather than C++, I might as well just code everything in Python, and skip trying to optimize with some C++. But I take it from your answer that it must be just "a bit slower" than if compiled with clang. – Sebastien Diot Jun 11 '20 at 20:19
  • Now I'm not 100% sure anymore that we're talking about the same thing. To be sure: the JIT does not compile any Python code, only C++ code. And yes, that C++ code runs at native speeds. Putting C++ code into `cppyy.cppdef` and then calling it is like doing Numba by hand (if that comparison makes sense), with equal speedups. If you need to JIT the Python code, use PyPy (at which point, if you use cppyy in PyPy, there are _two_ JITs involved, although if library functions are available, cppyy will bypass the clang JIT and call those functions through the cffi_backend). – Wim Lavrijsen Jun 11 '20 at 22:03