I have been struggling for a while to find a decent way to bind a field in a C++ structure to its comments, using libclang 3.9.1 and Python 3.5.2.

So far, I have this setup up and running. Assume that I have a file Foo.h:

typedef int arbType;

struct Foo {
    //First bar comment
    //Second bar comment
    int Bar; //Third bar comment - after bar

    /* First line baz comment - before baz
       Second line baz comment - before baz
    */
    arbType Baz; //Third line baz comment - after baz
};

My Python code extracts only the in-line (trailing) comments:

#bind_comments.py
import clang.cindex

def get_cur_comments(cursor):
    comment = ''
    print ('\nGetting comment for:', cursor.spelling.decode())
    parent_cur = cursor.lexical_parent
    token_iter = parent_cur.get_tokens()
    for token in token_iter:
        if token.cursor == cursor:            
            while token.kind.name != 'PUNCTUATION':
                token = next(token_iter)
            token = next(token_iter)
            if token.kind.name == 'COMMENT':
                comment = token.spelling.decode().strip('/')
    return comment

def main():
    index = clang.cindex.Index.create()
    tu = index.parse(b'Foo.h', [b'-x', b'c++'])
    tu_iter = tu.cursor.get_children()
    next(tu_iter)
    root_cursor = next(tu_iter)

    for cur in root_cursor.type.get_fields():
        print(get_cur_comments(cur))

if __name__ == '__main__':
    main()

And the output:

C:\>bind_comments.py

Getting comment for: Bar
'Third bar comment - after bar'

Getting comment for: Baz
'Third line baz comment - after baz'

Now, for my problems, in descending order of importance:

  1. How can I bind the comments that appear before the fields? I looked at many 'peeking' solutions in Python, hoping to find out, while iterating over the tokens, whether the next one is the cursor (field) I'm interested in, but found nothing that I could implement properly in my case. Just to show you how serious I am, here are a few of the solutions I looked at:

  2. Conceptual flaw: I don't know yet how to tell the difference between:

    struct Foo {
       int Bar; // This comment belongs to Bar
                // As well as this one
    
       // While this comment already belongs to Baz
       int Baz;
     };
    
  3. Performance issues: note that for each field, I'm iterating through the whole token list of its structure. If the structure is big and has a lot of tokens, I guess that will cost me. I'd like to find some shortcuts. I thought about saving the tokens in a global list, but then what if the field is a declaration of another struct/class? Do I add the parent's tokens to the list as well? This is starting to get messy...

A few helpers for those who don't know libclang yet:

>>> print(root_cursor.spelling.decode())
Foo
>>> root_cursor.type.get_fields()
<list_iterator object at 0x0177B770>
>>> list(root_cursor.type.get_fields())
[<clang.cindex.Cursor object at 0x0173B940>, <clang.cindex.Cursor object at 0x017443A0>]
>>> for cur in root_cursor.type.get_fields():
...   print (cur.spelling.decode())
...
Bar
Baz
>>> root_cursor.get_tokens()
<generator object TokenGroup.get_tokens at 0x01771180>
Bak Itzik

1 Answer

libclang provides direct support for extracting javadoc-style (documentation) comments through the Cursor properties brief_comment and raw_comment.

With a little tweaking of your input code:

s = '''
typedef int arbType;

struct Foo {
    /// Brief comment about bar
    ///
    /// Extra Text about bar
    int Bar; 

    /** Brief comment about baz
     *
     * Extra Text about baz
     */
    arbType Baz; 

    /// Brief only comment
    int blah;
};
'''

import clang.cindex
from clang.cindex import CursorKind

idx = clang.cindex.Index.create()
tu = idx.parse('tmp.cpp', args=['-std=c++11'], unsaved_files=[('tmp.cpp', s)], options=0)
for c in tu.cursor.walk_preorder():
    if c.kind == CursorKind.FIELD_DECL:
        # brief_comment is the first paragraph; raw_comment keeps the comment markers
        print(c.brief_comment)
        print(c.raw_comment)
        print()

Produces:

Brief comment about bar
/// Brief comment about bar
    ///
    /// Extra Text about bar

Brief comment about baz
/** Brief comment about baz
     *
     * Extra Text about baz
     */

Brief only comment
/// Brief only comment
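
Since raw_comment keeps the comment markers and the original indentation, you may want to clean it up before using it. Here is a minimal sketch in plain Python, with no libclang needed. `clean_raw_comment` is a hypothetical helper name, not part of clang.cindex, and the regular expressions only cover the common `//`, `///`, `//!`, `/* */` and `/** */` styles:

```python
import re

def clean_raw_comment(raw):
    """Strip C/C++ comment markers from a raw comment string (sketch)."""
    if raw is None:  # assumption: a cursor without a comment yields None
        return ''
    text = raw.strip()
    if text.startswith('/*'):
        # Block comment: drop the opening /*, /** or /*! and the closing */
        text = re.sub(r'^/\*+!?', '', text)
        text = re.sub(r'\*+/$', '', text)
        # Drop indentation and the decorative leading '*' on each line
        lines = [re.sub(r'^\s*\*? ?', '', ln) for ln in text.splitlines()]
    else:
        # Line comments: drop //, /// or //! (and the '<' of trailing doc comments)
        lines = [re.sub(r'^\s*//+[!<]?\s?', '', ln) for ln in text.splitlines()]
    return '\n'.join(ln.rstrip() for ln in lines).strip()

raw = "/** Brief comment about baz\n     *\n     * Extra Text about baz\n     */"
print(clean_raw_comment(raw))
```

One known limitation of this sketch: a line inside a block comment that really starts with `*` (for example a bullet) loses that character.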
Andrew Walker
  • Damn, forgot to mention it :)... I already knew about `Cursor.brief_comment` and `Cursor.raw_comment`, but this requires changing the .h file. Let's assume I can't change it, and it stays in the format that appears in my question. How can I deal with it now? (And, BTW, clang assumes that all the comments relevant to a cursor appear before it. What about in-line comments, like in my `.h` file?) Thanks anyway for your time... – Bak Itzik Feb 01 '17 at 12:56
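
For headers that can't be changed, one possible direction is to bind ordinary comments to fields by source position. The sketch below is plain Python and only shows the association logic. `Field` and `Comment` are hypothetical records that would have to be filled from libclang data (for example a token's or cursor's `extent.start.line`, `extent.end.line`, `extent.start.column` and `spelling`). The rules are assumptions of this sketch, not libclang behaviour: a comment starting on the line where a field ends is trailing; comments on the directly following lines that start in the same column continue it; a contiguous run of comments ending on the line just above a field is leading. Comments placed before a field on the same line are not handled.

```python
from collections import namedtuple

# Hypothetical stand-ins for data read from libclang cursors and tokens.
Field = namedtuple('Field', 'name line end_line')
Comment = namedtuple('Comment', 'line end_line column text')

def bind_comments(fields, comments):
    """Bind comments to fields in one pass over the comments (heuristic sketch)."""
    result = {f.name: {'leading': [], 'trailing': []} for f in fields}
    by_end_line = {f.end_line: f for f in fields}
    by_start_line = {f.line: f for f in fields}
    block = []    # contiguous own-line comments not yet claimed by a field
    trail = None  # (field, column, last_line) of the trailing run in progress
    for c in sorted(comments, key=lambda c: c.line):
        owner = by_end_line.get(c.line)
        if owner is not None:
            # Comment starts on the line where a field ends: trailing comment
            result[owner.name]['trailing'].append(c.text)
            trail = (owner, c.column, c.end_line)
            block = []
            continue
        if trail and c.line == trail[2] + 1 and c.column == trail[1]:
            # Next line, same column: continuation of the trailing comment
            result[trail[0].name]['trailing'].append(c.text)
            trail = (trail[0], trail[1], c.end_line)
            continue
        trail = None
        if block and c.line != block[-1].end_line + 1:
            block = []  # a gap (blank line or code) breaks the leading run
        block.append(c)
        nxt = by_start_line.get(c.end_line + 1)
        if nxt is not None:
            # The run ends directly above a field: leading comments
            result[nxt.name]['leading'] = [b.text for b in block]
            block = []
    return result

# Line/column numbers for the second struct in the question (item 2).
fields = [Field('Bar', 2, 2), Field('Baz', 6, 6)]
comments = [
    Comment(2, 2, 17, 'This comment belongs to Bar'),
    Comment(3, 3, 17, 'As well as this one'),
    Comment(5, 5, 8, 'While this comment already belongs to Baz'),
]
for name, bound in bind_comments(fields, comments).items():
    print(name, bound)
```

Because all comments of a struct are visited once and fields are looked up by line number, this also avoids re-scanning the parent's tokens for every field. For nested structs you would presumably run it once per record declaration.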