I have been struggling for a while now to find a decent way to bind a field in a C++ structure to its comments, using libclang 3.9.1 and Python 3.5.2.
So far, I have this setup up and running:
Assume that I have a file Foo.h:
typedef int arbType;

struct Foo {
    //First bar comment
    //Second bar comment
    int Bar; //Third bar comment - after bar
    /* First line baz comment - before baz
       Second line baz comment - before baz
    */
    arbType Baz; //Third line baz comment - after baz
};
My Python code extracts only the inline (trailing) comments:
#bind_comments.py
import clang.cindex


def get_cur_comments(cursor):
    comment = ''
    print('\nGetting comment for:', cursor.spelling.decode())
    parent_cur = cursor.lexical_parent
    token_iter = parent_cur.get_tokens()
    for token in token_iter:
        if token.cursor == cursor:
            while token.kind.name != 'PUNCTUATION':
                token = next(token_iter)
            token = next(token_iter)
            if token.kind.name == 'COMMENT':
                comment = token.spelling.decode().strip('/')
    return comment


def main():
    index = clang.cindex.Index.create()
    tu = index.parse(b'Foo.h', [b'-x', b'c++'])
    tu_iter = tu.cursor.get_children()
    next(tu_iter)
    root_cursor = next(tu_iter)
    for cur in root_cursor.type.get_fields():
        print(get_cur_comments(cur))


if __name__ == '__main__':
    main()
And the output:
C:\>bind_comments.py
Getting comment for: Bar
'Third bar comment - after bar'
Getting comment for: Baz
'Third line baz comment - after baz'
Now for my problems, in descending order of importance:
How can I bind the comments that come before the fields? I looked at many 'peeking' solutions in Python, so that while iterating over the tokens I could find out whether the next one belongs to the cursor (field) I'm interested in, but I found nothing that I could implement properly in my case. Just to show you how serious I am, here are a few of the solutions I looked at:
- SO Q: how-to-look-ahead-one-element-in-a-python-generator
- Code Recipe: look-ahead-one-item-during-iteration
- Just another Code Recipe: peek-ahead-an-iterator
Conceptual flaw: I don't know yet how to tell which field each comment belongs to in a case like this:
struct Foo {
    int Bar; // This comment belongs to bar
             // As well as this one
    // While this comment belongs to baz already
    int Baz;
};
- Performance issues: please note that for each field, I'm iterating through the whole token list of its structure. If the structure is big and has a lot of tokens, I guess that will cost me, so I'd like to find some shortcuts. I thought about saving the tokens in a global list, but then what if the field is a declaration of another struct/class? Do I add their parent's tokens to the list as well? This is starting to get messy...
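To make the three problems above concrete, here is a rough sketch of a single-pass approach, written without libclang so it can run anywhere. Everything in it is an assumption for illustration: `Tok` is a hypothetical stand-in for a token (with real tokens, the line and column would presumably come from `token.extent.start`), fields are matched by name rather than by cursor, and the rule that a comment on the next line continues a trailing comment only when it starts in the same column is just a heuristic, not something libclang defines.

```python
from collections import namedtuple

# Hypothetical stand-in for a libclang token; not part of clang.cindex.
Tok = namedtuple('Tok', 'kind spelling line column')


def clean(spelling):
    """Drop // or /* */ delimiters and surrounding whitespace."""
    if spelling.startswith('//'):
        return spelling[2:].strip()
    if spelling.startswith('/*') and spelling.endswith('*/'):
        return spelling[2:-2].strip()
    return spelling.strip()


def bind_comments(tokens, field_names):
    """Walk the tokens once and map each field name to its comments."""
    bound = {name: [] for name in field_names}
    pending = []      # comments waiting for the next field declaration
    current = None    # field whose declaration is in progress
    last = None       # field whose ';' was seen most recently
    semi_line = None  # line of that ';'
    trail = None      # (line, column) of the latest trailing comment

    for tok in tokens:
        if tok.kind == 'COMMENT':
            text = clean(tok.spelling)
            same_line = last is not None and tok.line == semi_line
            # Assumed heuristic: next line, same column => continuation.
            continues = (trail is not None
                         and tok.line == trail[0] + 1
                         and tok.column == trail[1])
            if same_line or continues:
                bound[last].append(text)
                trail = (tok.line, tok.column)
            else:
                pending.append(text)
                trail = None
            continue
        trail = None  # any non-comment token ends a trailing run
        if (tok.kind == 'IDENTIFIER' and tok.spelling in bound
                and current is None):
            current = tok.spelling
            bound[current].extend(pending)
            pending = []
        elif (tok.kind == 'PUNCTUATION' and tok.spelling == ';'
                and current is not None):
            last, semi_line, current = current, tok.line, None
    return bound


# Hand-written tokens for the ambiguous struct shown above.
sample = [
    Tok('KEYWORD', 'struct', 1, 1), Tok('IDENTIFIER', 'Foo', 1, 8),
    Tok('PUNCTUATION', '{', 1, 12),
    Tok('KEYWORD', 'int', 2, 5), Tok('IDENTIFIER', 'Bar', 2, 9),
    Tok('PUNCTUATION', ';', 2, 12),
    Tok('COMMENT', '// This comment belongs to bar', 2, 14),
    Tok('COMMENT', '// As well as this one', 3, 14),
    Tok('COMMENT', '// While this comment belongs to baz already', 4, 5),
    Tok('KEYWORD', 'int', 5, 5), Tok('IDENTIFIER', 'Baz', 5, 9),
    Tok('PUNCTUATION', ';', 5, 12),
    Tok('PUNCTUATION', '}', 6, 1), Tok('PUNCTUATION', ';', 6, 2),
]
result = bind_comments(sample, ['Bar', 'Baz'])
print(result)
```

Because comments are buffered until the next field shows up, no look-ahead is needed, and the token list of the struct is walked once for all fields instead of once per field. Nested structs and multi-line block comments used as trailing comments are not handled here.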
Some helper output for those who don't know libclang yet:
>>> print(root_cursor.spelling.decode())
Foo
>>> root_cursor.type.get_fields()
<list_iterator object at 0x0177B770>
>>> list(root_cursor.type.get_fields())
[<clang.cindex.Cursor object at 0x0173B940>, <clang.cindex.Cursor object at 0x017443A0>]
>>> for cur in root_cursor.type.get_fields():
...     print(cur.spelling.decode())
...
Bar
Baz
>>> root_cursor.get_tokens()
<generator object TokenGroup.get_tokens at 0x01771180>
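Since `get_tokens()` returns a generator, as the last line shows, peeking at the following token means buffering one element by hand. A minimal sketch of that idea, using plain strings as stand-ins for tokens; `with_next` is a made-up helper name, not something from clang.cindex:

```python
def with_next(iterable):
    """Yield (item, following_item) pairs; following_item is None at the end."""
    it = iter(iterable)
    try:
        current = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    for following in it:
        yield current, following
        current = following
    yield current, None


# Stand-in token stream; a real one would come from cursor.get_tokens().
tokens = (t for t in ['// leading', 'int', 'Bar', ';', '// trailing'])
pairs = list(with_next(tokens))
for tok, nxt in pairs:
    print(tok, '->', nxt)
```

With real tokens, the second element of each pair could be compared against the field's cursor to decide whether the current comment precedes that field.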