7

I need to go through a C/C++ file and extract the list of classes and methods, along with where they're located in the file.

Is libclang the best option? Or is it "too much" for the task?

Would it be better to just look for pairing brackets?

In case libclang is the choice: is there a way to invoke it from C#?

Thanks!

pablo
  • Of course there is always a way to invoke C++ code from .NET, using C++/CLI. A more portable solution would be to provide a plain C wrapper around libclang (should be fairly straightforward). But the best possible solution would be to go for an older version of Clang, back when its nice XML printer still existed. It is really sad that it was removed from Clang. – SK-logic Jan 11 '12 at 11:51
  • Another option (a bit rusty, but still works for most of the cases) is the Elsa parser combined with the gcc preprocessor. – SK-logic Jan 11 '12 at 11:53
  • You aren't clear on how precise an answer you want. You could build a solution based on pairing brackets that would likely produce simple method/class information 90% of the time, with spectacular errors the other 10%. What do you intend to do with the results? – Ira Baxter Jan 11 '12 at 12:00

6 Answers

6

You could consider ctags, which is available on many platforms. The output is easy to parse and contains the information you asked for.

More info: for your question I had to look through the many options available, and after a little while I found a combination that works. For example:

ctags -N -x --c-kinds=+p crowd.*

produces this output

CrowdSim         class        44 crowd.h          class CrowdSim
CrowdSim         function     47 crowd.h          CrowdSim( const std::string& contentDir ) : _contentDir( contentDir ) {}
Particle         function     35 crowd.h          Particle()
Particle         struct       25 crowd.h          struct Particle
_contentDir      member       56 crowd.h          std::string _contentDir;
_crowd_H_        macro        18 crowd.h          #define _crowd_H_
_particles       member       57 crowd.h          std::vector< Particle > _particles;
animTime         member       32 crowd.h          float animTime;
chooseDestination function     24 crowd.cpp        void CrowdSim::chooseDestination( Particle &p )
chooseDestination prototype    53 crowd.h          void chooseDestination( Particle &p );
dx               member       28 crowd.h          float dx, dz; // Destination position
dz               member       28 crowd.h          float dx, dz; // Destination position
fx               member       29 crowd.h          float fx, fz; // Force on particle
fz               member       29 crowd.h          float fx, fz; // Force on particle
init             function     35 crowd.cpp        void CrowdSim::init()
init             prototype    49 crowd.h          void init();
node             member       31 crowd.h          H3DNode node;
ox               member       30 crowd.h          float ox, oz; // Orientation vector
oz               member       30 crowd.h          float ox, oz; // Orientation vector
px               member       27 crowd.h          float px, pz; // Current postition
pz               member       27 crowd.h          float px, pz; // Current postition
update           function     68 crowd.cpp        void CrowdSim::update( float fps )
update           prototype    50 crowd.h          void update( float fps );

(note: -x is only for easy user inspection)
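Since the `-x` listing above is whitespace-separated columns (name, kind, line, file, source text), it can be read back with a few lines of code. The following is a minimal sketch in Python, assuming that tag names and file names contain no spaces (which would not hold for something like `operator ==`); `Tag` and `parse_ctags_x` are hypothetical names used only for illustration.

```python
from typing import List, NamedTuple


class Tag(NamedTuple):
    name: str    # identifier, e.g. "update"
    kind: str    # ctags kind, e.g. "class", "function", "prototype"
    line: int    # 1-based line number reported by ctags
    path: str    # file the tag was found in
    source: str  # the source line ctags echoed back


def parse_ctags_x(text: str) -> List[Tag]:
    """Parse the human-readable `ctags -x` listing into Tag records.

    Assumption: the first four columns contain no embedded whitespace.
    Lines that do not match that shape are skipped.
    """
    tags = []
    for raw in text.splitlines():
        parts = raw.split(None, 4)  # at most 5 fields; the last keeps its spaces
        if len(parts) < 4 or not parts[2].isdigit():
            continue
        name, kind, line, path = parts[:4]
        source = parts[4] if len(parts) == 5 else ""
        tags.append(Tag(name, kind, int(line), path, source))
    return tags


sample = """\
CrowdSim         class        44 crowd.h          class CrowdSim
chooseDestination function     24 crowd.cpp        void CrowdSim::chooseDestination( Particle &p )
update           function     68 crowd.cpp        void CrowdSim::update( float fps )
update           prototype    50 crowd.h          void update( float fps );
"""

# List only classes, structs and function definitions with their locations.
for tag in parse_ctags_x(sample):
    if tag.kind in ("class", "struct", "function"):
        print(f"{tag.path}:{tag.line}: {tag.kind} {tag.name}")
```

For anything beyond quick inspection, the regular tags file that ctags writes without `-x` is probably the more robust format to consume.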

CapelliC
  • Looks like this is probably the best option. Is it able to tell you where a method AND its body is? Or only where the method declaration is? – pablo Jan 11 '12 at 12:09
  • for instance, for function CrowdSim::update, is it possible to know where its body is? – pablo Jan 11 '12 at 15:35
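The listing above only gives the line where each definition starts. One rough way to approximate where the body ends is to scan forward from that line and pair up braces, skipping comments and string/character literals. The sketch below is a heuristic only, not something ctags does, and it is exactly the kind of bracket pairing that can fail badly: it assumes no preprocessor tricks, raw string literals, digit separators such as `1'000`, or brace-initialized members in constructor initializer lists. `find_body_span` is a hypothetical helper name.

```python
from typing import Optional, Tuple


def find_body_span(source: str, start_line: int) -> Optional[Tuple[int, int]]:
    """Return (line of opening brace, line of closing brace), both 1-based,
    for the first brace-delimited block at or after start_line.
    Returns None for a prototype (a ';' comes first) or unbalanced input."""
    lines = source.splitlines(keepends=True)
    i = sum(len(text) for text in lines[:start_line - 1])  # offset of start_line
    n = len(source)
    line = start_line
    depth = 0
    open_line = None
    while i < n:
        ch = source[i]
        nxt = source[i + 1] if i + 1 < n else ""
        if ch == "\n":
            line += 1
        elif ch == "/" and nxt == "/":
            # Line comment: jump to the newline and let the loop count it.
            j = source.find("\n", i)
            if j == -1:
                break
            i = j
            continue
        elif ch == "/" and nxt == "*":
            # Block comment: jump past the closing marker.
            j = source.find("*/", i + 2)
            if j == -1:
                break
            line += source.count("\n", i, j)
            i = j + 2
            continue
        elif ch in ('"', "'"):
            # String or character literal: skip it, honoring backslash escapes.
            j = i + 1
            while j < n and source[j] != ch:
                if source[j] == "\\":
                    j += 1
                j += 1
            line += source.count("\n", i, j)
            i = j + 1
            continue
        elif ch == "{":
            if depth == 0:
                open_line = line
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return None
            if depth == 0:
                return (open_line, line)
        elif ch == ";" and depth == 0:
            return None  # declaration without a body
        i += 1
    return None


sample = (
    "void CrowdSim::update( float fps )\n"
    "{\n"
    "    // a stray } in a comment\n"
    "    const char *s = \"}{\";\n"
    "    if( fps > 0 ) { step(); }\n"
    "}\n"
    "void update( float fps );\n"
)

print(find_body_span(sample, 1))  # body of the definition on line 1
print(find_body_span(sample, 7))  # prototype on line 7 has no body
```

If precise results matter, a real parser (libclang or similar) is the safer route. This sketch is only meant to show how far simple brace matching gets when seeded with the start line from ctags.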
4

To do this well, you really need something that contains a full C++ parser.

Our DMS Software Reengineering Toolkit with its C++ Front End could be used for this. It can provide the precise entity declarations including types, their context (class/namespace/...), and precise file positions. DMS provides access to all this information as a set of ASTs and related symbol tables; you build custom code to navigate to and extract what you want.

Depending on your needs, you may find that the information you want is difficult to process using vanilla C#. The type information in its full glory is pretty complex, because C++ is a complex language. If you want to process that information, you'll want to "stay inside" DMS where all the machinery to do that is present. If all you want is the names and type information as text strings, you can get DMS to prettyprint this data in that form; it has standard libraries supporting such activities. An intermediate answer would be to export the data in XML format: DMS directly supports exporting arbitrary AST fragments, and although it only indirectly supports writing type information out as XML, that wouldn't be hard to customize.

EDIT: (in response to OP comment in another answer) DMS can provide precise information both about the method signature, and the method body. It has full AST and type information for both.

Ira Baxter
1

If you want to use Clang, I recommend you take a look at this page. It demonstrates how to get all virtual methods from a file. Once you understand this simple example, you can create more complex so-called matchers.

Konrad Kleine
1

I'm not sure what the best option is, but you could take a look at GCC-XML or Mono/CXXI as well. The latter uses GCC-XML internally, but also provides C# interfaces to the C++ class definitions.

libclang is a C library and thus should be usable from .NET via P/Invoke, but it might be quite tedious to repeat all necessary declarations in C#.

Konstantin Oznobihin
1

Another angle would be to create an extension for Visual Studio.

justin
  • It is certainly an option, especially with the new stuff in the latest version, but I'd like to have a VS-independent solution. – pablo Jan 11 '12 at 11:39
  • ah, ok. well, i'll leave it in, in the event somebody finds it useful for their needs. – justin Jan 11 '12 at 11:44
0

It's better to use a full parser IMO. You can use ANTLR. It has both a C/C++ grammar and a C# parser generator.

Osman Turan
  • The ANTLR grammar for C/C++ is not as good as libclang AFAIK. libclang does the job for you, whereas you have to decorate the entire C/C++ grammar if you want to go the ANTLR way. – pablo Jan 11 '12 at 11:35
  • A "full parser" is not nearly enough when there is a preprocessor and a complicated, platform-specific include path configuration involved. – SK-logic Jan 11 '12 at 11:52
  • ... and I don't think the ANTLR grammar is actually complete. The release notes contain author information that says he didn't really finish it. – Ira Baxter Jan 11 '12 at 12:16