4

I want to instrument my code directly by pre-processing the source files with sed/awk. I cannot use other methods such as debugger traces or the gcc option -finstrument-functions. With the latter, the addresses are rebased in a way I cannot manage, and I lose the correspondence with the symbol table. Other methods presented here (ptrace, etrace, callgraph, etc.) or here work well on a simple example, but not in my real project.

The problem is that in big open source projects, the coding style of function definitions differs, not only between C and C++ files but often within the same file. The { may be at the end of the argument list or on the next line, and structures or assignments may also start with a {, which makes simple function parsing unreliable.

So the solution presented in the links above, which inserts a macro at the beginning of each function definition, does not work in general, and it is not feasible to correct thousands of lines of code (KLOC) by hand.

sed 's/^{/{ENTRY/'
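To illustrate, here is a minimal sketch of what that substitution produces on three brace styles. The `ENTRY` macro body, the `entry_count` counter and the function names are assumptions made up for this example; they are not taken from the linked solutions.

```c
#include <stdio.h>

/* Hypothetical counter, for illustration only. */
static int entry_count = 0;

/* Hypothetical tracing macro. The trailing semicolon is part of the macro
   because the sed command inserts the bare word ENTRY. */
#define ENTRY do { ++entry_count; fprintf(stderr, "enter %s\n", __func__); } while (0);

/* Brace alone at column 0: the sed pattern matches, the function is traced. */
int add(int a, int b)
{ENTRY
    return a + b;
}

/* Brace at the end of the signature line: the pattern misses this function. */
int sub(int a, int b) {
    return a - b;
}

/* Brace at column 0 that opens an initializer, not a function. The sed
   command would rewrite it to "{ENTRY 1, 2, 3 };", which does not compile,
   so it is left untouched here. */
static const int table[] =
{ 1, 2, 3 };

int table_sum(void)
{ENTRY
    return table[0] + table[1] + table[2];
}
```

Calling `add` and `table_sum` bumps the counter, while calling `sub` does not, so the trace is silently incomplete even in the cases where the result still compiles.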

So, how can I reliably target function definitions in C/C++ code with regular expressions usable in sed or awk? Possibly by using part of the gcc precompiler code? I am looking for something off-the-shelf if possible, please.

lalebarde

2 Answers

8

sed and awk (or any purely textual approach) are the wrong tools for reliably processing C code (and you should probably work on the preprocessed form).

You want to work on some form of the compiler's AST. Of course the internal representations inside a compiler are specific to the compiler (and perhaps even to its version).

If you are using a recent GCC, you could customize it with MELT (and add your own passes to GCC), or with your own plugin written in C++.

If using Clang/LLVM you could also customize it by adding your passes.

The Coccinelle tool might also be relevant.

Any such approach requires a significant amount of work (probably weeks) since you'll need to understand in detail the internal representations of the particular compiler you are using. And C is complex enough to make that non-trivial.

Basile Starynkevitch
  • [LibClang](http://clang.llvm.org/docs/Tooling.html) and Python bindings might also be an option as a preprocessor - use the parsing functionality of Clang, modify the AST as required, then output C++ source corresponding to the modified AST again, and finally run normal Clang on this source file. – Angew is no longer proud of SO Nov 04 '14 at 10:06
  • MELT looks great and a top solution, but the learning curve is too high for my immediate need. – lalebarde Nov 04 '14 at 10:18
  • But *all* the approaches require you to understand the internals of your compiler (and its ASTs). This is the major difficulty. Hence the "weeks of effort". – Basile Starynkevitch Nov 04 '14 at 10:18
2

You cannot do this with any tool that does not understand the specific version of C your code is written in (e.g. C++, ANSI C or C99). As a trivial example: what does "//" mean in a "C function"? If it's inside a string, it's a literal pair of slashes. If it's outside a string, it might be the start of a comment if the code is C++ or C99, but it's not the start of a comment in ANSI C. What if it's inside /* ... // ... */? And if what looks like a function definition follows a "//", is that really a function?
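As a rough illustration of how much state even this one question needs, here is a minimal sketch of a scanner that counts the "//" sequences that really start a comment, assuming C99/C++ comment rules. The name `count_line_comments` is made up for this example, and the sketch deliberately ignores trigraphs, backslash line continuations and C++ raw strings, all of which a real tool would also have to handle.

```c
#include <stddef.h>

/* Lexical states a scanner must track before "//" can be interpreted. */
enum state { CODE, STRING, CHAR_LIT, BLOCK_COMMENT, LINE_COMMENT };

/* Count the "//" sequences that start a comment under C99/C++ rules.
   Simplified: no trigraphs, no line continuations, no raw strings. */
int count_line_comments(const char *src)
{
    enum state st = CODE;
    int count = 0;

    for (size_t i = 0; src[i] != '\0'; ++i) {
        char c = src[i];
        char next = src[i + 1];

        switch (st) {
        case CODE:
            if (c == '"') st = STRING;
            else if (c == '\'') st = CHAR_LIT;
            else if (c == '/' && next == '*') { st = BLOCK_COMMENT; ++i; }
            else if (c == '/' && next == '/') { st = LINE_COMMENT; ++count; ++i; }
            break;
        case STRING:
            if (c == '\\' && next != '\0') ++i;   /* skip the escaped character */
            else if (c == '"') st = CODE;
            break;
        case CHAR_LIT:
            if (c == '\\' && next != '\0') ++i;   /* skip the escaped character */
            else if (c == '\'') st = CODE;
            break;
        case BLOCK_COMMENT:
            if (c == '*' && next == '/') { st = CODE; ++i; }
            break;
        case LINE_COMMENT:
            if (c == '\n') st = CODE;             /* comment ends with the line */
            break;
        }
    }
    return count;
}
```

A regular expression in sed or awk has no equivalent of the `st` variable carried across characters and lines, which is why the same two slashes cannot be classified by pattern matching alone.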

You don't say what you want to do ("pre-process the code" doesn't tell us anything), but you should look into something like what I posted at Remove multi-line comments, which uses gcc to strip comments from the code, followed by a C beautifier like "indent" or "cb" to reformat the code consistently. Alternatively, take a look at "cscope" or "ccalls" if you're just looking for a tool to list functions.

Ed Morton
  • At the beginning of my post, I mention traces and my trials with -finstrument-functions and other methods, with links. That means I want to add useful trace information to the code, not strip it. – lalebarde Nov 04 '14 at 15:12
  • It doesn't mean that to me. I've no idea what `-finstrument-functions` means or why you were talking about debugger traces. Are you saying you want to add print statements to your code when functions are entered/exited or something else? If so, what do you want those print statements to output? You are missing the point of my post - I'm not telling you how to strip the code, I'm telling you how to re-format it in a consistent way so that you CAN write/use a tool to find function starts, etc. – Ed Morton Nov 04 '14 at 16:46