17

I wish to make a software which tracks the function calls of another software.
It should be like which function is being called from where "during run time".

Example:

int main ()
{
    a ();
    b ();
    c ();
    return 0;
}

a () 
{
   d ();
   e ();
}

b ()
{
   e ();
   f ();
}

Assuming I wish to write this currently in C for C, how should I keep a track of the calls at run time (starting from the first call) - with threads and without threads?

Hints?

Aquarius_Girl
  • 21,790
  • 65
  • 230
  • 411
  • 2
    This seems like a potentially platform-dependent question. Are you trying to do this on Linux, OSX, or Windows? – merlin2011 Mar 19 '13 at 08:23
  • 3
    How about [this](http://stackoverflow.com/questions/2281739/automatically-adding-enter-exit-function-logs-to-a-project)? – Alexey Frunze Mar 19 '13 at 08:24
  • @merlin2011 Well, "currently" on Linux. – Aquarius_Girl Mar 19 '13 at 08:24
  • 7
    Using GCC? [`-finstrument-functions`](http://webcache.googleusercontent.com/search?q=cache:GbOyQTwqv3kJ:gcc.gnu.org/onlinedocs/gcc-4.4.6/gcc/Code-Gen-Options.html#index-finstrument_002dfunctions-1914+&cd=1&hl=en&ct=clnk&gl=en) is your friend. –  Mar 19 '13 at 08:24
  • 1
    @H2CO3 that link isn't working. – Aquarius_Girl Mar 19 '13 at 08:25
  • @AnishaKaul Also, have a look at the source code of `ltrace`, it may give some hints, you might even be able to modify it so that it prints out every function call, not only those in the libraries. –  Mar 19 '13 at 08:29
  • @H2CO3 Well, thanks for the links. I'd appreciate if someone would hint at how to "implement" this feature. Studying the source code would be easy if I have an idea about what goes beneath the hood. – Aquarius_Girl Mar 19 '13 at 08:31
  • Why not use existing tools like `valgrind`? It can tell you which function is being called from where (with the Callgrind tool). You can also write your own tool, taking Callgrind as an example. – n. m. could be an AI Mar 19 '13 at 08:59

5 Answers5

8

You can do this before runtime by static analysis, or at runtime by dynamic analysis.

You have only two ways to do this at runtime, and they both amount to instrumenting code:

Instrument the source code

Modify ("instrument") the source code so it keeps track of what you want, e.g., modify every call site to pass an additional argument effectively containing the calling function name, and every function entry to record the entry to that function (e.g., effectively the called function name) and the calling function name. The instrumentation process needs to build a cross-reference from points that are instrumented, back to source file and line to enable later reporting.

You can do this in a completely automated way by using a Program Transformation System (PTS), which can parse source code to make ASTs, modify the ASTs, and then regenerate source. Often a PTS will let you write the changes as source-level patterns of the form, "if you see this, replace it by that". (No, regexes can't do this).

An extended example of how do this can be found in my technical paper, Branch Coverage for Arbitrary Languages. The paper shows discusses what is in a PTS, and then shows typical instrumentation transformations.

Getting this done "right" for a language like C can be difficult, because you need as a foundation, a full C parser, which is a major technical feat in its own right. Best to get such a beast in completed form (some of the PTS have this, consider Concinnelle or DMS) or you'll never get around to doing the instrumentation part. (Another answer suggests that GCC has much of this instrumentation functionality built-in).

Done reasonably, the execution cost of the additional instrumentation is pretty modest (10-30%). This concept has been used to implement test coverage tools and timing profilers, which operate by tracking the dynamic set of calls, which seems to be exactly what you want.

Instrument the object code

Modify the execution engine that runs the code to keep track of the details you want. Given that you want to do this with C programs, the execution engine is essentially the underlying instruction set.

Since you can't reasonably modify the processor chip itself, you'll have to either:

  • modify the object code (using object-code equivalents of program transformations, see preceding paragraph for concepts, or see tools like PIN and Valgrind for details of implementation),

  • or write some kind of clever instruction set interpreter that collects the information it needs as it encounters call-instructions (this is likely to be rather slow, because of its interpretive nature).

Implementing these for real instruction sets is complicated by the instruction sets themselves (Intel's X86 instruction set is huge), object file formats, etc. Don't expect this path to be easier than the source instrumentation path; just expect it to have a different set of problems.

One standard problem is knowing what to instrument; the object code space is full of code (libraries, system routines, ...) that aren't part of the application of interest, and thus aren't interesting to collect test coverage data for. For this, you need a symbol table ( pairs) to tell what is application code, and what is not. You also need to be able to trace from the name to a point in the original source file(s), in order to report the results.

One other difficulty with object code instrumentation is that it is difficult to tell if a function foo, which has been inlined, has been executed ("covered"). The source instrumentation scheme does not have this problem, because foo gets instrumented before it gets compiled/inlined, so the instrumentation gets inlined too.

Threads

Handling threads is an orthogonal issue. At each place where you record a fact X-calls-Y, you simply attach the thread ID (or its equivalent) T, obtained from the OS, to produce X-calls-Y-in-T. It isn't clear from your question why you want to do this.

Static Analysis

You could decide to skip all the hassles of runtime implementation, and build/get a static analyzer that can produce a call graph. Getting of these for C might be possible using Clang or DMS. Building it yourself is likely difficult; now you need a full C parser, complete data flow and points-to analysis, and call-graph construction. The details won't fit in this paragraph.

Ira Baxter
  • 93,541
  • 22
  • 172
  • 341
7

This is very much platform dependent. What you need is to do more or less what a debugger does.

The way I'd approach this (I actually implemented all this once when I had to debug tools needed to build gdb on an new architecture):

Read the symbol table of the program you want to debug. You said Linux in a comment, so to start you need a library that reads ELF files or read the ELF specification and implement something yourself.

Use the ptrace syscall to create a breakpoint at the function main. If you're lucky, your system has a ptrace facility to create breakpoints and keeps the bookkeeping in the kernel. If it doesn't, you need to figure out the breakpoint instruction for your cpu architecture and implement breakpoints yourself.

Next, you need to figure out the debugging hooks in your dynamic loader so that you know when shared libraries are loaded. You'll need the symbol tables of the libraries as well.

Now that you have all the symbols (admittedly this is after your program has run for a while, since it had to load the dynamic libraries, but we'll pretend that the program starts at main) create breakpoints at all functions you've gotten from the symbol table.

Let the program run. Every time you hit a breakpoint, reverse look up the instruction pointer in the symbol table. Save the name of the function to a file or wherever you want to save it. You can now track function calls.

Or just use a debugger. gdb can probably be scripted to do something like this.

Art
  • 19,807
  • 1
  • 34
  • 60
4

I apologize that these are not open source tools, but I have used them for research in the past and they enable the functionality you require, so you might get some hints from playing with them.

On Linux, try taking a look at how Pin works.

On Windows, look at Detours.

merlin2011
  • 71,677
  • 44
  • 195
  • 329
3

I would use the instrumentation functionality built into GCC to do the source-code instrumentation for you.

Here is a link for a tutorial: http://balau82.wordpress.com/2010/10/06/trace-and-profile-function-calls-with-gcc/

Basically you tell the program to call an instrumentation-function before each function call and give two addresses as parameters. You provide that instrumentation-function yourself, and it gets called with an from-address and a to-address. The from-address tells you where the function at the to-address is being called from.

You then need to translate the addresses into function names (which can be done given line-numbers in a source-code file). By using the GNU binutils tool addr2line you can convert the to- and from-address to line numbers, given the binary file was compiled with debug information (gcc -g ...).

It is very fast to get started with instrumentation using this approach.

EDIT:

If you do not have the source and only the binaries at hand, you can try disassembling or parsing the binaries into assembly. This way you can separate the program into logic blocks which are jumped to/from. Function names cannot easily be recovered without the source-code, so you can risk to only be able to track jumps within the logic blocks you've previously divided the program into. This is what several debuggers do AFAIK. I think OllyDbg (Windows only) and IDA pro can give a visual flow chart sorta, or show a colored block-diagram or something like that. You're really having to work hard without the source, whereas if you have the source code at hand, you have all means of doing exactly what you want. Adding just one step in between write code -> compile -> execute so it becomes write code -> insert instrumentation -> compile -> execute.

Morten Jensen
  • 5,818
  • 3
  • 43
  • 55
1

The laborious way is inserting a printf as the first and last line of a function. Or using a debugger. A problem could be debuggers who set all variables to 0 / null, "solving" bugs in the code.

My solution was using ctrace

Morten Jensen
  • 5,818
  • 3
  • 43
  • 55
Walter A
  • 19,067
  • 2
  • 23
  • 43