2

This question is related to a question I've asked earlier this day: I wonder if it's possible to generate a caller graph from a given function (or symbol name e.g. taken from nm), even if the function of interest is not part of "my" source code (e.g. located in a library, e.g. malloc())

For example to know where malloc is being used in my program named foo I would first lookup the symbol name:

nm foo | grep malloc
         U malloc@@GLIBC_2.2.5

And then run a tool (which might need a specially compiled/linked version of my program or some compiler artifacts):

 find_usages foo-with-debug-symbols "malloc@@GLIBC_2.2.5"

Which would generate a (textual) caller graph I can then process further.

Reading this question I found radare2 which seems to accomplish nearly everything you can imagine but somehow I didn't manage to generate a caller graph from a given symbol yet..

Progress

Using radare2 I've managed to generate a dot caller graph from an executable, but something is still missing. I'm compiling the following C++ program which I'm quite sure has to use malloc() or new:

#include <string>

int main() {
  auto s = std::string("hello");
  s += " welt";
  return 0;
}

I compile it with static libraries in order to be sure all calls I want to analyze can be found in the binary:

g++ foo.cpp -static

By running nm a.out | grep -E "_Znwm|_Znam|_Znwj|_Znaj|_ZdlPv|_ZdaPv|malloc|free" you can see a lot of symbols which are used for memory allocation.

Now I run radare2 on the executable:

r2 -qAc 'agCd' a.out > callgraph.dot

With a little script (inspired by this answer) I'm looking for a call-path from any symbol containing "sym.operatornew" but there seems to be none!

Is there a way to make sure all information needed to generate a call graph from/to any function which get's called inside that binary?

Is there a better way to run radare2? It looks like the different call graph visualization types provide different information - e.g. the ascii art generator does provide names for symbols not provided by the dot generator while the dot generator provides much more details regarding calls.

frans
  • 8,868
  • 11
  • 58
  • 132

1 Answers1

2

In general, you cannot extract an exact control flow graph from a binary, because of indirect jumps and calls there. A machine code indirect call is jumping into the content of some register, and you cannot reliably estimate all the values that register could take (doing so could be proven equivalent to the halting problem).

Is there a way to make sure all information needed to generate a call graph from/to any function which get's called inside that binary?

No, and that problem is equivalent to the halting problem, so there would be never a sure way to get that call graph (in a complete and sound way).

The C++ compiler would (usually) generate indirect jumps for virtual function calls (they jump thru the vtable) and probably when using a shared library (read Drepper's How To Write Shared Libraries paper for more).

Look into the BINSEC tool (developed by colleagues from CEA, LIST and by INRIA), at least to find references.

If you really want to find most (but not all) dynamic memory allocations in your C++ source code, you might use static source code analysis (like Frama-C or Frama-Clang) and other tools, but they are not a silver bullet.

Remember that allocating functions like malloc or operator new could be put in function pointer locations (and your C++ code might have some allocator deeply buried somewhere, then you are likely to have indirect calls to malloc)

Maybe you could spend months of effort in writing your own GCC plugin to look for calls to malloc after optimizations inside the GCC compiler (but notice that GCC plugins are tied to one particular version of GCC). I am not sure it is worth the effort. My old (obsolete, non maintained) GCC MELT project was able to find calls to malloc with a size above some given constant. Perhaps in at least a year -end of 2019 or later- my successor project (bismon, funded by CHARIOT H2020 project) might be mature enough to help you.

Remember also that GCC is capable of quite fancy optimizations related to malloc. Try to compile the following C code

//file mallfree.c
#include <stdlib.h>
int weirdsum(int x, int y) {
  int*ar2 = malloc(2*sizeof(int));
  ar2[0] = x; ar2[1] = y;
  int r = ar2[0] + ar2[1];
  free (ar2);
  return r;
}

with gcc -S -fverbose-asm -O3 mallfree.c. You'll see that the generated mallfree.s assembler file contain no call to malloc or to free. Such an optimization is permitted by the As-if rule, and is practically useful to optimize most usages of C++ standard containers.

So what you want is not simple even for apparently "simple" C++ code (and is impossible in the general case).

If you want to code a GCC plugin and have more than a full year to spend on that issue (or could pay at least 500k€ for that), please contact me. See also https://xkcd.com/1425/ (your question is a virtually impossible one).

BTW, of course, what you really care about is dynamic memory allocation in optimized code (you really want inlining and dead code elimination, and GCC does that quite well with -O3 or -O2). When GCC is not optimizing at all (e.g. with -O0 which is the implicit optimization) it would do a lot of "useless" dynamic memory allocation, specially with C++ code (using the C++ standard library). See also CppCon 2017: Matt Godbolt “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid” talk.

Basile Starynkevitch
  • 223,805
  • 18
  • 296
  • 547
  • I don't need a control flow graph but just a tree. My primary goal is to identify functions which *might* call a certain function. This is an easy task - there are many call graph generators out there - but I need to generate a call graph down to standard library functions. (e.g. `new`, `delete`, `malloc()`, `free()` etc.) – frans Oct 17 '18 at 15:40
  • This is *not* an easy task. A function pointer location might *sometimes* contain the address of `malloc` -or of some `operator new` function inside a C++ class. – Basile Starynkevitch Oct 17 '18 at 15:40
  • You're right of course - but leave aside function pointers for now. In the example snippet I provided I create and modify a `std::string` and for now I'm just looking for a tool, which would (implicitly) tell me that somewhere inside `main()` I might call `new` or `malloc()` – frans Oct 17 '18 at 15:47
  • My old GCC MELT could be such a tool, but it is a dead project, is related to old unmaintained versions of GCC. I am working on its successor [bismon](http://github.com/bstarynk/bismon), but you need to wait a year before I could offer some usable tool. Today (october 2018) my `bismon` is unusable, still pre-alpha – Basile Starynkevitch Oct 17 '18 at 15:48
  • In your answer you provide scenarios in which it's hard/impossible to spot a certain function call due to optimization. But in my case the source code is completely available and for analyzing purposes I can compile it with all options needed for that (`-O0`, `-static`, `-fno-function-cse`, `-fomit-frame-pointer`, etc). So I could also use a static source code analyzer but I don't know any tool which would *also consider the standard libraries*. – frans Oct 17 '18 at 16:03
  • Then code your own GCC plugin to do that (you'll need a year of work). Notice that quite often `operator new` could be inlined or specialized inside the standard C++ library. You'll find out that your problem is really hard (and much harder than what it looks at first). I am working in a lab which has several experts on that question. – Basile Starynkevitch Oct 17 '18 at 16:04