7

I am working on a project that builds multiple shared libraries and executables. All the source files used to build these binaries live in a single /src directory, so it is not obvious which source files went into each binary (the relation is many-to-many).

My goal is to write a script that would parse a set of C files for each binary and make sure that only the right functions are called from them.

One option is to extract this information from the Makefile, but that does not work well with generated files and headers (because of the dependence on includes).

Another option would be to simply walk the call graphs, but this gets complicated because many functions are called through function pointers.

Any other ideas?

user389238

5 Answers

10

You can first compile your project with debug information (gcc -g) and then use objdump to see which source files went into each binary.

objdump -W <some_compiled_binary>

The DWARF debug information should contain what you are looking for:

 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    < c>   DW_AT_producer    : (indirect string, offset: 0x5f): GNU C 4.4.3 
    <10>   DW_AT_language    : 1    (ANSI C)
    <11>   DW_AT_name        : (indirect string, offset: 0x28): test_3.c    
    <15>   DW_AT_comp_dir    : (indirect string, offset: 0x36): /home/auselen/trials    
    <19>   DW_AT_low_pc      : 0x82f0   
    <1d>   DW_AT_high_pc     : 0x8408   
    <21>   DW_AT_stmt_list   : 0x0  

In this example, the object file was compiled from test_3.c, which was located in the .../trials directory. Of course, you then need to write a script around this to collect the source file names for each binary.
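A script around this could look roughly like the following. It is only a sketch: the function names compile_units and sources_of are made up for illustration, it uses objdump --dwarf=info (the "info" subset of -W) to keep the output smaller, and it assumes the attribute lines look like the dump above. Output details vary between binutils versions, so treat the regular expressions as a starting point.

```python
import re
import subprocess

# A DIE header line, e.g. " <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)"
DIE_RE = re.compile(r"^\s*<\d+><[0-9a-f]+>: Abbrev Number: \d+(?: \((DW_TAG_\w+)\))?")
# An attribute line, with or without the "(indirect string, offset: ...)" prefix
ATTR_RE = re.compile(
    r"(DW_AT_name|DW_AT_comp_dir)\s*:\s*(?:\(indirect[^)]*\):\s*)?(.*\S)"
)


def compile_units(dwarf_text):
    """Return (comp_dir, name) pairs for every DW_TAG_compile_unit in the dump."""
    units = []
    in_cu = False
    for line in dwarf_text.splitlines():
        die = DIE_RE.match(line)
        if die:
            # Only attributes directly under a compile unit are of interest.
            in_cu = die.group(1) == "DW_TAG_compile_unit"
            if in_cu:
                units.append({})
            continue
        attr = ATTR_RE.search(line)
        if in_cu and attr:
            units[-1][attr.group(1)] = attr.group(2)
    return [(u.get("DW_AT_comp_dir", ""), u.get("DW_AT_name", "")) for u in units]


def sources_of(binary):
    """Run objdump on a binary built with -g and list its compile units."""
    out = subprocess.run(
        ["objdump", "--dwarf=info", binary],
        capture_output=True, text=True, check=True,
    ).stdout
    return compile_units(out)
```

Calling sources_of on each shared library and executable gives one list of source files per binary, which is the many-to-many mapping the question asks for.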

auselen
2

Here is an idea; you will need to refine it for your specific build. Run a build and log it using script (for example script log.txt make clean all; with the util-linux version of script the equivalent is script -c "make clean all" log.txt). The last step (or one of the last) should be the linking of the object files. (Tip: look for cc -o <your_binary_name>.) That line should list all the .o files, each of which should have a corresponding .c file in your tree. Then grep those .c files for the header files they include.

If several .c files in your tree share the same name, you will need to look at the full paths on the linker line or work from the Makefile.
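As a rough sketch of the log-parsing step, the Python below finds the line that links a given binary and pulls out its .o files. The names linked_objects and candidate_sources are hypothetical, and the mapping from foo.o to foo.c simply swaps the suffix, which is an assumption that only holds if your build keeps objects next to (or named after) their sources.

```python
import shlex
from pathlib import PurePosixPath


def linked_objects(log_text, binary):
    """Return the .o files on the first command line that has '-o <binary>'."""
    for line in log_text.splitlines():
        try:
            tokens = shlex.split(line)
        except ValueError:
            # Not a well-formed command line (e.g. unbalanced quotes); skip it.
            continue
        for i, tok in enumerate(tokens[:-1]):
            if tok == "-o" and PurePosixPath(tokens[i + 1]).name == binary:
                return [t for t in tokens if t.endswith(".o")]
    return []


def candidate_sources(objects):
    """Guess the .c file for each object by swapping the suffix (an assumption)."""
    return [str(PurePosixPath(o).with_suffix(".c")) for o in objects]
```

For example, given a log containing "cc -o prog obj/a.o obj/b.o -lm", linked_objects(log, "prog") returns the two object paths, and you can then look up the matching .c files in your tree.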

What Mahmoud suggests below should work too. If you have an image with symbols, strings <debug_image> | grep <full_path_of_src_directory> should give you a list of C files.
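If you would rather do the strings | grep step from a script, something like this Python sketch does a similar scan directly on the file contents. The function name source_paths is made up, and it assumes the full source paths are stored in the image as plain printable text without spaces, which depends on how the compiler was invoked.

```python
import re


def source_paths(data, src_dir):
    """Return sorted unique paths under src_dir ending in .c or .h found in data.

    data is the raw bytes of the debug image; paths with spaces are not matched.
    """
    pattern = re.compile(
        re.escape(src_dir.encode())
        + rb"[\x21-\x7e]*?\.[ch](?![\x21-\x7e])"
    )
    return sorted({m.group(0).decode() for m in pattern.finditer(data)})


def source_paths_in_file(image_path, src_dir):
    """Convenience wrapper that reads the image from disk."""
    with open(image_path, "rb") as f:
        return source_paths(f.read(), src_dir)
```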

jman
  • Good idea too. I am not familiar with Makefiles for GCC, but I am with VS, and VS won't show that step in much detail; it will simply list the library files you are linking against. – Mahmoud Fayez Aug 29 '12 at 21:30
2

First you need to separate the debug symbols from the binary you just compiled. Check this question to see how: How to generate gcc debug symbol outside the build target?

Then you can try to parse this file on your own. I know how to do so for Visual Studio but as you are using GCC I won't be able to help you further.
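For GCC builds, the usual binutils way to split debug symbols out is with objcopy and strip. The sketch below only assembles and runs those commands; the helper names and the ".debug" suffix are my own choices for illustration, not something taken from the linked question.

```python
import subprocess


def split_debug_commands(binary, debug_file=None):
    """Build the commands that move debug info from binary into debug_file."""
    debug_file = debug_file or binary + ".debug"
    return [
        # Copy only the debug sections into a separate file.
        ["objcopy", "--only-keep-debug", binary, debug_file],
        # Remove the debug sections from the original binary.
        ["strip", "--strip-debug", binary],
        # Record in the binary where its debug file lives.
        ["objcopy", "--add-gnu-debuglink=" + debug_file, binary],
    ]


def split_debug(binary, debug_file=None):
    """Run the commands in order, stopping at the first failure."""
    for cmd in split_debug_commands(binary, debug_file):
        subprocess.run(cmd, check=True)
```

The resulting .debug file is an ELF file with DWARF sections, so tools such as objdump or readelf can read it without a custom parser.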

Mahmoud Fayez
2

You can use the Unix nm tool. It lists the symbols in an object file: those it defines and those it references but leaves undefined. So you need to:

  1. Run nm on your binary and grab all undefined symbols
  2. Run ldd on your binary to get the list of all its dynamic dependencies (the .so files your binary is linked to)
  3. Run nm on each .so file you found in step 2.

That will give you the full list of dynamic symbols that your binary uses.

Example:

nm -C --dynamic /bin/ls
....skipping.....
00000000006186d0 A _edata
0000000000618c70 A _end
                 U _exit
0000000000410e34 T _fini
0000000000401d88 T _init
                 U _obstack_begin
                 U _obstack_newchunk
                 U _setjmp
                 U abort
                 U acl_extended_file
                 U bindtextdomain
                 U calloc
                 U clock_gettime
                 U closedir
                 U dcgettext
                 U dirfd

All the symbols marked with a capital "U" are undefined in ls itself, which means ls uses them and expects a shared library to provide them.
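The three steps could be scripted along these lines. This is a sketch with made-up helper names; it assumes nm --dynamic output in the layout shown above and ldd lines of the form "libfoo.so => /path/libfoo.so (0x...)". It also strips version suffixes such as "@GLIBC_2.2.5", which some binutils versions append to symbol names.

```python
import re
import subprocess


def undefined_symbols(nm_text):
    """Step 1: names on 'U <name>' lines, without any @VERSION suffix."""
    syms = set()
    for line in nm_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "U":
            syms.add(parts[1].split("@")[0])
    return syms


def library_paths(ldd_text):
    """Step 2: resolved library paths from ldd output."""
    return re.findall(r"=>\s*(/\S+)", ldd_text)


def defined_symbols(nm_text):
    """Step 3: names on 'address type name' lines with an exported type."""
    syms = set()
    for line in nm_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and (parts[1].isupper() or parts[1] == "i"):
            syms.add(parts[2].split("@")[0])
    return syms


def providers(needed, libs):
    """Map each needed symbol to the first library defining it, else None."""
    return {
        sym: next((p for p, syms in libs.items() if sym in syms), None)
        for sym in sorted(needed)
    }


def run(cmd):
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def resolve(binary):
    """Tie the steps together for one binary."""
    needed = undefined_symbols(run(["nm", "--dynamic", binary]))
    libs = {
        path: defined_symbols(run(["nm", "--dynamic", path]))
        for path in library_paths(run(["ldd", binary]))
    }
    return providers(needed, libs)
```

Symbols that map to None were not found in any library ldd reported, which is worth checking by hand.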

Zaar Hai
1

If your goal is to analyze C source files, you can do that by customizing the GCC compiler. You could use MELT for that purpose (MELT is a high-level domain-specific language for extending GCC) by adding your own analysis passes, coded in MELT, to GCC. You should first learn about GCC's middle-end internal representations (Gimple, Tree, ...), though.

Customizing GCC takes several days of work (mostly because GCC internals are quite complex in the details).

Feel free to ask me more about MELT.

Basile Starynkevitch