3

It is easy to get the starting address of a function in C, but not its size. So I am currently running "nm" over the object file to locate my function and THEN the starting address of the next function. I need "nm" because the compiler can (and in my case does) reorder functions, so the order in the object file can differ from the source order.
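To make the neighbour-address idea concrete, here is a minimal sketch. It assumes the three-column "address type name" layout that GNU nm prints by default, with fixed-width lowercase hex addresses; `func_range` is a hypothetical helper name and is not taken from the script linked in the comments.

```shell
# Sketch only: print "start next_start" for one function.
# Reads nm-style "address type name" lines on stdin.
# func_range is a hypothetical helper name.
func_range() {
    # Drop undefined symbols (they have no address, so only 2 fields),
    # sort by address (fixed-width hex sorts correctly as plain text),
    # then report the target's address and the first different address after it.
    awk 'NF == 3' | LC_ALL=C sort | awk -v target="$1" '
        found && $1 != start { print start, $1; exit }
        $3 == target         { found = 1; start = $1 }
    '
}

# Possible usage (MYPROG and MYFUNCTION are placeholders):
#   nm MYPROG | func_range MYFUNCTION
```

Symbols that alias the same address are skipped, and nothing is printed when the function is the last symbol, so a caller would have to handle that case separately.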

I wonder if there are other ways of doing this. For example, instructing the compiler to preserve source code order in the object file, etc. Maybe some ELF magic?

My compilers are GCC, CLANG and Sun Studio. Platform: Solaris and derivatives, MacOSX, FreeBSD. To expand in the future.

jcea
  • What do you mean by "size"? Length and size are different terms. – syb0rg Mar 20 '14 at 01:46
  • Example. UGLY!! - http://hg.jcea.es/cpython-2011/file/0de6441eedb7/Include/pydtrace_offsets.sh – jcea Mar 20 '14 at 01:49
  • I need to know the starting address and the end address of C functions. – jcea Mar 20 '14 at 01:49
  • You should also consider inter-function padding bytes (if your compiler is aligning functions). The only bullet-proof way that I can think of to do this, is to put each function in a separate section, and use a linker script to define a symbol immediately following the function (from which you can subtract). (Also UGLY) – Jonathon Reinhart Mar 20 '14 at 01:50
  • Padding is not an issue. I need a range [start address, end address] where function X is start address < X < end address and no other function/symbol/variable is in that range. Dummy padding is OK. – jcea Mar 20 '14 at 01:53
  • Could you tell us _why_ you need to know this? I suspect you're asking a strange question because you're trying to solve the wrong problem. – keshlam Mar 20 '14 at 01:58
  • For the sake of the argument, let's say I am doing a sampling profiler where all the offsets must be known and static before launching the profiler. Details: stack helper for DTrace: http://www.jcea.es/artic/python_dtrace.htm - http://hg.jcea.es/cpython-2011/file/0de6441eedb7/Include/pydtrace.d - This is run at kernel level, so I must know the functions' coverage beforehand; I cannot do it at runtime. Also, postprocessing the profiler dump is **not** an option. – jcea Mar 20 '14 at 02:02
  • Problem is, after an optimizing compiler is done with the code, functions may be inlined, or tail-call via goto rather than call/return, or otherwise do obscene things, so you can't always count on being able to extract this from the code. The best place to get it is probably to ask the compiler to give you a symbol-table dump, and even then you may lose some information unless you crank the optimization back and "compile for debugging". – keshlam Mar 20 '14 at 02:12
  • The symbol table dump is *more or less* what I am already doing when using **nm**. I know about inter-function optimizations. In this particular case that is not a problem, because I am interested in a SINGLE function called from another object file. No link-time inlining either. This does not need to be perfect, just better than the current situation. Not everything is allowed, though. I can't, for instance, segregate that function into its own object file because other core developers would yell (rightly so) about obscuring source code for no gain (for them). – jcea Mar 20 '14 at 02:20
  • Commands like "readelf", "dumpobj" and "objdump" look useful, but are not really better than "nm", and they are not available everywhere. – jcea Mar 20 '14 at 02:26
  • @jcea `nm` gives you the offsets. The question asked was why do you need to know the lengths? If a PC value is between two offsets, it belongs to the function that started at the first offset. – user207421 Mar 20 '14 at 02:29
  • Given NO interfunction optimizations you are right, and it is what I am doing now. But it looks really ugly and I hoped there are better ways. Example of UGLYNESS: http://hg.jcea.es/cpython-2011/file/0de6441eedb7/Include/pydtrace_offsets.sh – jcea Mar 20 '14 at 02:32
  • related: http://stackoverflow.com/questions/22244683/how-to-get-an-association-between-gcc-compiled-c-functions-and-code-size – box Mar 20 '14 at 05:05
  • possible duplicate of [How to get the size of a C function?](http://stackoverflow.com/questions/11410037/how-to-get-the-size-of-a-c-function) – Jeffrey Bosboom Jun 30 '14 at 17:51

2 Answers

4

I have found that the output of objdump -t xxx will give definitive function size/length values for program and object files (.o).

For example: (From one of my projects)

objdump -t emma | grep " F .text"

0000000000401674 l F .text 0000000000000376 parse_program_header
00000000004027ce l F .text 0000000000000157 create_segment
00000000004019ea l F .text 000000000000050c parse_section_header
0000000000402660 l F .text 000000000000016e create_section
0000000000401ef6 l F .text 000000000000000a parse_symbol_section
000000000040252c l F .text 0000000000000134 create_symbol
00000000004032e0 g F .text 0000000000000002 __libc_csu_fini
0000000000402240 g F .text 000000000000002e emma_segment_count
00000000004022f1 g F .text 0000000000000055 emma_get_symbol
00000000004021bd g F .text 000000000000002e emma_section_count
0000000000402346 g F .text 00000000000001e6 emma_close
0000000000401f00 g F .text 000000000000002f emma_init
0000000000403270 g F .text 0000000000000065 __libc_csu_init
0000000000400c20 g F .text 0000000000000060 estr
00000000004022c3 g F .text 000000000000002e emma_symbol_count
0000000000400b10 g F .text 0000000000000000 _start
0000000000402925 g F .text 000000000000074f main
0000000000401f2f g F .text 000000000000028e emma_open

I've pruned the list a bit, it was lengthy. You can see that the 5th column (the second wide column with lots of zeros....) gives a length value for every function. main is 0x74f bytes long, emma_close is 0x1e6, parse_symbol_section is a paltry 0x0a bytes... 10 bytes! (wait... is that a stub?)

Additionally, I grep'd for just the 'F'unctions in the .text section, thus limiting the list further. The -t option to objdump shows only the symbol tables, so it omits quite a bit of other information not particularly useful towards function length gathering.

I suppose you could use it like this:

objdump -t MYPROG | grep "MYFUNCTION$" | awk '{print "0x" $(NF-1)}' | xargs -I{} -- python -c 'print({})'

An example:

00000000004019ea l F .text 000000000000050c parse_section_header

$ objdump -t emma | grep "parse_section_header$" | awk '{print "0x" $(NF-1)}' | xargs -I{} -- python -c 'print({})'
1292

Checks out, since 0x50c == 1292.

I used $(NF-1) to grab the column in awk since the second field can vary in content and spaces depending on the identifiers relevant to the symbol involved. Also, note the trailing $ in the grep: it makes a search for main find the main function, not the entry with main.c as its name.

The xargs -I{} -- python -c 'print({})' bit is to convert the value from hex to decimal; the parenthesized print works under both Python 2 and Python 3. If anyone can think of an easier way, please chime in. (You can see where awk is sneaking the 0x prefix in there).
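One possibly easier way to do the conversion, offered as a sketch rather than a tested drop-in: the shell's own `printf` accepts a `0x`-prefixed argument for `%d`, so Python is not strictly needed. `func_size` below is a hypothetical helper name. It reads `objdump -t` output on stdin and assumes the layout shown above, with the name in the last field and the size in the one before it.

```shell
# Sketch only: print the size of one function in decimal.
# Reads `objdump -t` output on stdin; func_size is a hypothetical helper name.
func_size() {
    # Exact match on the last field, so "main" does not match "main.c"
    # or any longer name that merely ends in "main".
    hex=$(awk -v name="$1" '$NF == name { print $(NF-1); exit }')
    if [ -n "$hex" ]; then
        # printf parses the 0x-prefixed string as hex and prints it as decimal
        printf '%d\n' "0x$hex"
    fi
}

# Possible usage (MYPROG and MYFUNCTION are placeholders):
#   objdump -t MYPROG | func_size MYFUNCTION
```

Matching the whole last field in awk also replaces the `grep "...$"` step, at the cost of not handling demangled names that contain spaces.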

Ah, I just remembered that I have an alias for objdump which presets the demangle option for objdump. It'll make things easier to match if you add --demangle to the objdump invocation. (I also use --wide, much easier to read, but doesn't affect this particular output).

This works on any ELF object, library, program, object file, as long as it's NOT stripped. (I tested with and without debugging symbols too)
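Since the question asks for a start and an end address rather than a bare length, the same two columns can be combined. This is a rough sketch under the same layout assumptions as the listing above (address in the first field, size in the second-to-last, name in the last); `func_span` is a hypothetical helper name, and the end address it prints is exclusive (the first byte past the function).

```shell
# Sketch only: print "start end" in hex for one function, where
# end = start + size (exclusive). Reads `objdump -t` output on stdin.
# func_span is a hypothetical helper name.
func_span() {
    awk -v name="$1" '$NF == name { print $1, $(NF-1); exit }' |
    while read -r start size; do
        # Shell arithmetic understands 0x-prefixed hex constants
        printf '%x %x\n' "0x$start" "$((0x$start + 0x$size))"
    done
}

# Possible usage (MYPROG and MYFUNCTION are placeholders):
#   objdump -t MYPROG | func_span MYFUNCTION
```

Note that this range covers only the function body as recorded in the symbol table; any alignment padding before the next symbol is not included.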

Hope this helps.

(I looked, parse_symbol_section IS a stub.)

lornix
  • OH! Also gives start address... which you noted as being desired. So a twofer! Well, yeah... something like that. – lornix Jul 01 '14 at 11:23
  • Apparently "objdump" is not present in Solaris 10. But Solaris 10 has "/usr/ccs/bin/elfdump", that can do this too. I am accepting the answer. Thanks!. – jcea Jul 01 '14 at 23:10
  • Odd, as `nm` and `objdump` are both in `binutils`. Perhaps a custom installation? No idea. Glad to help. Good luck. – lornix Jul 01 '14 at 23:11
  • Solaris 10 has its own "/usr/ccs/bin/nm", not the GNU nm (binutils). – jcea Jul 02 '14 at 02:06
  • So the answer would be to use binutils's objdump where available, and "/usr/ccs/bin/nm" under Solaris :-). – jcea Jul 02 '14 at 02:10
  • :) Whatever works, of course. But I looked at your parsing script (UGLY!). Use elfdump if its output is any way similar to objdump. so much easier to parse. – lornix Jul 02 '14 at 04:14
  • I was asking this question because YES, that parsing script is a really ugly hack. – jcea Jul 02 '14 at 23:22
0

Here is an all-awk answer to this question that prints the total size of all symbols in a given section:

# call objdump with -t to get the symbol table
# awk keeps only the rows whose section (4th column) is .text
# awk sums the values in the 5th column (prefixed with 0x as they are hex, then converted to decimal with strtonum, which is a gawk extension)
objdump -t MYPROG | awk -F ' ' '($4 == ".text") {sum += strtonum("0x"$5)} END {print sum}'

And here is a version that sums only certain functions from a certain section:

# awk keeps only the rows in the .rom section whose name contains funcname anywhere
# (the value in column 6 is converted to lowercase so the match is case-insensitive)
# awk sums the values in the 5th column (prefixed with 0x as they are hex, then converted to decimal with strtonum, which is a gawk extension)
objdump -t MYPROG | awk -F ' ' '($4 == ".rom") && (tolower($6) ~ /funcname/) {sum += strtonum("0x"$5)} END {print sum}'
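On systems where awk is not gawk, `strtonum` may be missing. The sketch below does the same per-section sum with a hand-written hex parser, so it should run on a plain POSIX awk. `sum_sizes` and `hex2dec` are hypothetical names, and the column positions are the same assumption as in the commands above.

```shell
# Sketch only: sum the sizes (5th column) of all symbols in one section
# (4th column) without relying on gawk's strtonum.
# Reads `objdump -t` output on stdin; sum_sizes and hex2dec are hypothetical names.
sum_sizes() {
    awk -v sect="$1" '
        # Convert a hex string (no 0x prefix) to a number, digit by digit
        function hex2dec(h,    i, n) {
            h = tolower(h); n = 0
            for (i = 1; i <= length(h); i++)
                n = n * 16 + index("0123456789abcdef", substr(h, i, 1)) - 1
            return n
        }
        $4 == sect { sum += hex2dec($5) }
        END { print sum + 0 }
    '
}

# Possible usage (MYPROG is a placeholder):
#   objdump -t MYPROG | sum_sizes .text
```

The parser does no validation, so it is only meaningful when the 5th column really is a hex size.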
Crt Mori