
I created a small unit test library in C.

Its main feature is that you don't need to register your test functions: they are identified as test functions because their names start with a predefined prefix (test_).

For example, if you want to create a test function, you can write something like this:

int test_abc(void *t)
{
    ...
}

Yes, just like in Go.

To find the test functions, the runner:

  1. takes the name of the executable from argv[0];
  2. parses the ELF sections to find the symbol table;
  3. from the symbol table, takes all the functions named test_*;
  4. treats the addresses from the symbol table as function pointers;
  5. invokes the test functions.
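Step 3 above could be sketched roughly as follows. This is only an illustration, not the code from the repository: collect_test_symbols is a hypothetical helper, it assumes a 64-bit ELF file, and it assumes the caller has already located and loaded .symtab and its string table.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: collect the st_value of every defined function
 * symbol whose name starts with "test_". `syms`/`nsyms` are the entries of
 * .symtab and `strtab` is its string table, both already loaded by the
 * caller. Returns the number of matches; at most `max` are stored in `out`. */
size_t collect_test_symbols(const Elf64_Sym *syms, size_t nsyms,
                            const char *strtab,
                            uintptr_t *out, size_t max)
{
    static const char prefix[] = "test_";
    size_t found = 0;

    for (size_t i = 0; i < nsyms; i++) {
        /* Keep only functions that are defined in this file. */
        if (ELF64_ST_TYPE(syms[i].st_info) != STT_FUNC)
            continue;
        if (syms[i].st_shndx == SHN_UNDEF)
            continue;

        const char *name = strtab + syms[i].st_name;
        if (strncmp(name, prefix, sizeof prefix - 1) != 0)
            continue;

        if (found < max)
            out[found] = (uintptr_t)syms[i].st_value;
        found++;
    }
    return found;
}
```

The values stored in out are link-time addresses from the file. For a non-PIE binary they can be used directly; for a PIE binary they still need the load offset added.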

For PIE binaries, there is one additional step. To find the load address of the test functions, I assume there is a common offset that applies to all functions. To compute that offset, I subtract the address of main read from the symbol table from the runtime address of main (obtained through a function pointer).

All the things described above are working fine: https://github.com/rodrigo-dc/testprefix

However, as far as I understand, function pointer arithmetic is not allowed by the C99 standard.

Given that I have the address from the symbol table, is there a reliable way to get the runtime address of functions (in the case of PIE binaries)?

I was hoping for some linker variable, some base address, or anything like that.

Steve Friedl
    Re “function pointer arithmetic is not allowed by the C99 standard”: The C standard does not **define** function pointer arithmetic. It **allows** almost anything. It says conversions between any pointer type and an integer type are implementation-defined, in each direction (C 1999 6.3.2.3 5 and 6), so a compiler’s documentation ought to say what happens when you convert a function pointer to an integer and vice-versa. If it says it gives the address in the natural way, and vice-versa, then you can do all the arithmetic you want on the integer form. – Eric Postpischil Aug 27 '22 at 23:15
    GCC [defines](https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Arrays-and-pointers-implementation.html#Arrays-and-pointers-implementation) pointer-to-integer conversion to give you the “unchanged” bits, if they fit. So you can easily get a function address in integer form. But then it says converting back is defined only if it references “the same object.” This may be an oversight in the documentation, since it does not adequately cover function pointers, which do not reference an object in the first place. I expect this caveat exists to support optimization using pointer provenance,… – Eric Postpischil Aug 27 '22 at 23:20
    … and may not have any effect on function pointer conversions. That is, what you want to do might work in GCC. This sort of address calculation on functions has to work in source code such as program loaders, so it must be *de facto* supported even if the documentation is unclear or incomplete. – Eric Postpischil Aug 27 '22 at 23:21
  • @EricPostpischil Yes, it works in GCC and clang. Thanks for the link! On that page, the documentation states: "That is, one may not use integer arithmetic to avoid the undefined behavior of pointer arithmetic as proscribed in C99 and C11 6.5.6/8." That's exactly what I'm doing. Apparently the compiler can't give me any kind of guarantee in this case. Thanks! – Rodrigo Dias Corrêa Aug 29 '22 at 16:32

2 Answers


Because you have an ELF executable, this probably precludes "funny" architectures (e.g. Intel 8051, PIC, etc.) that might have segmented or non-linear, non-contiguous address spaces.

So, you [probably] can use the method you've described with main to get the actual address. You just need to convert to/from either char * or uintptr_t types so you are using byte offsets/differences.
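A minimal sketch of that conversion, assuming GCC or Clang on a mainstream ELF target where function pointers convert to and from uintptr_t in the natural way. The names load_bias and resolve_test are made up for this illustration, and test_abc stands in for any test function:

```c
#include <stdint.h>

typedef int (*test_fn)(void *);

/* Stand-in test function, used only to demonstrate the round trip. */
int test_abc(void *t)
{
    (void)t;
    return 42;
}

/* Load offset = runtime address - address recorded in the symbol table.
 * Both operands are plain integers, so this is ordinary unsigned
 * arithmetic rather than pointer arithmetic. */
uintptr_t load_bias(uintptr_t runtime_addr, uintptr_t symtab_addr)
{
    return runtime_addr - symtab_addr;
}

/* Turn a symbol-table address into a callable pointer. The conversion
 * from integer to function pointer is implementation-defined (assumption:
 * the compiler maps it to the same address bits). */
test_fn resolve_test(uintptr_t bias, uintptr_t symtab_addr)
{
    return (test_fn)(bias + symtab_addr);
}
```

In the runner this would be used along the lines of load_bias((uintptr_t)&main, main_value_from_symtab), where the second argument is whatever value the symbol table reports for main, followed by resolve_test(bias, value) for each test_* symbol.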


But you can also create a unified table of pointers to the various functions by creating descriptor structs that are placed in a special linker section of your choosing, using (e.g.) __attribute__((section("mysection")))

Here is some code that shows what I mean:

#include <stdio.h>

typedef struct {
    int (*test_func)(void *);           // pointer to test function
    const char *test_name;              // name of the test
    int test_retval;                    // test return value

    // more data ...
    int test_xtra;
} testctl_t;

// define a struct instance for a given test
#define ATTACH_TEST(_func) \
    testctl_t _func##_ctl __attribute__((section("testctl"))) = { \
        .test_func = _func, \
        .test_name = #_func \
    }

// advance to next struct (must be 16 byte aligned)
#define TESTNEXT(_test) \
    (testctl_t *) (((char *) _test) + asiz)

int
test_abc(void *t)
{
    printf("test_abc: hello\n");
    return 1;
}
ATTACH_TEST(test_abc);

int
test_def(void *t)
{
    printf("test_def: hello\n");
    return 2;
}
ATTACH_TEST(test_def);

int
main(void)
{
    // these are special symbols defined by the linker for our special linker
    // section that denote the start/end of the section (similar to
    // _etext/_edata)
    extern testctl_t __start_testctl;
    extern testctl_t __stop_testctl;

    size_t rsiz = sizeof(testctl_t);
    size_t asiz;
    testctl_t *test;

    // align the size to a 16 byte boundary
    asiz = rsiz;
    asiz += 15;
    asiz /= 16;
    asiz *= 16;

    // show the struct sizes
    printf("main: sizeof(testctl_t)=%zx/%zx\n",rsiz,asiz);

    // section start and stop symbol addresses
    printf("main: start=%p stop=%p\n",&__start_testctl,&__stop_testctl);

    // cross check of expected pointer values
    printf("main: test_abc=%p test_abc_ctl=%p\n",test_abc,&test_abc_ctl);
    printf("main: test_def=%p test_def_ctl=%p\n",test_def,&test_def_ctl);

    for (test = &__start_testctl;  test < &__stop_testctl;
        test = TESTNEXT(test)) {
        printf("\n");

        // show the address of our test descriptor struct and the pointer to
        // the function
        printf("main: test=%p test_func=%p\n",test,test->test_func);

        printf("main: calling %s ...\n",test->test_name);
        test->test_retval = test->test_func(test);

        printf("main: return is %d\n",test->test_retval);
    }

    return 0;
}

Here is the program output:

main: sizeof(testctl_t)=18/20
main: start=0x404040 stop=0x404078
main: test_abc=0x401146 test_abc_ctl=0x404040
main: test_def=0x401163 test_def_ctl=0x404060

main: test=0x404040 test_func=0x401146
main: calling test_abc ...
test_abc: hello
main: return is 1

main: test=0x404060 test_func=0x401163
main: calling test_def ...
test_def: hello
main: return is 2
Craig Estey
  • I will restrict the architecture to avoid any exotic address spaces. Thanks for the detailed example! I still have to think about it. I can see that it doesn't rely on the symbol table, which is very good (even for stripped binaries). – Rodrigo Dias Corrêa Aug 29 '22 at 18:02

Is there a reliable way to get the runtime address of functions (in case of PIE binaries)?

Yes: see this answer, and also the comment about using dladdr().
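The linked answer is not reproduced here, so the following is only one possible way to get the load offset of the main executable, not necessarily the method from that answer. It assumes Linux with glibc 2.16 or later (for getauxval) and an executable that has a PT_PHDR program header, which dynamically linked binaries normally do. The function name executable_load_bias is made up for this sketch:

```c
#include <link.h>      /* ElfW(), PT_PHDR */
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h>  /* getauxval(), AT_PHDR, AT_PHNUM */

/* Returns the difference between runtime addresses and the addresses
 * stored in the ELF file of the main executable. For a non-PIE binary
 * this is 0. */
uintptr_t executable_load_bias(void)
{
    /* The kernel passes the runtime address of the program headers and
     * their count in the auxiliary vector. */
    const ElfW(Phdr) *phdr = (const ElfW(Phdr) *)getauxval(AT_PHDR);
    unsigned long phnum = getauxval(AT_PHNUM);

    if (phdr == NULL)
        return 0;

    for (unsigned long i = 0; i < phnum; i++) {
        /* PT_PHDR records the link-time address of the program header
         * table itself, so runtime minus link-time gives the offset. */
        if (phdr[i].p_type == PT_PHDR)
            return (uintptr_t)phdr - (uintptr_t)phdr[i].p_vaddr;
    }

    /* Assumption: no PT_PHDR entry means the file was not relocated. */
    return 0;
}
```

Adding the returned value to an st_value from the symbol table should give the runtime address of that function, without taking the address of main at all.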

P.S. Note that taking the address of main is not allowed in C++.

Employed Russian
  • Thank you! That's exactly what I was looking for. I executed the code from the other answer, and the value printed as `relocation` matches the address offset calculated in my current approach. – Rodrigo Dias Corrêa Aug 29 '22 at 17:48