6

I am implementing some limited remote debugging functionality for an application written in C running on a Linux box. The goal is to communicate with the application and lookup the value of an arbitrary variable or run an arbitrary function.

I am able to lookup symbols through dlsym() calls, but I am unable to determine if the address returned refers to a function or a variable. Is there a way to determine typing information via this symbol table?

templatetypedef
dykeag
  • 4
    platform-dependent, but you may get away with 1. examining the address (space), or 2. by looking for some special function starting code (trampolines, etc.) –  Nov 20 '13 at 22:01
  • or 3. pull the info out of the DWARF debugging information if available (which is non-trivial) – nos Nov 20 '13 at 22:06
  • Debugging information is not available for this application; the application is so large that attempting to compile with debugging information crashes anything that tries to read it (gdb) – dykeag Nov 20 '13 at 22:10

5 Answers

3

On x86 platforms, you can check for the instructions used to set up a function's stack frame, provided you can read the memory at its address. The prologue is typically:

push ebp
mov ebp, esp

I'm not positive about x64 platforms, however I think it is similar:

push rbp
mov rbp, rsp

This is the conventional function prologue that sets up the frame pointer under the C calling conventions.

Keep in mind, however, that compiler optimizations may remove these instructions. For this check to work, you may have to disable that optimization; I believe for GCC, -fno-omit-frame-pointer will do the trick.
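As a rough sketch of that check, the function below compares the first bytes at an address against the usual encodings of the prologue. The name looks_like_prologue is hypothetical, and skipping a leading endbr32/endbr64 instruction (emitted by some CET-enabled builds) is my own assumption, not part of the answer above. Data can start with the same bytes by coincidence, so treat a match as a hint only.

```c
#include <stddef.h>

/* Returns 1 if the n readable bytes at p look like a frame-pointer
 * prologue (push ebp/rbp; mov ebp/rbp, esp/rsp), 0 otherwise.
 * Hypothetical helper; the caller must make sure p is readable. */
int looks_like_prologue(const unsigned char *p, size_t n)
{
    /* Assumption: skip an optional endbr64 (f3 0f 1e fa) or
     * endbr32 (f3 0f 1e fb) placed before the prologue. */
    if (n >= 4 && p[0] == 0xf3 && p[1] == 0x0f && p[2] == 0x1e &&
        (p[3] == 0xfa || p[3] == 0xfb)) {
        p += 4;
        n -= 4;
    }

    if (n < 1 || p[0] != 0x55)      /* push ebp / push rbp */
        return 0;
    p++;
    n--;

    if (n >= 1 && p[0] == 0x48) {   /* REX.W prefix used on x86-64 */
        p++;
        n--;
    }

    /* mov ebp, esp has two valid encodings: 89 e5 and 8b ec */
    return n >= 2 && ((p[0] == 0x89 && p[1] == 0xe5) ||
                      (p[0] == 0x8b && p[1] == 0xec));
}
```

You would call it as looks_like_prologue((const unsigned char *)addr, 8) once you know at least 8 bytes at addr are mapped and readable.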

chbaker0
  • 1
    Unless the code is compiled with optimizations off, the frame pointer is likely omitted where possible. So that would not be reliable. – Andreas Bombe Nov 20 '13 at 23:44
  • Oh, that's true. I'm sure he could disable that one optimization though. I'll edit my answer, thank you – chbaker0 Nov 21 '13 at 00:57
2

One possible solution is to extract a symbol table for the application by parsing the output of the nm utility. nm includes information on symbol type. Symbols with the T (global text) type are functions.

The trouble with this solution is that you have to ensure that your symbol table matches the target (especially if you are going to use it to extract the addresses, although using it in combination with dlsym() would be safer). The method I have used to ensure that is to make the symbol table generation part of the build process as a post-processing step.
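A minimal sketch of the parsing step, assuming the default (BSD-style) nm output where each defined symbol is printed as "address type name". The struct and function names are hypothetical. Besides T, the sketch also accepts lowercase t, which nm uses for local (static) text symbols.

```c
#include <stdio.h>

/* One parsed line of nm output (hypothetical layout). */
struct nm_symbol {
    unsigned long addr;
    char type;
    char name[256];
};

/* Parses a line such as "0000000000401136 T main".
 * Returns 1 on success, 0 for lines without an address
 * (e.g. undefined symbols, type U) or malformed lines. */
int parse_nm_line(const char *line, struct nm_symbol *out)
{
    return sscanf(line, "%lx %c %255s",
                  &out->addr, &out->type, out->name) == 3;
}

/* T = global text symbol, t = local text symbol. */
int nm_symbol_is_function(const struct nm_symbol *s)
{
    return s->type == 'T' || s->type == 't';
}
```

The table built this way can then be keyed by name, so that the address still comes from dlsym() at run time and only the type comes from nm.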

Clifford
2

You can read the file /proc/self/maps and parse the first three fields of each line:

<begin-addr>-<end-addr> rwxp ...

Then you search for the line whose range contains the address you are looking for and check the permissions:

  • r-x: it is code;
  • rw-: it is writable data;
  • r--: it is read-only data;
  • any other combination: something weird (rwxp: generated code, ...).

For example the following program:

#include <stdio.h>

void foo() {}
int x;

int main()
{
    int y;
    printf("%p\n%p\n%p\n", (void *)foo, (void *)&x, (void *)&y);
    scanf("%*s");
    return 0;
}

...on my system gives this output:

0x400570
0x6009e4
0x7fff4c9b4e2c

...and these are the relevant lines from /proc/<pid>/maps:

00400000-00401000 r-xp 00000000 00:1d 641656       /tmp/a.out
00600000-00601000 rw-p 00000000 00:1d 641656       /tmp/a.out
....
7fff4c996000-7fff4c9b7000 rw-p 00000000 00:00 0    [stack]
....

So the addresses are: code, data and data.
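The lookup can also be done from inside the process. The following is a sketch under a few assumptions: the helper names address_perms and address_is_code are hypothetical, each line of /proc/self/maps is assumed to fit in the buffer, and anything executable is treated as code.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Looks up addr in /proc/self/maps. On success, copies the
 * 4-character permission string (e.g. "r-xp") into perms and
 * returns 1. Returns 0 if the address is not mapped or the file
 * cannot be read. */
int address_perms(const void *addr, char perms[5])
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[4352];   /* assumption: long enough for one maps line */
    int found = 0;

    if (!f)
        return 0;

    while (!found && fgets(line, sizeof line, f)) {
        uintptr_t begin, end;
        char p[5];

        /* First three fields: <begin-addr>-<end-addr> <perms> */
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR " %4s",
                   &begin, &end, p) != 3)
            continue;

        /* The end address is exclusive. */
        if ((uintptr_t)addr >= begin && (uintptr_t)addr < end) {
            snprintf(perms, 5, "%s", p);
            found = 1;
        }
    }

    fclose(f);
    return found;
}

/* Treats any mapping with the x bit set as code. */
int address_is_code(const void *addr)
{
    char perms[5];
    return address_perms(addr, perms) && perms[2] == 'x';
}
```

For a pointer returned by dlsym(), address_is_code(ptr) returning 1 suggests a function, and 0 suggests data or an unmapped address.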

rodrigo
  • 1
    Great answer! To clarify for other readers, the first column of numbers in `/proc/<pid>/maps` is an address _range_. So to determine if a symbol is a function, see if its pointer falls within a range of addresses marked with `x`. A variable's address will be in a range not marked with `x`. – dykeag Nov 21 '13 at 16:39
  • @rodrigo can you tell me what the `%*s` does? – phyrrus9 May 17 '14 at 20:06
  • @phyrrus9: It reads a string from standard input (`%s`) but then discards it without saving it anywhere (`*`). Note that the call to `scanf()` does not have any extra parameters. I wrote that in order to stop the program until ENTER is pressed so that the file `/proc/<pid>/maps` can be read. Some people prefer to use `getchar()` instead... – rodrigo May 17 '14 at 20:31
  • @rodrigo I have been just using temp vars, THANKS! – phyrrus9 May 17 '14 at 22:47
1

I guess this is not a very reliable method, but it might work:

Take the address of a well known function, such as main() and the address of a well known global variable.

Now take the address of the unknown symbol and compute the absolute value of the difference between this address and the other two. The smallest difference will indicate that the unknown address is closer to a function or to a global variable, meaning that probably it is another function or another global variable.

This method works under the assumption that the compiler/linker packs all global variables into one memory block and all functions into another. The Microsoft compiler, for example, puts all global variables before the functions (at lower addresses in virtual memory).

I'm assuming you don't need to check for local variables, since their addresses cannot usefully be returned by a function (once the function ends, the local variable is lost).
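The comparison itself is only a few lines. This is a sketch with hypothetical names; it takes the addresses as integers (cast with (uintptr_t)) and inherits all the unreliability described above.

```c
#include <stdint.h>

enum symbol_guess { GUESS_FUNCTION, GUESS_VARIABLE };

/* Absolute difference between two addresses. */
static uintptr_t addr_distance(uintptr_t a, uintptr_t b)
{
    return a > b ? a - b : b - a;
}

/* Guesses the kind of `unknown` by whichever reference address is
 * nearer. Ties are reported as GUESS_VARIABLE (arbitrary choice). */
enum symbol_guess guess_by_distance(uintptr_t unknown,
                                    uintptr_t known_func,
                                    uintptr_t known_var)
{
    return addr_distance(unknown, known_func) <
           addr_distance(unknown, known_var)
               ? GUESS_FUNCTION
               : GUESS_VARIABLE;
}
```

A call might look like guess_by_distance((uintptr_t)sym, (uintptr_t)main, (uintptr_t)&some_global), where some_global stands for any global variable you already know about.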

mcleod_ideafix
1

It can be done by combining dlsym() and dladdr1().

#define _GNU_SOURCE

#include <dlfcn.h>
#include <link.h>
#include <stdio.h>

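int symbolType(void *sym) {
    ElfW(Sym) *pElfSym = NULL;
    Dl_info i;

    /* dladdr1() can succeed without matching a symbol, so check pElfSym too */
    if (dladdr1(sym, &i, (void **)&pElfSym, RTLD_DL_SYMENT) && pElfSym)
        /* ELF32_ST_TYPE and ELF64_ST_TYPE both take the low 4 bits of st_info */
        return ELF32_ST_TYPE(pElfSym->st_info);

    return 0; /* same value as STT_NOTYPE */
}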

int main(int argc, char *argv[]) {
    for (int i=1; i < argc; ++i) {
        printf("Symbol [%s]: ", argv[i]);

        void *mySym = dlsym(RTLD_DEFAULT, argv[i]);

        // This will not work with symbols that have a 0 value, but that's not going to be very common
        if (!mySym)
            puts("not found!");
        else {
            int type = symbolType(mySym);
            switch (type) {
                case STT_FUNC: puts("Function"); break;
                case STT_OBJECT: puts("Data"); break;
                case STT_COMMON: puts("Common data"); break;
                /* get all the other types from the elf.h header file */
                default: printf("Dunno! [%d]\n", type);
            }
        }
    }

    return 0;
}
Fabio A.