I have a Linux C program that handles requests sent to a TCP socket (bound to a particular port). I want to be able to query the internal state of the C program via a request to that port, but I don't want to hard-code which global variables can be queried. Thus I want the query to contain the string name of a global, and the C code to look that string up in the symbol table to find its address and then send its value back over the TCP socket. Of course, the symbol table must not have been stripped. So can the C program even locate its own symbol table, and is there a library interface for looking up symbols given their name? This is an ELF executable C program built with gcc.
3 Answers
This is actually fairly easy. You use dlopen / dlsym to access symbols. In order for this to work, the symbols have to be present in the dynamic symbol table. There are multiple symbol tables!
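#include <dlfcn.h>
#include <stdio.h>

__attribute__((visibility("default")))
const char A[] = "Value of A";
__attribute__((visibility("hidden")))
const char B[] = "Value of B";
const char C[] = "Value of C";

int main(int argc, char *argv[])
{
    void *hdl;
    const char *ptr;
    int i;

    /* NULL means "the main program"; dlopen requires RTLD_LAZY or RTLD_NOW. */
    hdl = dlopen(NULL, RTLD_NOW);
    if (hdl == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }
    for (i = 1; i < argc; ++i) {
        ptr = dlsym(hdl, argv[i]);
        /* Passing NULL to %s is undefined, so print a placeholder instead. */
        printf("%s = %s\n", argv[i], ptr ? ptr : "(null)");
    }
    dlclose(hdl);
    return 0;
}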
In order to add all symbols to the dynamic symbol table, use -Wl,--export-dynamic. If you want to remove most symbols from the symbol table (recommended), set -fvisibility=hidden and then explicitly add the symbols you want with __attribute__((visibility("default"))) or one of the other methods.
~ $ gcc dlopentest.c -Wall -Wextra -ldl
~ $ ./a.out A B C
A = (null)
B = (null)
C = (null)
~ $ gcc dlopentest.c -Wall -Wextra -ldl -Wl,--export-dynamic
~ $ ./a.out A B C
A = Value of A
B = (null)
C = Value of C
~ $ gcc dlopentest.c -Wall -Wextra -ldl -Wl,--export-dynamic -fvisibility=hidden
~ $ ./a.out A B C
A = Value of A
B = (null)
C = (null)
Safety
Notice that there is a lot of room for bad behavior.
$ ./a.out printf
printf = ▯▯▯▯ (garbage)
If you want this to be safe, you should create a whitelist of permissible symbols.
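A minimal sketch of such a whitelist check, assuming the request handler calls it before passing a name to dlsym. The names `allowed_symbols` and `is_allowed_symbol` are hypothetical and not part of the code above; the list contents are only an illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical list of globals that clients may query. */
static const char *const allowed_symbols[] = { "A", "C" };

/* Returns 1 if name is on the whitelist, 0 otherwise. */
static int is_allowed_symbol(const char *name)
{
    size_t i;
    size_t n = sizeof(allowed_symbols) / sizeof(allowed_symbols[0]);

    if (name == NULL)
        return 0;
    for (i = 0; i < n; ++i) {
        if (strcmp(allowed_symbols[i], name) == 0)
            return 1;
    }
    return 0;
}
```

With a check like this in front of dlsym, a request for printf would be rejected before any address is looked up.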

-
Printing non-string (possibly not terminated) data or functions as a string is dangerous. Otherwise I don't see any need to whitelist symbols. – R.. GitHub STOP HELPING ICE Jun 29 '12 at 01:56
-
@R..: Hm, I thought the variable was used read/write until I read the question more thoroughly. Updated. – Dietrich Epp Jun 29 '12 at 01:59
-
This is half way to what I want... the problem is that I want to be able to respond meaningfully regardless of the type of the symbol. I.e. what if A was a "long" and B a "char" and C a "char *"? I need access to the symbol type as well as its address. – JimKleck Jun 29 '12 at 18:21
-
@JimKleck: Symbols don't have types, so that's more or less impossible. You'd either have to compile with debug information and parse it (a lot of work) or just encode the types in a separate table, and at that point you might as well encode the pointers as well so there's no point in using the ELF symbol table if you need the type. Types only really exist during compilation, the compiler deletes type information and leaves you with only raw bytes. – Dietrich Epp Aug 08 '12 at 20:04
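The "separate table" idea from the comment above could look roughly like this. Everything here (`var_type`, `var_entry`, `format_var`, and the sample globals) is a hypothetical illustration rather than code from the thread; it assumes each queryable global is registered by hand together with a type tag.

```c
#include <stdio.h>
#include <string.h>

/* Type tags recorded by hand, since ELF symbols carry no C type. */
enum var_type { VAR_LONG, VAR_CHAR, VAR_STRING };

struct var_entry {
    const char *name;
    enum var_type type;
    const void *addr;
};

/* Hypothetical sample globals: a long, a char and a char pointer. */
static long sample_long = 42;
static char sample_char = 'x';
static const char *sample_str = "hello";

static const struct var_entry var_table[] = {
    { "A", VAR_LONG,   &sample_long },
    { "B", VAR_CHAR,   &sample_char },
    { "C", VAR_STRING, &sample_str  },
};

/* Writes "name = value" into buf; returns 0 on success, -1 if unknown. */
static int format_var(const char *name, char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof(var_table) / sizeof(var_table[0]); ++i) {
        const struct var_entry *e = &var_table[i];

        if (strcmp(e->name, name) != 0)
            continue;
        switch (e->type) {
        case VAR_LONG:
            snprintf(buf, len, "%s = %ld", name, *(const long *)e->addr);
            break;
        case VAR_CHAR:
            snprintf(buf, len, "%s = %c", name, *(const char *)e->addr);
            break;
        case VAR_STRING:
            snprintf(buf, len, "%s = %s", name, *(const char *const *)e->addr);
            break;
        }
        return 0;
    }
    return -1;
}
```

The cost is exactly the one the comment names: the table has to be maintained alongside the globals it describes.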
file: reflect.c
#include <stdio.h>
#include <string.h>   /* strcmp */
#include "reflect.h"

/* Weak placeholder; the generated .reflect.real.o overrides it at link time. */
struct sym_table_t gbl_sym_table[1] __attribute__((weak)) = {{NULL, NULL}};

void *reflect_query_symbol(const char *name)
{
    struct sym_table_t *p = &gbl_sym_table[0];

    for (; p->name; p++) {
        if (strcmp(p->name, name) == 0) {
            return p->addr;
        }
    }
    return NULL;
}
file: reflect.h
#include <stdio.h>

struct sym_table_t {
    char *name;
    void *addr;
};

void *reflect_query_symbol(const char *name);
file: main.c
just #include "reflect.h" and call reflect_query_symbol
example:
#include <stdio.h>
#include "reflect.h"

void foo(void)
{
    printf("bar test\n");
}

int uninited_data;
int inited_data = 3;

int main(int argc, char *argv[])
{
    int i;
    void *addr;

    for (i = 1; i < argc; i++) {
        addr = reflect_query_symbol(argv[i]);
        if (addr) {
            printf("%s lay at: %p\n", argv[i], addr);
        } else {
            printf("%s NOT found\n", argv[i]);
        }
    }
    return 0;
}
file: Makefile
# The table stores link-time addresses from nm, so this assumes a
# non-position-independent executable.
objs = main.o reflect.o
main: $(objs)
	gcc -o $@ $^
	nm $@ | awk 'BEGIN{ print "#include <stdio.h>"; print "#include \"reflect.h\""; print "struct sym_table_t gbl_sym_table[]={" } { if(NF==3){print "{\"" $$3 "\", (void*)0x" $$1 "},"}} END{print "{NULL,NULL} };"}' > .reflect.real.c
	gcc -c .reflect.real.c -o .reflect.real.o
	gcc -o $@ $^ .reflect.real.o
	nm $@ | awk 'BEGIN{ print "#include <stdio.h>"; print "#include \"reflect.h\""; print "struct sym_table_t gbl_sym_table[]={" } { if(NF==3){print "{\"" $$3 "\", (void*)0x" $$1 "},"}} END{print "{NULL,NULL} };"}' > .reflect.real.c
	gcc -c .reflect.real.c -o .reflect.real.o
	gcc -o $@ $^ .reflect.real.o

-
You only need to write two files like the "reflect.c" and "reflect.h" shown above and modify your Makefile; you will then get a table of symbol names and the corresponding address of each symbol. – yang wen Jun 29 '12 at 03:06
-
I want symbol type, or at least size, as well as address. It looks like the "-S" flag for nm will do that. Won't I need to run nm on the final executable (since I have multiple .o's) in order to get the correct size for the gbl_sym_table, then run nm again to fill it with the addresses? Then finally rebuild reflect.o and relink to get it all into the executable? – JimKleck Jun 29 '12 at 18:53
-
Actually, why bother with nm and makefile wizardry at all? The nm program must use some API to access the symbol table, I want that API to use directly in my program. – JimKleck Jun 29 '12 at 18:56
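GNU nm is built on the BFD library, but a program can also read its own symbol table directly using the structures in <elf.h>. The following is only a sketch under several assumptions: Linux with /proc mounted, an unstripped executable, and the hypothetical names `self_symbol` and `reflect_demo_counter`. It returns the link-time value and size; in a position-independent executable that value is an offset that still has to be added to the load base.

```c
#include <elf.h>
#include <fcntl.h>
#include <link.h>      /* ElfW() selects the 32- or 64-bit ELF types */
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical global used only as a lookup target for illustration. */
int reflect_demo_counter = 7;

/* Looks up name in the .symtab of the running executable.
 * On success stores the link-time value and size and returns 0;
 * returns -1 if the file cannot be read or the symbol is absent. */
static int self_symbol(const char *name, ElfW(Addr) *value, size_t *size)
{
    int rc = -1;
    struct stat st;
    void *mem;
    int fd = open("/proc/self/exe", O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || (size_t)st.st_size < sizeof(ElfW(Ehdr))) {
        close(fd);
        return -1;
    }
    mem = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (mem == MAP_FAILED)
        return -1;

    const unsigned char *map = mem;
    const ElfW(Ehdr) *eh = (const ElfW(Ehdr) *)map;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0) {
        const ElfW(Shdr) *sh = (const ElfW(Shdr) *)(map + eh->e_shoff);

        /* Walk the section headers looking for the full symbol table. */
        for (int i = 0; i < eh->e_shnum && rc != 0; ++i) {
            if (sh[i].sh_type != SHT_SYMTAB)
                continue;
            const ElfW(Sym) *syms = (const ElfW(Sym) *)(map + sh[i].sh_offset);
            /* sh_link names the string table holding the symbol names. */
            const char *strs = (const char *)(map + sh[sh[i].sh_link].sh_offset);
            size_t n = sh[i].sh_size / sizeof(ElfW(Sym));

            for (size_t j = 0; j < n; ++j) {
                if (syms[j].st_shndx != SHN_UNDEF &&
                    strcmp(strs + syms[j].st_name, name) == 0) {
                    *value = syms[j].st_value;
                    *size = syms[j].st_size;
                    rc = 0;
                    break;
                }
            }
        }
    }
    munmap(mem, (size_t)st.st_size);
    return rc;
}
```

This gives address and size but still no type, and it trusts the file's section headers without bounds checks, so treat it as a starting point rather than production code.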
-
-
@JimKleck, yes, in the example, you can use the "-S" flag for nm to get the size of the symbols, and add a member to struct sym_table_t to store the size. – yang wen Jun 30 '12 at 01:36
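As a rough sketch of that suggestion, assume the generated table gains a size member filled in from the size column of nm -S. The struct name `sized_sym_t` and the helper `dump_symbol_hex` are hypothetical. With a size but no type, about the most the program can safely send back is the raw bytes:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical variant of sym_table_t with the size reported by nm -S. */
struct sized_sym_t {
    const char *name;
    const void *addr;
    size_t size;
};

/* Formats the symbol's raw bytes as lowercase hex into buf.
 * Returns the number of bytes dumped, or -1 if buf is too small. */
static int dump_symbol_hex(const struct sized_sym_t *sym, char *buf, size_t len)
{
    const unsigned char *p = sym->addr;
    size_t i;

    if (len < sym->size * 2 + 1)
        return -1;
    for (i = 0; i < sym->size; ++i)
        sprintf(buf + i * 2, "%02x", p[i]);
    buf[sym->size * 2] = '\0';
    return (int)sym->size;
}
```

The client then has to know the type itself in order to interpret the bytes, including their byte order.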
-
The 'gcc -o $@ ...' command appears 3 times in the Makefile; call the results main.1, main.2, and main.3 (the final program). main.1 contains all the final symbols (and its gbl_sym_table is empty), so the 'nm' and 'gcc -c .reflect.real.c -o .reflect.real.o' commands generate the first .reflect.real.o, in which we get a new gbl_sym_table with the correct size (but the content of gbl_sym_table may NOT be correct). – yang wen Jun 30 '12 at 01:54
-
main.2 also contains the same symbols as main.1 (while linking, the gbl_sym_table in reflect.o was replaced by the gbl_sym_table in the first .reflect.real.o, since it was declared 'weak'), so the 'nm' and 'gcc -c .reflect.real.c -o .reflect.real.o' commands generate the second .reflect.real.o, in which gbl_sym_table has both the correct size and the correct content. main.3, linked with the second .reflect.real.o, therefore has a gbl_sym_table with the correct size and the correct content for itself. – yang wen Jun 30 '12 at 01:55
-
The functions in reflect.o that use gbl_sym_table give your program the ability to access its own symbols. – yang wen Jun 30 '12 at 01:56
The general term for this sort of feature is "reflection", and it is not part of C.
If this is for debugging purposes, and you want to be able to inspect the entire state of a C program remotely, examine any variable, start and stop its execution, and so on, you might consider GDB remote debugging:
GDB offers a 'remote' mode often used when debugging embedded systems. Remote operation is when GDB runs on one machine and the program being debugged runs on another. GDB can communicate to the remote 'stub' which understands GDB protocol via Serial or TCP/IP. A stub program can be created by linking to the appropriate stub files provided with GDB, which implement the target side of the communication protocol. Alternatively, gdbserver can be used to remotely debug the program without needing to change it in any way.

-
This is for production, and I want to be able to browse any symbol without having to maintain a lookup table... after all, that info is already in the symbol table. – JimKleck Jun 29 '12 at 18:24