5

I am trying to do something like the following

  enum types {None, Bool, Short, Char, Integer, Double, Long, Ptr};
  int main(int argc, char ** args) {
     enum types params[10] = {0};
     void* triangle = dlopen("./foo.so", RTLD_LAZY);
     void * fun = dlsym(triangle, args[1]);

     <<pseudo code>>
  }

Where pseudo code is something like

fun = {}
for param in params:
      if param == None:
         fun += void
      if param == Bool:
          fun += Boolean
      if param == Integer:
          fun += int
      ...
 returnVal = fun.pop()
 funSignature = returnval + " " + funName + "(" + Riffle(fun, ",") + ")"
 exec funSignature

Thank you

adk
  • > I am trying to do something like the following So what happens when you try this? – a2800276 Aug 30 '09 at 18:32
  • @a2800276: the compiler complains about a multitude of syntax issues. The deeper problem is misunderstanding the service provided by `dlopen()` and `dlsym()`, etc. – Jonathan Leffler Aug 30 '09 at 19:01

3 Answers

23

Actually, you can do nearly everything you want. In C (unlike C++, for example), functions in shared objects are referenced merely by their names. So to find, and more importantly to call, the proper function, you don't need its full signature. You only need its name! That is both an advantage and a disadvantage, but that's the nature of the language you chose.

Let me demonstrate how it works.

#include <dlfcn.h>

typedef void* (*arbitrary)();
// do not confuse this with   typedef void* (*arbitrary)(void); !!!

int main()
{
    arbitrary my_function;
    // Introduce already loaded functions to runtime linker's space
    void* handle = dlopen(0,RTLD_NOW|RTLD_GLOBAL);
    // Load the function into our pointer, which doesn't know how many arguments there should be
    *(void**)(&my_function) = dlsym(handle,"something");
    // Call something via my_function
    (void)  my_function("I accept a string and an integer!\n",(int)(2*2));
    return 0;
}

In fact, you can call any function that way. However, there's one drawback: you need to know the return type of your function at compile time. If you omit void* in that typedef, int is assumed as the return type, and yes, that is valid C89 (implicit int was removed in C99). The compiler needs to know the size of the return type to handle the stack properly.

You can work around this with tricks, for example by pre-declaring several function types with differently sized return types and then selecting the one you are actually going to call. But the easier solution is to require functions in your plugin to always return void* or int, with the actual result returned via pointers passed as arguments.

What you must ensure is that you always call the function with the exact number and types of arguments it's supposed to accept. Pay close attention to the differences between integer types (your best option is to cast arguments to them explicitly).
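The "pre-declare several function types and select one" trick could be sketched roughly as below. This is only an illustration under stated assumptions: `sig_kind`, `value`, `generic_fn`, and `call_with_signature` are hypothetical names, and the three sample functions stand in for pointers that would really come from dlsym() (POSIX allows converting its void* result to a function pointer). Every supported signature has to be listed at compile time; the only thing chosen at run time is which cast to use.

```c
#include <stddef.h>

/* Hypothetical tags for the few signatures the host agrees to support. */
enum sig_kind { SIG_INT_INT, SIG_DOUBLE_DOUBLE, SIG_INT_INT_INT };

/* Hypothetical holder for one argument or one result. */
union value { int i; double d; };

/* Generic carrier type: never called directly, only cast back. */
typedef void (*generic_fn)(void);

/* Cast fn to the prototype named by kind and call it.
 * Returns 0 on success, -1 for an unknown signature tag. */
static int call_with_signature(generic_fn fn, enum sig_kind kind,
                               const union value *args, union value *result)
{
    switch (kind) {
    case SIG_INT_INT:
        result->i = ((int (*)(int))fn)(args[0].i);
        return 0;
    case SIG_DOUBLE_DOUBLE:
        result->d = ((double (*)(double))fn)(args[0].d);
        return 0;
    case SIG_INT_INT_INT:
        result->i = ((int (*)(int, int))fn)(args[0].i, args[1].i);
        return 0;
    }
    return -1;
}

/* Sample targets standing in for symbols obtained from dlsym(). */
static int negate(int x) { return -x; }
static double half(double x) { return x / 2.0; }
static int add(int a, int b) { return a + b; }
```

The tag passed in must match the real prototype of the target; passing the wrong tag is undefined behavior, exactly as with a wrong hand-written cast.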

Several commenters reported that the code above is not guaranteed to work for variadic functions (such as printf).

Jonathan Leffler
P Shved
  • @pavel: Could I do something like union type { int i; double d; float f; } type; (type) my_printf(...) ? – adk Aug 30 '09 at 20:26
  • @adk: I don't see anything wrong with unions. In fact, I completely forgot about them, so thanks for improving my answer! :) – P Shved Aug 30 '09 at 20:42
  • Pavel: `printf` is a bad example, because the empty parameter list declaration isn't compatible with varargs functions. (And the return type of `printf` is `int`, by the way, not `void *`). – caf Aug 31 '09 at 00:41
  • @pavel: what if I do not know the number of arguments at compile time? – adk Aug 31 '09 at 03:08
  • @adk: then you're in trouble. Actually, in this case the function you dlsym() should take an array, or a list, or something like that. You have some variability, but it's still C. – P Shved Aug 31 '09 at 04:05
  • @pavel: thanks. I was trying to evaluate whether this is a difficult problem. – adk Aug 31 '09 at 05:15
  • @adk: Unions are completely different types, so you'd be in trouble unless the function actually returned a union. – aib Aug 31 '09 at 12:14
  • This code is **dangerously wrong**. A variadic function (like `printf`) **must not** be called without a correct prototype. – R.. GitHub STOP HELPING ICE Aug 19 '11 at 03:31
  • And even if the function is not variadic, you can't call a function through a function pointer with a mismatching return type, or with an argument list that doesn't match the actually-required argument types. – R.. GitHub STOP HELPING ICE Aug 19 '11 at 03:32
  • @R.. well, the OP has stepped on a dangerous path of calling functions without their parameters checked. My code is no more dangerous than this very idea. But what's the problem with variadic functions? The code generated at the caller site should not depend on the prototype... – P Shved Aug 21 '11 at 14:07
  • @Pavel: Sorry, I think you need to read the language specification. "Should not depend" is not an argument that something is valid C. In C it is explicitly undefined behavior to call a variadic function without a correct prototype, and it will break on many pass-by-register architectures (I believe any but x86_64, which jumped through hoops to make it work for the sake of fools who call `printf` without `stdio.h`) or any implementation using "PASCAL" (aka "stdcall") calling convention. – R.. GitHub STOP HELPING ICE Aug 21 '11 at 14:46
  • @R.. sorry, can't read the specification and do the simulations now; I've merely edited my answer assuming you know it better than me. Is it ok now? – P Shved Aug 21 '11 at 15:09
  • I love SO because of answers like these. Would give you 100 reputation if I could. – darxsys May 11 '13 at 13:09
19

What dlsym() returns is normally a function pointer - disguised as a void *. (If you ask it for the name of a global variable, it will return you a pointer to that global variable, too.)

You then invoke that function just as you might using any other pointer to function:

int (*fun)(int, char *) = (int (*)(int, char *))dlsym(triangle, "function");

(*fun)(1, "abc");    /* Old school - pre-C89 standard, but explicit */
fun(1, "abc");       /* New school - C89/C99 standard, but implicit */

I'm old school; I prefer the explicit notation so that the reader knows that 'fun' is a pointer to a function without needing to see its declaration. With the new school notation, you have to remember to look for a variable 'fun' before trying to find a function called 'fun()'.
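A slightly fuller sketch of the same lookup-and-call pattern, with the error checks that dlopen() and dlsym() call for. `lookup_symbol` and `call_abs` are hypothetical helper names, and looking up libc's `abs` through the running program's own handle is just a convenient stand-in for a symbol from a real plugin. On older glibc versions you may need to link with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: open `path` (NULL means the running program and
 * its already loaded libraries) and look up `name`.
 * Returns NULL and reports to stderr on failure. */
static void *lookup_symbol(const char *path, const char *name)
{
    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }

    dlerror();                          /* clear any stale error state */
    void *sym = dlsym(handle, name);
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err);
        dlclose(handle);
        return NULL;
    }
    /* The handle is deliberately left open so the symbol stays valid. */
    return sym;
}

/* The caller must already know the signature: here, int abs(int).
 * Returns -1 if the symbol cannot be found. */
static int call_abs(int x)
{
    int (*fn)(int) = (int (*)(int))lookup_symbol(NULL, "abs");
    return fn != NULL ? (*fn)(x) : -1;
}
```

Note that the cast to `int (*)(int)` is written by the programmer; nothing returned by dlsym() says what the prototype is.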

Note that you cannot build the function call dynamically as you are doing - or, not in general. To do that requires a lot more work. You have to know ahead of time what the function pointer expects in the way of arguments and what it returns and how to interpret it all.

Systems that manage more dynamic function calls, such as Perl, have special rules about how functions are called and arguments are passed and do not call (arguably cannot call) functions with arbitrary signatures. They can only call functions with signatures that are known about in advance. One mechanism (not used by Perl) is to push the arguments onto a stack, and then call a function that knows how to collect values off the stack. But even if that called function manipulates those values and then calls an arbitrary other function, that called function provides the correct calling sequence for the arbitrary other function.

Reflection in C is hard - very hard. It is not impossible - but it requires infrastructure to support it and discipline to use it, and it can only call functions that support the infrastructure's rules.
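One very small form such infrastructure can take, sketched here under assumptions: every callable function follows a single uniform signature that receives its arguments as an array, and a table maps names to functions. The names `uniform_fn`, `registry`, and `call_by_name` are hypothetical, and the static table stands in for whatever a real system would populate (for example from dlsym() lookups).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical uniform signature: every callable takes an argument array
 * and writes one result. Returns 0 on success, -1 on bad arguments. */
typedef int (*uniform_fn)(size_t argc, const double *argv, double *result);

static int sum_all(size_t argc, const double *argv, double *result)
{
    double total = 0.0;
    for (size_t i = 0; i < argc; i++)
        total += argv[i];
    *result = total;
    return 0;
}

static int max_of(size_t argc, const double *argv, double *result)
{
    if (argc == 0)
        return -1;                  /* no maximum of an empty list */
    double best = argv[0];
    for (size_t i = 1; i < argc; i++)
        if (argv[i] > best)
            best = argv[i];
    *result = best;
    return 0;
}

/* Name-to-function table: the "rules" every callable has to follow. */
struct entry { const char *name; uniform_fn fn; };
static const struct entry registry[] = {
    { "sum", sum_all },
    { "max", max_of },
};

/* Look the name up and call it; returns -1 if the name is unknown. */
static int call_by_name(const char *name, size_t argc,
                        const double *argv, double *result)
{
    for (size_t i = 0; i < sizeof registry / sizeof registry[0]; i++)
        if (strcmp(registry[i].name, name) == 0)
            return registry[i].fn(argc, argv, result);
    return -1;
}
```

The number of arguments can now vary at run time, but only because each function was written to collect its own values from the array. Functions with arbitrary native signatures still cannot be called this way.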

drewbug
Jonathan Leffler
  • I know what the arguments ahead of time are and the return types, my question is whether I can create the casts dynamically. Thanks – adk Aug 30 '09 at 18:48
  • No; you can't create the casts dynamically. C is a compiled language; what you are seeking to do is interpret C code, which is not a trivial proposition! – Jonathan Leffler Aug 30 '09 at 18:51
  • Hi @JonathanLeffler, how can I use `dlsym` to look up an enum value? E.g. `dlsym(RTLD_DEFAULT, "SUNDAY")` always gives me `NULL`, where SUNDAY is a value of enum Weekday. – Inder Kumar Rathore Jan 28 '14 at 07:29
  • AFAIK, you can't get enumeration constants from `dlsym()`. They're included in debugging information, but they're not symbols in the code in the same way that global variables or functions are. – Jonathan Leffler Jan 28 '14 at 07:58
0

The Proper Solution

Assuming you're writing the shared libraries yourself, the best solution I've found to this problem is to strictly define and control which functions are dynamically linked by:

  1. Setting all symbols hidden
    • for example clang -dynamiclib Person.c -fvisibility=hidden -o libPerson.dylib when compiling with clang
  2. Then using __attribute__((visibility("default"))) and extern "C" to selectively unhide and include functions
  3. Profit! You know what the function's signature is. You wrote it!
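As a rough illustration of steps 1 and 2 in plain C (where extern "C" is not needed, since it only matters when compiling as C++), a library source might look like the sketch below. The file contents, the `PERSON_API` macro, and both function names are hypothetical; only the attribute and the -fvisibility=hidden flag come from the steps above.

```c
/* Person.c: a hypothetical library source, assumed to be compiled with
 * -fvisibility=hidden so every symbol is hidden unless marked otherwise. */

/* Hypothetical convenience macro wrapping the GCC/Clang attribute. */
#define PERSON_API __attribute__((visibility("default")))

/* Not marked: hidden under -fvisibility=hidden, so dlsym() cannot find it. */
int person_clamp_age(int age)
{
    if (age < 0)
        return 0;
    if (age > 150)
        return 150;
    return age;
}

/* Marked: exported, with a signature the library author documents. */
PERSON_API int person_age_next_year(int age)
{
    return person_clamp_age(age + 1);
}
```

A host program would then look up only `person_age_next_year` and cast the result to `int (*)(int)`, the signature published by the library author.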

I found this in Apple's Dynamic Library Design Guidelines. These docs also include other solutions to the problem; the one above was just my favorite.

The Answer to your Question

As stated in previous answers, C functions, and C++ functions declared with extern "C", aren't mangled, so their symbols simply don't include the function signature. If you're compiling as C++ without extern "C", however, functions are mangled, so you could demangle them to recover the full signature (with a tool like demangler.com or a C++ library). See here for more details on what mangling is.

Generally speaking it's best to use the first option if you're trying to import functions with dlopen.

mikeLundquist