
Here is what I want to do:

  1. Run a program and initialize some data structures.
  2. Then compile additional code that can access/modify the existing data structures.
  3. Repeat step 2 as needed.

I want to be able to do this with both C and C++ using gcc (and eventually Java) on Unix-like systems (especially Linux and Mac OS X). The idea is basically to implement a read-eval-print loop for these languages that compiles expressions and statements as they are entered and uses them to modify existing data structures (something that is done all the time in scripting languages). I am writing this tool in Python, which generates the C/C++ files, but that should not be relevant.

I have explored doing this with shared libraries but learned that modifying shared libraries does not affect programs that are already running. I have also tried using shared memory but could not find a way to load a function onto the heap. I have also considered using assembly code but have not yet attempted to do so.

I would prefer not to use any compilers other than gcc unless there is absolutely no way to do it in gcc.

If anyone has any ideas or knows how to do this, any help will be appreciated.

Matt

6 Answers


There is one simple solution:

  1. create your own library containing the special functions
  2. load the created library
  3. execute functions from that library, passing your structures as function arguments

To use your structures, you have to include the same header files as in the host application.

structs.h:

struct S {
    int a,b;
};

main.cpp:

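// build: g++ main.cpp -o main -ldl   (-ldl is not needed on glibc 2.34+ or macOS)
#include <iostream>
#include <fstream>
#include <cstdlib>
#include <dlfcn.h>

#include "structs.h"

using namespace std;

int main ( int argc, char **argv ) {

    // create own program; F receives the host's struct by reference
    ofstream f ( "tmp.cpp" );
    f << "#include<stdlib.h>\n#include \"structs.h\"\n extern \"C\" void F(S &s) { s.a += s.a; s.b *= s.b; }\n";
    f.close();

    // create library (-fPIC is needed for shared objects on most platforms, e.g. x86-64)
    if ( system ( "/usr/bin/gcc -shared -fPIC tmp.cpp -o libtmp.so" ) != 0 ) {
        cerr << "Compilation failed\n";
        return 1;
    }

    // load library; the "./" makes dlopen look in the current directory
    void * fLib = dlopen ( "./libtmp.so", RTLD_LAZY );
    if ( !fLib ) {
        cerr << "Cannot open library: " << dlerror() << '\n';
        return 1;
    }

    // dlsym returns void *, so cast it to F's actual type, void(S &)
    void ( *fn ) ( S & ) = reinterpret_cast<void ( * ) ( S & )> ( dlsym ( fLib, "F" ) );

    if ( fn ) {
        for(int i=0;i<11;i++) {
            S s;
            s.a = i;
            s.b = i;

            // use function
            fn(s);
            cout << s.a << " " << s.b << endl;
        }
    }
    dlclose ( fLib );

    return 0;
}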

output:

0 0
2 1
4 4
6 9
8 16
10 25
12 36
14 49
16 64
18 81
20 100

You can also create a self-modifying program that changes its own source code, recompiles itself, and then replaces its running image with execv, using shared memory to keep its data across the restart.
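A minimal sketch of the state-preserving part of that idea. It assumes a file-backed `MAP_SHARED` mapping in place of a POSIX shared-memory object (so no extra library such as `-lrt` is needed); the `repl_state` layout and the `state_attach()`/`state_detach()` helpers are hypothetical. Because the data lives in the file rather than in the process image, a freshly `execv()`'d binary can attach to the same state:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* keep POSIX declarations visible under strict -std modes */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical state the program keeps across its rebuild-and-execv cycles. */
struct repl_state {
    int generation;    /* e.g. how many times the program has replaced itself */
    int values[16];    /* whatever data the REPL has built up so far */
};

/* Maps `path` as shared memory, creating it (zero-filled) on first use.
   The data lives in the file, not in the process image, so a process
   started by execv() can attach to the same file and find it intact. */
struct repl_state *state_attach(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)sizeof(struct repl_state)) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(struct repl_state), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (struct repl_state *)p;
}

/* Unmaps the state; with MAP_SHARED the writes are visible to the next attach. */
void state_detach(struct repl_state *s)
{
    munmap(s, sizeof *s);
}
```

In the self-rebuilding program, `main()` would call `state_attach()` first; after `system()` has rebuilt the executable, it would call `state_detach()` and `execv()` the new binary with the same arguments. The struct layout has to stay the same between rebuilds.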

kravemir
  • Very useful info, but how would you go about including main.cpp in tmp.cpp? – dreamer_999 Sep 22 '14 at 00:56
  • Okay, I was going to edit the question to answer you, but there is no need :) You can't include main.cpp in tmp.cpp. If you want to share some data, then you have to use headers (or write it directly to the file) and pass structures into the dynamically created function :) – kravemir Sep 24 '14 at 18:19
  • Thanks! When using headers, however, the values of the shared variables are not the same, so I end up having to pass them to the function. I am wondering if there is some way to avoid passing variables. – dreamer_999 Sep 24 '14 at 20:32
  • If you include a header into a .cpp, it's the same as if you had written its contents into that .cpp. So you end up with two instances of the variables (one in main.cpp and one in the dynamic library), and if the header defining the variables were included in two objects (.cpp files) in the same library, you would get a multiple-definition error at link time. You have to use the "extern" keyword in the header to tell the compiler that these variables are not instantiated in the current object (.cpp) and will be resolved by the linker. You can make the variables 'static'; they will then be instantiated privately in every object, but you won't share anything that way either. – kravemir Sep 28 '14 at 11:56

I think you may be able to accomplish this using dynamic libraries and loading them at runtime (using dlopen and friends).

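#include <dlfcn.h>
#include <stdio.h>

// "./" matters: a name without a slash is searched for on the library path, not in the current directory
void * lib = dlopen("./mynewcode.so", RTLD_LAZY);
if(lib) {
    void (*fn)(void) = (void (*)(void)) dlsym(lib, "libfunc");

    if(fn) fn();
    dlclose(lib);
} else {
    fprintf(stderr, "%s\n", dlerror());
}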

You would obviously have to compile the new code as you go along, but if you keep replacing mynewcode.so (calling dlclose on the old handle before loading the new one), I think this will work for you.
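A sketch of that compile-load-call loop, assuming Linux or macOS with `cc` on the `PATH` and a writable `/tmp` (link with `-ldl` on glibc older than 2.34). The helper `run_snippet()` and the `snippet(int *, int)` signature are hypothetical. It gives every snippet a new, unique file name instead of overwriting one `.so`, because `dlopen()` on a path that is still loaded hands back the already-loaded library. It also uses `RTLD_NOW`, `dlerror()` and the `void *` to function-pointer cast that POSIX permits:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* keep POSIX declarations visible under strict -std modes */
#endif
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Every snippet must define this function (an assumption of this sketch). */
typedef void (*snippet_fn)(int *data, int n);

/* Hypothetical helper: builds `code` into a shared object with a fresh name,
   loads it, lets snippet() modify the host's array in place, then unloads it.
   Returns 0 on success, -1 on failure. */
int run_snippet(const char *code, int *data, int n)
{
    static unsigned generation = 0;   /* makes every file name unique */
    char src[128], lib[128], cmd[384];
    long pid = (long)getpid();
    unsigned gen = generation++;
    int rc = -1;

    /* A new path each time: dlopen() on a path that is still loaded
       would simply return the old library. */
    snprintf(src, sizeof src, "/tmp/snippet_%ld_%u.c", pid, gen);
    snprintf(lib, sizeof lib, "/tmp/snippet_%ld_%u.so", pid, gen);

    FILE *f = fopen(src, "w");
    if (!f)
        return -1;
    fputs(code, f);
    fclose(f);

    snprintf(cmd, sizeof cmd, "cc -shared -fPIC -o %s %s", lib, src);
    if (system(cmd) == 0) {
        /* RTLD_NOW reports unresolved symbols here instead of at call time */
        void *handle = dlopen(lib, RTLD_NOW);
        if (!handle) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
        } else {
            /* POSIX.1-2008 (2013 TC) requires this conversion to work */
            snippet_fn fn = (snippet_fn)dlsym(handle, "snippet");
            if (fn) {
                fn(data, n);          /* works directly on the host's memory */
                rc = 0;
            } else {
                fprintf(stderr, "dlsym: %s\n", dlerror());
            }
            dlclose(handle);
        }
    }
    remove(src);
    remove(lib);
    return rc;
}
```

Because the array never leaves the host process, each snippet sees the changes made by the previous one. To keep functions from earlier snippets callable, a REPL would keep their handles open instead of calling `dlclose()` right away.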

Stephen Newell
  • Loading should be supported; I'm not sure that *un*loading is supported in all cases, however. – Chris Stratton May 12 '12 at 15:13
  • @ChrisStratton: I'll confess I'm *far* from an expert on runtime loading, but the man page leads me to believe the symbols are unloaded at dlclose (specifically the `RTLD_NODELETE` flag). Take all that with a grain of salt though :). – Stephen Newell May 12 '12 at 22:41
  • @ChrisStratton I don't know of 'all' cases but in one project of mine I have never seen *`dlclose()`* not unload the symbols. Unless of course *`RTLD_NODELETE`* is passed in which case it does *not* unload them. – Pryftan Oct 27 '19 at 13:56
  • N.B. *`cosine = (double (*)(double)) dlsym(handle, "cos");` .. According to the ISO C standard, casting between function pointers and '`void *`', as done above, produces undefined results. POSIX.1-2003 and POSIX.1-2008 accepted this state of affairs and proposed the following workaround: `*(void **) (&cosine) = dlsym(handle, "cos");`* (1/2) – Pryftan Oct 27 '19 at 14:01
  • *This (clumsy) cast conforms with the ISO C standard and will avoid any compiler warnings. The 2013 Technical Corrigendum to POSIX.1-2008 (a.k.a. POSIX.1-2013) improved matters by requiring that conforming implementations support casting 'void *' to a function pointer. Nevertheless, some compilers (e.g., gcc with the '-pedantic' option) may complain about the cast used in this program.* (From *`dlopen(3)`*) Just as a note about this issue :) (2/2) – Pryftan Oct 27 '19 at 14:02
  • Oh, one more thing: in the past I noticed that without matching CFLAGS in the shared libraries and in the binary that loads them, the program crashed (apart from the flags that make it a .so, obviously, and making sure the binary can load them properly, maybe with - and I am vague on this one - *`-rdynamic`*). I don't remember the specifics as this was decades ago, but looking into it more deeply, matching the flags stopped the (random) crashing. And depending on the environment you might need different PIC compiler options. – Pryftan Oct 27 '19 at 14:11
  • Make that one more thing now. As for *`RTLD_LAZY`* just to clarify what it means: for functions (not variables) it only resolves the unresolved symbols when the function is called. Depending on the use and importance of the functions this might not be desirable - as in it might be desirable to abort the program if the symbols cannot be resolved at the time. In which case you can use *`RTLD_NOW`*. You can use *`dlerror()`* to report the error of the most recent related function. You of course need the linker flag for these functions *`-ldl`*. I think that's all I have now! – Pryftan Oct 27 '19 at 14:14
  • Where is the library that has all these functions? – Algo Sep 20 '20 at 08:14
  • @Algo - From the man page: `Link with -ldl.` If you don't have `libdl.so` on your system, check your package manager or distribution's documentation. – Stephen Newell Sep 20 '20 at 14:21

Even though LLVM is used today mostly for its optimizations and its backend role in compilation, at its core it is the Low-Level Virtual Machine.

LLVM can JIT-compile code, although the return types may be quite opaque, so if you are ready to wrap your own code around it and don't worry too much about the casts that are going to take place, it may help you.

However, C and C++ are not really friendly to this kind of thing.

Matthieu M.

Yes, you can do this with Runtime Compiled C++ (see also the RCC++ blog and videos) or one of its alternatives.

Doug Binks

This can be done portably with OpenCL

OpenCL is a widely supported standard, mainly used for offloading calculations to specialized hardware, such as GPUs. However, it also works just fine on CPUs and actually performs run-time compilation of C99-like code as one of its core features (this is how the hardware portability is achieved). The newer versions (2.1+) also accept a large subset of C++14.

A basic example of such run-time compilation & execution might look something like this:

#define CL_TARGET_OPENCL_VERSION 120 //target the OpenCL 1.2 API used below
#ifdef __APPLE__
#include<OpenCL/opencl.h>
#else
#include<CL/cl.h>
#endif
#include<stdlib.h>
int main(int argc,char**argv){//assumes source code strings are in argv[1..argc-1]
    cl_int e = 0;//error status indicator
    cl_platform_id platform = 0;
    cl_device_id device = 0;
    e=clGetPlatformIDs(1,&platform,0);                                      if(e)exit(e);
    e=clGetDeviceIDs(platform,CL_DEVICE_TYPE_ALL,1,&device,0);              if(e)exit(e);
    cl_context context = clCreateContext(0,1,&device,0,0,&e);               if(e)exit(e);
    cl_command_queue queue = clCreateCommandQueue(context,device,0,&e);     if(e)exit(e);
    //the lines below could be done in a loop, assuming you release each program & kernel
    //skip argv[0]: it is the program name, not source code
    cl_program program = clCreateProgramWithSource(context,argc-1,(const char**)(argv+1),0,&e);
                                                                            if(e)exit(e);
    cl_kernel kernel = 0;
    e=clBuildProgram(program,1,&device,0,0,0);                              if(e)exit(e);
    e=clCreateKernelsInProgram(program,1,&kernel,0);                        if(e)exit(e);
    e=clSetKernelArg(kernel,0,sizeof(int),&argc);                           if(e)exit(e);//assumes the kernel's first parameter is an int
    e=clEnqueueTask(queue,kernel,0,0,0);                                    if(e)exit(e);
    e=clFinish(queue);                                                      if(e)exit(e);//wait for the kernel to complete
    //realistically, you'd also need some buffer operations around here to do useful work
}
Ryan Hilbert

If nothing else works (in particular, if unloading a shared library turns out not to be supported on your runtime platform), you could do it the hard way.

1) use system() or whatever to execute gcc or make or whatever to build the code

2) either link it as a flat binary or parse whatever format (ELF?) the linker outputs on your platform yourself

3) get yourself some executable pages, either by mmap()'ing an executable file or by doing an anonymous mmap with the execute bit set and copying/unpacking your code there (not all platforms care about that bit, but let's assume you have one that does)

4) flush any data and instruction caches (since consistency between the two is typically not guaranteed)

5) call it via a function pointer or whatever
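A rough sketch of steps 1-5, assuming Linux/ELF with `cc` and GNU `objcopy` on the `PATH` and a writable `/tmp`. To avoid writing an ELF parser, it only handles a single leaf function with no relocations (no globals, calls, or constant data), so the raw `.text` bytes produced by `objcopy -O binary` can run at any address. `load_flat()` and `struct flat_fn` are hypothetical names:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* MAP_ANONYMOUS is not exposed under strict -std modes */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*binop_fn)(int, int);

/* Result of loading one flat-binary function. */
struct flat_fn {
    void    *mem;   /* executable mapping; release with munmap(mem, len) */
    size_t   len;
    binop_fn call;  /* the same address, typed as a function pointer */
};

/* Hypothetical helper for steps 1-5; only for position-independent leaf
   functions whose code fits in 4 KiB. Returns 0 on success, -1 on failure. */
int load_flat(const char *code, struct flat_fn *out)
{
    const char *src = "/tmp/flat_fn.c", *obj = "/tmp/flat_fn.o", *bin = "/tmp/flat_fn.bin";
    char cmd[512];
    unsigned char buf[4096];

    FILE *f = fopen(src, "w");
    if (!f)
        return -1;
    fputs(code, f);
    fclose(f);

    /* Steps 1-2: build an object file, then keep only the raw bytes of .text */
    snprintf(cmd, sizeof cmd,
             "cc -O2 -fPIC -c %s -o %s && objcopy -O binary -j .text %s %s",
             src, obj, obj, bin);
    if (system(cmd) != 0)
        return -1;

    f = fopen(bin, "rb");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    if (n == 0)
        return -1;

    /* Step 3: anonymous writable pages, then switch them to read+execute,
       since many systems refuse memory that is writable and executable at once */
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = (n + page - 1) / page * page;
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, buf, n);
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* Step 4: make the instruction cache see the new bytes (a no-op on x86) */
    __builtin___clear_cache((char *)mem, (char *)mem + n);

    /* Step 5: hand back a function pointer to the copied code */
    out->mem = mem;
    out->len = len;
    out->call = (binop_fn)mem;
    return 0;
}
```

Anything more complex (calls into libc, globals, string literals) needs relocations applied, which is where parsing the ELF output yourself, or simply going back to `dlopen()`, comes in.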

Of course there's another option too - depending on the level of interaction you need, you could build a separate program and either launch it and wait for the result, or fork off and launch it and talk to it by pipes or sockets. If this would meet your needs, it would be a lot less tricky.
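For the launch-and-wait variant, `popen()` is probably the least code. A minimal sketch, where `run_helper()` is a hypothetical name and `cmd` would normally run a separately built helper program:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* popen()/pclose() are POSIX, not ISO C */
#endif
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: runs `cmd` through /bin/sh, waits for it to exit and
   copies up to n-1 bytes of its standard output into `out` (n must be > 0).
   Returns the wait status from pclose() (0 means exit code 0), or -1. */
int run_helper(const char *cmd, char *out, size_t n)
{
    FILE *p = popen(cmd, "r");
    if (!p)
        return -1;
    size_t used = 0;
    while (used + 1 < n) {
        size_t got = fread(out + used, 1, n - 1 - used, p);
        if (got == 0)
            break;              /* EOF or read error */
        used += got;
    }
    out[used] = '\0';
    return pclose(p);           /* also reaps the child process */
}
```

In a REPL, `cmd` might be something like `"cc -o /tmp/step step.c && /tmp/step"` (hypothetical file names), with results passed back as text on standard output; two-way communication would need `pipe()`/`fork()` or sockets instead.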

Chris Stratton