62

C++ is a static, compiled language; templates are resolved at compile time, and so on...

But is it possible to create a function during runtime, that is not described in the source code and has not been converted to machine language during compilation, so that a user can throw at it data that has not been anticipated in the source?

I am aware this cannot happen in a straightforward way, but surely it must be possible: there are plenty of programming languages that are not compiled, create that sort of stuff dynamically, and are implemented in either C or C++.

Maybe if factories for all primitive types are created, along with suitable data structures to organize them into more complex objects such as user types and functions, this is achievable?

Any info on the subject as well as pointers to online materials are welcome. Thanks!

EDIT: I am aware it is possible, it is more like I am interested in implementation details :)

dtech
  • 2
    Can you give an example as to what you would expect? – Luchian Grigore Jun 13 '12 at 13:39
  • Compilers are often written in C++. Much of .NET is written in C++. The answer is yes. – John Dibling Jun 13 '12 at 13:41
  • writing an interpreter is actually rather simple... – Daren Thomas Jun 13 '12 at 13:45
  • 1
    @DarenThomas, but it gets tricky when dealing with C++. The parser is not trivial. – riwalk Jun 13 '12 at 13:49
  • 2
    @LuchianGrigore - the idea is not to directly parse code but to have a visual data structure and function editor which can test stuff (performance is not crucial), and later the whole program structure can be serialized to C++ code (every component "knows" how), which can then be compiled conventionally. I have a vision of a new way of programming that is less about typing and more about being visual and conceptually expressive, but I need to have some runtime for it to run on top of before being saved to C++ source and compiled. It doesn't need to compile to code directly, just run. – dtech Jun 13 '12 at 14:24
  • 2
    Modern operating systems don't normally allow you to allocate memory and then mark it executable. While it certainly **is** possible (malware does this when it can), I'd use a scripting engine instead. – richard.albury Jun 13 '12 at 14:51
  • 1
    See also: [c++ - How to generate and run native code dynamically? - Stack Overflow](https://stackoverflow.com/questions/4911993/how-to-generate-and-run-native-code-dynamically#comments-4912662) – user202729 Jan 11 '20 at 04:19
  • @JohnDibling I thought they'd be writing .NET and C# in older versions of C#. Nothing that would prevent it? It's not like it matters if you generate machine code in C# or C++, no matter what language it is for. – KulaGGin Aug 02 '23 at 09:47
  • @richard.albury _"Modern operating systems don't normally allow you to allocate memory and then mark it executable"_. Modern Windows allows it(and allowed back in 2012): `std::vector Bytes(0x1000); DWORD OldProtect; VirtualProtect(&Bytes.front(), Bytes.size(), PAGE_EXECUTE_READWRITE, &OldProtect)`. That's it, now your bytes are executable. – KulaGGin Aug 02 '23 at 09:52
  • @dtech Can I have a clarification please? Basically, in principle, you are asking if in C++ is possible to do (more or less) the same thing is done by the **Function()** constructor in Javascript? (For example **var functionName = [new] Function (arg0, arg1, ..., argN, functionBody);** and run it as: **functionName(arg0, arg1, ..., argN)** ). Am I right? – willy wonka Aug 02 '23 at 13:14

14 Answers

53

Yes, of course, without any tools mentioned in the other answers, but simply using the C++ compiler.

Just follow these steps from within your C++ program (on Linux, but it should be similar on other operating systems):

  1. write the C++ code into a file (e.g. /tmp/prog.cc), using an ofstream
  2. compile it into a shared library via system("c++ /tmp/prog.cc -o /tmp/prog.so -shared -fPIC");
  3. load the library dynamically, e.g. using dlopen(), and look up the function with dlsym()
Walter
43

You can also put the bytecode (machine code) directly into a byte array and call it after casting it to a function type, as demonstrated below.

e.g.

byte[3] func = { 0x90, 0x0f, 0x1 }
*reinterpret_cast<void**>(&func)()
Jay
  • 24
    What a hack! How do you know the byte codes of any functions? Does this really work? – Walter Jun 13 '12 at 18:04
  • 8
    This doesn't work at all. The first problem is precedence: `()` binds tighter than `*`, so this parses as `* ( reinterpret_cast<void**>(&func)() )`. This fails because you cannot call a `void **` (it's not a function type). Fixing the precedence doesn't help: If you dereference first, it'll try to call a `void *`, which is not a function type either. If you manage to correctly cast `&func` (the address of an array, presumably: `byte[3] func` is actually a syntax error) to the address of a function pointer (`void (**)()`) and dereference it, it will crash because ... – melpomene Sep 08 '18 at 14:16
  • 3
    ... it's now interpreting the contents of the array (`0x90, 0x0f, 0x1`) as a function pointer, which is nonsense. If you want to get it to compile, you need something like `unsigned char func[] = { 0x90, 0x0f, 0x1 }; reinterpret_cast<void (*)()>(func)();` (none of this pointer-to-pointer stuff), which will probably still crash at runtime, but at least it's now asking the CPU to execute code from a byte array. – melpomene Sep 08 '18 at 14:20
  • 9
    (It will probably crash because even if you have the right bytecode for your processor architecture, the stack and data segment are probably not marked as executable on any modern OS.) – melpomene Sep 08 '18 at 14:23
  • @melpomene, you're splitting hairs here... byte must be defined, and specifying the size of the array there is actually optional. Finally, as arrays are passed by reference (actually a pointer to the first member), technically you can use a single indirection; however, my code was showing how you could take the pointer to the code. Obviously, if you're not using the additional indirection you wouldn't need it. Finally, if the memory is not marked executable then you would need to do so, or write the value to memory which was already marked as such and execute from there. Thank you for the comments though! – Jay Sep 08 '18 at 14:36
  • 2
    @Jay I'm not splitting hairs. The declaration syntax (the least important part) is `byte func[3]`, not `byte[3] func`. If you write `&func`, you get a pointer to the whole array. This is still a single level of indirection. If you try to dereference it twice (in your code: once by `*`, once by `()` (the function call operator)), you end up treating the 3 bytes as a memory address. This cannot work, even if you overlook all of the precedence and type errors. – melpomene Sep 08 '18 at 14:47
  • Not in gcc and not using c++17 or greater... https://stackoverflow.com/questions/46150738/how-to-use-new-stdbyte-type-in-places-where-old-style-unsigned-char-is-needed, visual studio was able to run it fine. – Jay Sep 08 '18 at 18:02
  • @melpomene Keep in mind if the stack was already executable and the memory was user input the points you listed are quite moot. – Jay Jan 12 '21 at 12:02
22

Yes, JIT compilers do it all the time. They allocate a piece of memory that has been given special execution rights by the OS, then fill it with code and cast the pointer to a function pointer and execute it. Pretty simple.

EDIT: Here's an example on how to do it in Linux: http://burnttoys.blogspot.de/2011/04/how-to-allocate-executable-memory-on.html
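
A rough sketch of the steps above, assuming Linux on x86-64 (the preprocessor guard makes it simply return false elsewhere). The function name `run_generated_code` and the hard-coded bytes are illustrative only and not taken from any real JIT. It maps the page writable first and only then flips it to executable with mprotect(), since some systems refuse pages that are writable and executable at the same time:

```cpp
#include <cstring>      // std::memcpy

#if defined(__linux__) && defined(__x86_64__)
#include <sys/mman.h>   // mmap(), mprotect(), munmap()
#endif

// Hypothetical helper: generates a tiny function at run time, calls it and
// stores its result in `out`. Returns false if the platform is unsupported
// or the OS refuses to hand out executable memory.
bool run_generated_code(int& out)
{
#if defined(__linux__) && defined(__x86_64__)
    // x86-64 machine code for:  mov eax, 42 ; ret
    static const unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

    // 1. ask the OS for a writable (not yet executable) page
    void* mem = mmap(nullptr, sizeof(code), PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return false;

    // 2. fill it with code
    std::memcpy(mem, code, sizeof(code));

    // 3. switch the page from writable to executable
    if (mprotect(mem, sizeof(code), PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, sizeof(code));
        return false;
    }

    // 4. cast the pointer to a function pointer and call it
    int (*fn)() = reinterpret_cast<int (*)()>(mem);
    out = fn();

    munmap(mem, sizeof(code));
    return true;
#else
    (void)out;          // unsupported platform in this sketch
    return false;
#endif
}
```

On a supported system, `int v; run_generated_code(v);` leaves 42 in `v`. A real JIT would of course emit the bytes from some intermediate representation instead of a fixed array.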

Milan
15

Below is an example of C++ runtime compilation based on the method mentioned before (write code to an output file, compile via system(), load via dlopen() and dlsym()). See also the example in a related question. The difference here is that it dynamically compiles a class rather than a function. This is achieved by adding a C-style maker() function to the code to be compiled dynamically. References are listed in the comments of main.cpp.

The example only works under Linux (Windows has LoadLibrary and GetProcAddress functions instead), and requires the identical compiler to be available on the target machine.

baseclass.h

#ifndef BASECLASS_H
#define BASECLASS_H
class A
{
protected:
    double m_input;     // or use a pointer to a larger input object
public:
    virtual double f(double x) const = 0;
    void init(double input) { m_input=input; }
    virtual ~A() {};
};
#endif /* BASECLASS_H */

main.cpp

#include "baseclass.h"
#include <cstdlib>      // EXIT_FAILURE, etc
#include <string>
#include <iostream>
#include <fstream>
#include <dlfcn.h>      // dynamic library loading, dlopen() etc
#include <sys/wait.h>   // WEXITSTATUS
#include <memory>       // std::shared_ptr

// compile code, instantiate class and return pointer to base class
// https://www.linuxjournal.com/article/3687
// http://www.tldp.org/HOWTO/C++-dlopen/thesolution.html
// https://stackoverflow.com/questions/11016078/
// https://stackoverflow.com/questions/10564670/
std::shared_ptr<A> compile(const std::string& code)
{
    // temporary cpp/library output files
    std::string outpath="/tmp";
    std::string headerfile="baseclass.h";
    std::string cppfile=outpath+"/runtimecode.cpp";
    std::string libfile=outpath+"/runtimecode.so";
    std::string logfile=outpath+"/runtimecode.log";
    std::ofstream out(cppfile.c_str(), std::ofstream::out);

    // copy required header file to outpath
    std::string cp_cmd="cp " + headerfile + " " + outpath;
    system(cp_cmd.c_str());

    // add necessary header to the code
    std::string newcode =   "#include \"" + headerfile + "\"\n\n"
                            + code + "\n\n"
                            "extern \"C\" {\n"
                            "A* maker()\n"
                            "{\n"
                            "    return (A*) new B(); \n"
                            "}\n"
                            "} // extern C\n";

    // output code to file
    if(!out) {      // a failed open sets failbit, which bad() does not report
        std::cout << "cannot open " << cppfile << std::endl;
        exit(EXIT_FAILURE);
    }
    out << newcode;
    out.flush();
    out.close();

    // compile the code
    // ("> file 2>&1" instead of the bash-only "&>", since system() runs /bin/sh)
    std::string cmd = "g++ -Wall -Wextra " + cppfile + " -o " + libfile
                      + " -O2 -shared -fPIC > " + logfile + " 2>&1";
    int ret = system(cmd.c_str());
    if(WEXITSTATUS(ret) != EXIT_SUCCESS) {
        std::cout << "compilation failed, see " << logfile << std::endl;
        exit(EXIT_FAILURE);
    }

    // load dynamic library
    void* dynlib = dlopen (libfile.c_str(), RTLD_LAZY);
    if(!dynlib) {
        std::cerr << "error loading library:\n" << dlerror() << std::endl;
        exit(EXIT_FAILURE);
    }

    // loading symbol from library and assign to pointer
    // (to be cast to function pointer later)
    void* create = dlsym(dynlib, "maker");
    const char* dlsym_error=dlerror();
    if(dlsym_error != NULL)  {
        std::cerr << "error loading symbol:\n" << dlsym_error << std::endl;
        exit(EXIT_FAILURE);
    }

    // execute "create" function
    // (casting to function pointer first)
    // https://stackoverflow.com/questions/8245880/
    A* a = reinterpret_cast<A*(*)()> (create)();

    // cannot close dynamic lib here, because all functions of the class
    // object will still refer to the library code
    // dlclose(dynlib);

    return std::shared_ptr<A>(a);
}


int main(int argc, char** argv)
{
    double input=2.0;
    double x=5.1;
    // code to be compiled at run-time
    // class needs to be called B and derived from A
    std::string code =  "class B : public A {\n"
                        "    double f(double x) const \n"
                        "    {\n"
                        "        return m_input*x;\n"
                        "    }\n"
                        "};";

    std::cout << "compiling.." << std::endl;
    std::shared_ptr<A> a = compile(code);
    a->init(input);
    std::cout << "f(" << x << ") = " << a->f(x) << std::endl;

    return EXIT_SUCCESS;
}

output

$ g++ -Wall -std=c++11 -O2 -c main.cpp -o main.o   # c++11 required for std::shared_ptr
$ g++ main.o -o main -ldl   # libraries go after the objects that use them
$ ./main
compiling..
f(5.1) = 10.2
user1059432
  • Why create a string `newcode` when it could be written directly into the stream without creating temporary objects? – Jimmy R.T. Dec 22 '20 at 17:14
13

Have a look at libtcc; it is simple, fast, reliable and suits your need. I use it whenever I need to compile C functions "on the fly".

In the archive, you will find the file examples/libtcc_test.c, which can give you a good head start. This little tutorial might also help you: http://blog.mister-muffin.de/2011/10/22/discovering-tcc/

#include <stdlib.h>
#include <stdio.h>
#include "libtcc.h"

int add(int a, int b) { return a + b; }

char my_program[] =
"int fib(int n) {\n"
"    if (n <= 2) return 1;\n"
"    else return fib(n-1) + fib(n-2);\n"
"}\n"
"int foobar(int n) {\n"
"    printf(\"fib(%d) = %d\\n\", n, fib(n));\n"
"    printf(\"add(%d, %d) = %d\\n\", n, 2 * n, add(n, 2 * n));\n"
"    return 1337;\n"
"}\n";

int main(int argc, char **argv)
{
    TCCState *s;
    int (*foobar_func)(int);
    void *mem;

    s = tcc_new();
    tcc_set_output_type(s, TCC_OUTPUT_MEMORY);
    tcc_compile_string(s, my_program);
    tcc_add_symbol(s, "add", add);

    mem = malloc(tcc_relocate(s, NULL));
    tcc_relocate(s, mem);

    foobar_func = tcc_get_symbol(s, "foobar");

    tcc_delete(s);

    printf("foobar returned: %d\n", foobar_func(32));

    free(mem);
    return 0;
}

Ask questions in the comments if you meet any problems using the library!

Mathieu Rodic
8

In addition to simply using an embedded scripting language (Lua is great for embedding) or writing your own compiler for C++ to use at runtime, if you really want to use C++ you can just use an existing compiler.

For example, Clang is a C++ compiler built as libraries that can be easily embedded in another program. It was designed to be used from programs like IDEs that need to analyze and manipulate C++ source in various ways, but using the LLVM compiler infrastructure as a backend it also has the ability to generate code at runtime and hand you a function pointer that you can call to run the generated code.

bames53
4

Essentially you will need to write a C++ compiler within your program (not a trivial task), and do the same thing JIT compilers do to run the code. You were actually 90% of the way there with this paragraph:

I am aware this cannot happen in a straightforward way, but surely it must be possible, there are plenty of programming languages that are not compiled and create that sort of stuff dynamically that are implemented in either C or C++.

Exactly: those programs carry the interpreter with them. You run a Python program by saying python MyProgram.py, where python is the compiled C code that has the ability to interpret and run your program on the fly. You would need to do something along those lines, but by using a C++ compiler.

If you need dynamic functions that badly, use a different language :)
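
To make the "carry the interpreter with you" point concrete, here is a minimal sketch of a stack-based interpreter, assuming the runtime functions only need numeric arguments, constants, addition and multiplication. All names (`Op`, `Instr`, `run` and the helper functions) are made up for illustration; a real interpreter would also need types, control flow and proper error reporting:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// A function created at run time is just data: a list of instructions.
enum class Op { PushConst, PushArg, Add, Mul };

struct Instr {
    Op          op;
    double      constant;   // used by PushConst only
    std::size_t arg;        // argument index, used by PushArg only
};

using Program = std::vector<Instr>;

// Helpers that an editor or parser could call to assemble a program.
Instr push_const(double v)    { return {Op::PushConst, v, 0}; }
Instr push_arg(std::size_t i) { return {Op::PushArg, 0.0, i}; }
Instr add()                   { return {Op::Add, 0.0, 0}; }
Instr mul()                   { return {Op::Mul, 0.0, 0}; }

// Executes the program against the given arguments and returns the value
// left on top of the stack.
double run(const Program& program, const std::vector<double>& args)
{
    std::vector<double> stack;
    auto pop = [&stack]() {
        if (stack.empty()) throw std::runtime_error("stack underflow");
        double v = stack.back();
        stack.pop_back();
        return v;
    };

    for (const Instr& in : program) {
        switch (in.op) {
            case Op::PushConst: stack.push_back(in.constant);     break;
            case Op::PushArg:   stack.push_back(args.at(in.arg)); break;
            case Op::Add: { double b = pop(), a = pop(); stack.push_back(a + b); break; }
            case Op::Mul: { double b = pop(), a = pop(); stack.push_back(a * b); break; }
        }
    }
    return pop();
}
```

The program `{ push_arg(0), push_arg(1), mul(), push_const(2.0), add() }` represents f(x, y) = x*y + 2, so `run(program, {3.0, 4.0})` returns 14. Nothing about f exists in the compiled binary; it is only data, which could just as well come from a file or an editor.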

riwalk
4

A typical approach for this is to combine a C++ project (or whatever language it's written in) with a scripting language.
Lua is one of the top favorites, since it's well documented, small, and has bindings for a lot of languages.

But if you are not looking in that direction, perhaps you could think of making use of dynamic libraries?

Andrejs Cainikovs
1

Yes - you can write a compiler for C++, in C++, with some extra features - write your own functions, compile and run automatically (or not)...

Luchian Grigore
  • I am not looking forward into compiling the objects, created dynamically at runtime to machine code, just execute them, albeit not with top performance and efficiency. – dtech Jun 13 '12 at 13:47
1

Have a look into ExpressionTrees in .NET - I think this is basically what you want to achieve. Create a tree of subexpressions and then evaluate them. In an object-oriented fashion, each node in the tree might know how to evaluate itself, by recursion into its subnodes. Your visual language would then create this tree and you can write a simple interpreter to execute it.

Also, check out Ptolemy II, as an example in Java on how such a visual programming language can be written.
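
The expression-tree idea described above could look roughly like this in C++. This is only a sketch, assuming double-valued expressions with named variables and four arithmetic operators; the class and helper names (`Node`, `Constant`, `Variable`, `Binary`, `constant()`, `variable()`, `binary()`) are hypothetical and not from .NET or any library:

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Variable bindings supplied by the user at run time.
using Env = std::map<std::string, double>;

// Every node knows how to evaluate itself, recursing into its subnodes.
struct Node {
    virtual ~Node() = default;
    virtual double eval(const Env& env) const = 0;
};
using NodePtr = std::unique_ptr<Node>;

struct Constant : Node {
    double value;
    explicit Constant(double v) : value(v) {}
    double eval(const Env&) const override { return value; }
};

struct Variable : Node {
    std::string name;
    explicit Variable(std::string n) : name(std::move(n)) {}
    double eval(const Env& env) const override {
        auto it = env.find(name);
        if (it == env.end()) throw std::runtime_error("unbound variable: " + name);
        return it->second;
    }
};

struct Binary : Node {
    char op;
    NodePtr lhs, rhs;
    Binary(char o, NodePtr l, NodePtr r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    double eval(const Env& env) const override {
        const double a = lhs->eval(env);
        const double b = rhs->eval(env);
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            case '/': return a / b;
        }
        throw std::runtime_error("unknown operator");
    }
};

// Small factory helpers that a visual editor could call to build the tree.
NodePtr constant(double v)      { return std::make_unique<Constant>(v); }
NodePtr variable(std::string n) { return std::make_unique<Variable>(std::move(n)); }
NodePtr binary(char op, NodePtr l, NodePtr r) {
    return std::make_unique<Binary>(op, std::move(l), std::move(r));
}
```

For example, `binary('+', binary('*', variable("x"), variable("x")), constant(1.0))` builds the function x*x + 1 at run time, and calling `eval(Env{{"x", 3.0}})` on it yields 10. Serializing such a tree back to C++ source would be one more virtual method per node type.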

Daren Thomas
1

You could take a look at Runtime Compiled C++ (or see RCC++ blog and videos), or perhaps try one of its alternatives.

Doug Binks
1

Expanding on Jay's answer using opcodes, the below works on Linux (x86-64).

  1. Learn opcodes from your compiler:
    • write own myfunc.cpp, e.g.
      double f(double x) { return x*x; }
      
    • compile with
      $ g++ -O2 -c myfunc.cpp
      
    • disassemble function f
      $ gdb -batch -ex "file ./myfunc.o" -ex "set disassembly-flavor intel" -ex "disassemble/rs f"
      Dump of assembler code for function _Z1fd:
         0x0000000000000000 <+0>:     f2 0f 59 c0     mulsd  xmm0,xmm0
         0x0000000000000004 <+4>:     c3      ret    
      End of assembler dump.
      
      This means the function x*x in assembly is mulsd xmm0,xmm0, ret and in machine code f2 0f 59 c0 c3.
  2. Write your own function in machine code:
    • opcode.cpp
      #include <cstdlib>          // EXIT_FAILURE etc
      #include <cstdio>           // printf(), fopen() etc
      #include <cstring>          // memcpy()
      #include <sys/mman.h>       // mmap()
      
      // allocate memory and fill it with machine code instructions
      // returns pointer to memory location and length in bytes
      void* gencode(size_t& length)
      {
          // machine code
          unsigned char opcode[] = {
              0xf2, 0x0f, 0x59, 0xc0,         // mulsd  xmm0,xmm0
              0xc3                            // ret
          };
          // allocate memory which allows code execution
          // https://en.wikipedia.org/wiki/NX_bit
          void* buf = mmap(NULL,sizeof(opcode),PROT_READ|PROT_WRITE|PROT_EXEC,
                           MAP_PRIVATE|MAP_ANON,-1,0);
          if(buf == MAP_FAILED) {     // mmap() signals failure with MAP_FAILED, not NULL
              perror("mmap");
              exit(EXIT_FAILURE);
          }
          // copy machine code to executable memory location
          memcpy(buf, opcode, sizeof(opcode));
          // return: pointer to memory location with executable code
          length = sizeof(opcode);
          return buf;
      }
      
      // print the disassembly of buf
      void print_asm(const void* buf, size_t length)
      {
          FILE* fp = fopen("/tmp/opcode.bin", "w");
          if(fp!=NULL) {
              fwrite(buf, length, 1, fp);
              fclose(fp);
          }
          system("objdump -D -M intel -b binary -mi386 /tmp/opcode.bin");
      }
      
      int main(int, char**)
      {
          // generate machine code and point myfunc() to it
          size_t length;
          void* code=gencode(length);
          double (*myfunc)(double);   // function pointer
          myfunc = reinterpret_cast<double(*)(double)>(code);
      
          double x=1.5;
          printf("f(%f)=%f\n", x,myfunc(x));
          print_asm(code,length);     // for debugging
          return EXIT_SUCCESS;
      }
      
      
    • compile and run
      $ g++ -O2 opcode.cpp -o opcode
      $ ./opcode
      f(1.500000)=2.250000
      
      /tmp/opcode.bin:     file format binary
      
      
      Disassembly of section .data:
      
      00000000 <.data>:
         0:   f2 0f 59 c0             mulsd  xmm0,xmm0
         4:   c3                      ret    
      
0

The simplest solution available, if you're not looking for performance, is to embed a scripting language interpreter, e.g. for Lua or Python.

Vlad
  • I am not looking forward to embedding a third party interpreted language but more like create those facilities on my own according to my own needs. – dtech Jun 13 '12 at 13:43
  • -1. I don't think this answers the question. No where did he ask "What languages support this?" He asked, "Can I do it in C++?" – riwalk Jun 13 '12 at 13:46
  • Dear Vlad @Vlad, do you know of any open source project that embeds Python? I wish to see this in action, thanks! – Taozi Aug 01 '15 at 03:03
  • See [AppsWithPythonScripting](https://wiki.python.org/moin/AppsWithPythonScripting) and [Embedding Python in C/C++](http://www.codeproject.com/Articles/11805/Embedding-Python-in-C-C-Part-I). – Vlad Aug 01 '15 at 09:05
0

It worked for me like this. You have to use the -fpermissive flag. I am using CodeBlocks 17.12.

#include <cstddef>

using namespace std;
int main()
{
    char func[] = {'\x90', '\x0f', '\x1'};
    void (*func2)() = reinterpret_cast<void*>(&func);
    func2();
    return 0;
}
Anyone