19

I've been trying to make an eval function in C for a while.

At the moment, my idea is to make a hash String -> function pointer with all the standard library C functions, and all the functions that I make, that way I could handle function invocations (on already defined functions).
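A minimal sketch of such a name-to-function-pointer table, assuming every registered function shares the signature `int f(int)`. The names `int_fn`, `fn_table`, `lookup_fn`, and `call_by_name` are made up for illustration, and a linear scan stands in for a real hash table:

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: every registered function has the signature int f(int). */
typedef int (*int_fn)(int);

struct fn_entry {
    const char *name;
    int_fn fn;
};

/* A user-defined function registered next to the library ones. */
static int square(int x) { return x * x; }

/* Hypothetical registry; a linear table stands in for a hash here. */
static const struct fn_entry fn_table[] = {
    { "abs",     abs     },
    { "toupper", toupper },
    { "tolower", tolower },
    { "square",  square  },
};

/* Return the function registered under `name`, or NULL if there is none. */
int_fn lookup_fn(const char *name)
{
    size_t n = sizeof fn_table / sizeof fn_table[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(fn_table[i].name, name) == 0)
            return fn_table[i].fn;
    }
    return NULL;
}

/* Call `name` with `arg` and store the result in *out.
   Returns 0 on success, -1 if the name is unknown. */
int call_by_name(const char *name, int arg, int *out)
{
    int_fn fn = lookup_fn(name);
    if (fn == NULL)
        return -1;
    *out = fn(arg);
    return 0;
}
```

Functions with other signatures would each need their own table (or a tagged entry), because C cannot call through a function pointer without knowing the argument and return types at compile time.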

However, defining functions from strings (i.e., calling eval("int fun(){return 1;}")) is still a problem. I don't know how I could handle this at runtime. Does anyone have any idea?

Variable definitions don't seem too much of a problem, as I could just use another hash var_name -> pointer and use that pointer whenever the variable is required.
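The variable table could be sketched along these lines. This assumes only `int` variables (a fuller version would need a type tag per entry), and the names `var_table`, `bind_var`, `lookup_var`, `set_var`, and `get_var` are hypothetical:

```c
#include <stddef.h>
#include <string.h>

#define MAX_VARS 64

/* Assumption: only int variables are tracked. */
struct var_entry {
    const char *name;   /* not copied: must outlive the table entry */
    int *addr;          /* where the variable actually lives */
};

static struct var_entry var_table[MAX_VARS];
static size_t var_count;

/* Return the address bound to `name`, or NULL if it is not bound. */
int *lookup_var(const char *name)
{
    for (size_t i = 0; i < var_count; i++) {
        if (strcmp(var_table[i].name, name) == 0)
            return var_table[i].addr;
    }
    return NULL;
}

/* Bind `name` to `addr`. Returns 0 on success, -1 if the table is
   full or the name is already bound. */
int bind_var(const char *name, int *addr)
{
    if (var_count == MAX_VARS || lookup_var(name) != NULL)
        return -1;
    var_table[var_count].name = name;
    var_table[var_count].addr = addr;
    var_count++;
    return 0;
}

/* Write through the stored pointer. Returns -1 for an unknown name. */
int set_var(const char *name, int value)
{
    int *p = lookup_var(name);
    if (p == NULL)
        return -1;
    *p = value;
    return 0;
}

/* Read through the stored pointer. Returns -1 for an unknown name. */
int get_var(const char *name, int *out)
{
    int *p = lookup_var(name);
    if (p == NULL)
        return -1;
    *out = *p;
    return 0;
}
```

Because the table stores addresses rather than values, a write made through `set_var` is visible to host code that uses the variable directly, and vice versa.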

By the way, I don't care about performance, I want to get this to work.

csTroubled
  • Oh this is *way too much* for a question here. You basically are looking to write something like this: http://stackoverflow.com/q/584714/1116364 – Daniel Jour Aug 23 '16 at 03:08
  • One generally does this either by using a language suitable for embedding or by creating one's own language, including a parser and an interpreter. You can't dynamically evaluate C; it's a statically compiled language. There are embeddable interpreters that accept a C-like language though, you might want to look into that? – Some programmer dude Aug 23 '16 at 03:09
  • And if you need dynamic evaluation maybe you are looking at it the wrong way? Either with the design or with the choice of language? – Some programmer dude Aug 23 '16 at 03:10
  • @DanielJour Mind elaborating on why this is too much for a question? I'll look into those interpreters, thanks. – csTroubled Aug 23 '16 at 03:15
  • @csTroubled You need parser, some way to store the results of that parser (abstract syntax tree = AST), some way to "execute" that AST or parts thereof, so some form of a "virtual machine". You need to take care of correct scoping, correct semantics with respect to sequence points, correct representation of types. That's just too much to explain/reference for one answer. I suggest this to get a grasp on what it's like to build a language interpreter: http://www.buildyourownlisp.com/ – Daniel Jour Aug 23 '16 at 03:19
  • 2
    Stackoverflow is generally for relatively small, concrete and specific problems. A compiler error, help finding out why some piece of code crashes, etc. This is a *very* broad question that could possibly involve everything from compiler and parser theory to how interpreters and byte-code work. Please read the sections named ["What topics can I ask about here?"](http://stackoverflow.com/help/on-topic) and ["What types of questions should I avoid asking?"](http://stackoverflow.com/help/dont-ask) from [the help pages](http://stackoverflow.com/help). – Some programmer dude Aug 23 '16 at 03:20
  • 3
    I'm voting to reopen this thing because I *know* the answer and it isn't "you can't". The answer does not involve writing a compiler or even parsing C, nor does it involve bytecode. It's rather simple really. – Joshua Aug 23 '16 at 03:32
  • @Daniel Jour While I agree that writing a full fledged eval might be too much for a question here, I did ask a much more concrete question: How can I parse function definitions (without checking many of the "should be necessary" validations)? A working example with functions like `int f(int a){ return a+1;}` would've sufficed. – csTroubled Aug 23 '16 at 03:35
  • @Joshua Thanks for your contribution, I'm very interested. – csTroubled Aug 23 '16 at 03:36
  • 1
    @Joshua I'm looking forward to an answer then, voted to reopen, too. – Daniel Jour Aug 23 '16 at 03:38
  • @csTroubled Maybe Joshua can show an approach that's "small enough" for the site. *Parsing* such a function definition alone is a complex thing, at least when done correctly. See https://www.lysator.liu.se/c/ANSI-C-grammar-y.html for an example. – Daniel Jour Aug 23 '16 at 03:40
  • 2
    @Joshua Can you give us an overview in a comment? If it's good I'll vote to reopen too. – Some programmer dude Aug 23 '16 at 03:41
  • 1
    My plan is to invoke the compiler on the string to compile a dynamic library and load the resulting library. It's a lot easier than trying to parse C. – Joshua Aug 23 '16 at 03:42
  • 1
    @Joshua It's a *workaround* IMO, not a solution. It also has some problems that are hard to overcome to make it generic (like which header files need to be included in the generated source file). It's also quite complex for beginners and definitely non-portable. But it might be enough for the needs of the OP so I'll vote to reopen. It *is* better than creating an executable and then using `system` to get the result, which is a really bad workaround. – Some programmer dude Aug 23 '16 at 03:45
  • 2
    @joshua: ok, voted to reopen. I'm curious to see your solution to invoking a compiled function without parsing its prototype. – rici Aug 23 '16 at 03:56
  • 3
    XY problem? Why do you need this? Dynamic languages embeddable in C are a dime a dozen, why do you want to grow your own and make it resemble C? – n. m. could be an AI Aug 23 '16 at 05:40
  • @Joshua Also how you handle cases like `eval("void set_x(){x=1;}")` which accesses a global variable in the main program, or `eval("void callme(int x){func(x);}")` which calls a function in the main program. Or how you handle two `eval` calls where the second one modifies state established by the first call, such as through a shared variable. (Clearly that is in scope, seeing as OP has a hash of variable names.) – Raymond Chen Aug 23 '16 at 13:34
  • @RaymondChen: Good question. I had considered such things in preparing my answer last night but I don't like multi-paragraph comments. – Joshua Aug 23 '16 at 15:19

4 Answers

10

A couple of weeks back, I wanted to do something similar and this is the first question that I stumbled upon, hence answering here, now that I have some hold on this :) I am surprised nobody mentioned tcc (specifically libtcc), which lets you compile code from a string and invoke the function thus defined. For example:

int (*sqr)(int) = NULL;
TCCState *S = tcc_new();

tcc_set_output_type(S, TCC_OUTPUT_MEMORY);
tcc_compile_string(S, "int squarer(int x) { return x*x; }");
tcc_relocate(S, TCC_RELOCATE_AUTO);
sqr = tcc_get_symbol(S, "squarer"); /* must match the name used in the compiled string */

printf("%d", sqr(2));
tcc_delete(S);

(Error handling omitted for brevity). Beyond this basic example, if one wants to use the variables of the host program within the dynamic function, a little more work is needed. If I had a variable int N; and I wanted to use it, I would need 2 things: In the code string:

 ... "extern int N;"

Tell tcc:

tcc_add_symbol(S, "N", &N);

Similarly, there are APIs to inject Macros, open entire libraries etc. HTH.

Ani
9

Trying to parse C is a real pain in the behind; but we already know how to parse C; invoke the C compiler! Here we compile the eval code into a dynamic library and load it.

You may run into problems where your dynamic code can't find other functions or variables in your own code; the easy solution for that is to compile your whole program except for main() as a library and link the dynamic code library against it. You can avoid the -fpic penalty by setting your library load address only a few K above your main load address. On Linux, unresolved symbols in a library can be resolved by the executable if not stripped, and glibc depends on this functionality; however, compiler optimizations get in the way sometimes so the total library method may be necessary.

Sample code below is for Linux. This can be adapted to other Unix systems, including Mac OSX, with minor work. Attempting this on Windows is possible, but harder, as you have no guarantee of a C compiler unless you're willing to ship one; and on Windows there's the obnoxious rule about multiple C runtimes, so you must build with the same one you ship, and therefore must also build with the same compiler you ship. Also, you must use the total library technique here or symbols in your main program just won't resolve in the library (the PE file format can't express what is necessary).

This sample code provides no way for the eval() code to save state; if you need this you should do so either by variables in the main program or (preferred) passing in a state structure by address.
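The state-structure option could look roughly like the sketch below. The struct layout and the names `eval_state`, `feval_state`, and `run_snippet` are hypothetical, and `fctn_standin` is an ordinary host function standing in for a pointer that would really come from `dlsym()`. The source string handed to the compiler would also have to repeat the struct definition so both sides agree on the layout.

```c
#include <stddef.h>

/* Hypothetical state shared between the host and every eval'd snippet. */
struct eval_state {
    int counter;    /* how many snippets have run */
    int last_arg;   /* argument passed to the most recent snippet */
};

/* Signature each snippet would export (for example under the name fctn). */
typedef void (*feval_state)(struct eval_state *st, int arg);

/* Stand-in for dynamically loaded code: in the real setup this pointer
   would be the result of dlsym() on the freshly compiled library. */
void fctn_standin(struct eval_state *st, int arg)
{
    st->counter++;
    st->last_arg = arg;
}

/* Host-side call. The state lives in the caller, so it survives after
   the snippet's library has been unloaded.
   Returns 0 on success, -1 if either pointer is NULL. */
int run_snippet(feval_state fn, struct eval_state *st, int arg)
{
    if (fn == NULL || st == NULL)
        return -1;
    fn(st, arg);
    return 0;
}
```

Two successive snippets called with the same `struct eval_state *` see each other's changes, which is the behavior a pair of dependent `eval` calls needs.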

If you are trying to do this in an embedded environment, don't. This is a bad idea in the embedded world.

In answer to rici's comment: I have never seen a case where the argument types and return type of an eval() block were not statically determined from the surrounding code; besides, how else would you be able to call it? The example code below could be cut up, extracting the shared part so that the per-type part is only a couple of lines; that exercise is left for the reader.

If you don't have a specific reason to want dynamic C, try embedding Lua instead, with a well-defined interface.

/* gcc -o dload dload.c -ldl */

#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>   /* system */
#include <unistd.h>   /* getpid, unlink */

typedef void (*fevalvd)(int arg);

/* We need one of these per function signature */
/* Disclaimer: does not support currying; attempting to return functions -> undefined behavior */
/* The function to be called must be named fctn or this does not work. */
void evalvd(const char *function, int arg)
{
        char buf1[50];
        char buf2[50];
        char buf3[100];
        void *ctr;
        fevalvd fc;
        snprintf(buf1, 50, "/tmp/dl%d.c", getpid());
        snprintf(buf2, 50, "/tmp/libdl%d.so", getpid());
        FILE *f = fopen(buf1, "w");
        if (!f) { fprintf (stderr, "can't open temp file\n"); return; }
        fprintf(f, "%s", function);
        fclose(f);
        snprintf(buf3, 100, "gcc -shared -fpic -o %s %s", buf2, buf1);
        if (system(buf3)) { unlink(buf1); return ; /* oops */ }

        ctr = dlopen(buf2, RTLD_NOW | RTLD_LOCAL);
        if (!ctr) { fprintf(stderr, "can't open\n"); unlink(buf1); unlink(buf2); return ; }
        fc = (fevalvd)dlsym(ctr, "fctn");
        if (fc) {
                fc(arg);
        } else {
                fprintf(stderr, "Can't find fctn in dynamic code\n");
        }
        dlclose(ctr);
        unlink(buf2);
        unlink(buf1);
}

int main(int argc, char **argv)
{
        evalvd("#include <stdio.h>\nvoid fctn(int a) { printf(\"%d\\n\", a); }\n", 10);
}
Joshua
  • As noted in the comments, this does seem like a roundabout way, and requires many more permissions (creating files, running gcc, etc) on the system than a *real* `eval` (dynamically parsing the code, only requiring (potentially a lot of) memory). – YoTengoUnLCD Aug 23 '16 at 23:48
  • I just tested your example, your eval example calls the function `fctn` with the value `10`, but nowhere in the string there's an invocation, this is surely not the intended behaviour. – YoTengoUnLCD Aug 24 '16 at 00:19
  • The 10 is the second argument to the evalvd function; it is indeed the intended behavior. – Joshua Aug 24 '16 at 01:35
  • 1
    Let me rephrase: it is the behavior intended by you, not that of any regular eval function. Why are you invoking a function that's just being defined in that code? – YoTengoUnLCD Aug 24 '16 at 01:40
  • @YoTengoUnLCD: Because in C you cannot have an expression in top-level code; therefore the top-level code must be wrapped in a function by the code-builder implied to exist by the question; and it takes arguments because closures are probably not possible without compiling the surrounding code with a custom compiler. Also note that OP knows this and even his example wraps it in a function. – Joshua Aug 24 '16 at 15:09
  • In my case (macOS), `#include ` and `#include ` are required to compile. It works perfect for me, ty – user9869932 May 22 '21 at 01:11
  • [Meaning of library dl in gcc](https://stackoverflow.com/questions/19146525/meaning-of-library-dl-in-gcc) – Rick Apr 15 '22 at 14:38
5

It's possible, but a pain to do. You need to write a parser that takes the text as input and generates a syntax tree; then you need to simplify constructs (e.g., converting loops into goto statements and simplifying expressions into static single assignment form with only one operation per statement). Then you need to match all of the patterns in your syntax tree with sequences of instructions on the target machine that perform the same tasks. Finally, you need to select the registers to use for each of those instructions, spilling them onto the stack if necessary.
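To give a feel for just the first step, here is a minimal sketch of a recursive-descent parser that builds a syntax tree for integer expressions with `+`, `-`, `*`, and parentheses. It is nowhere near C: it walks the tree to compute a value instead of generating machine code, it does almost no error checking, and all names (`struct node`, `parse_expr`, `eval_expr`, and so on) are made up for illustration.

```c
#include <ctype.h>
#include <stdlib.h>

/* AST node: a number literal (op == 0) or a binary operation. */
struct node {
    char op;                 /* '+', '-', '*', or 0 for a literal */
    int value;               /* used when op == 0 */
    struct node *lhs, *rhs;  /* used when op != 0 */
};

static struct node *parse_expr(const char **p);

static void skip_ws(const char **p)
{
    while (isspace((unsigned char)**p))
        (*p)++;
}

static struct node *new_node(char op, int value,
                             struct node *lhs, struct node *rhs)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->op = op; n->value = value; n->lhs = lhs; n->rhs = rhs;
    return n;
}

/* factor := NUMBER | '(' expr ')' */
static struct node *parse_factor(const char **p)
{
    skip_ws(p);
    if (**p == '(') {
        (*p)++;
        struct node *inner = parse_expr(p);
        skip_ws(p);
        if (**p == ')')
            (*p)++;
        return inner;
    }
    char *end;
    long v = strtol(*p, &end, 10);  /* malformed input simply yields 0 */
    *p = end;
    return new_node(0, (int)v, NULL, NULL);
}

/* term := factor ('*' factor)* */
static struct node *parse_term(const char **p)
{
    struct node *lhs = parse_factor(p);
    for (;;) {
        skip_ws(p);
        if (**p != '*')
            return lhs;
        (*p)++;
        lhs = new_node('*', 0, lhs, parse_factor(p));
    }
}

/* expr := term (('+' | '-') term)* */
static struct node *parse_expr(const char **p)
{
    struct node *lhs = parse_term(p);
    for (;;) {
        skip_ws(p);
        char op = **p;
        if (op != '+' && op != '-')
            return lhs;
        (*p)++;
        lhs = new_node(op, 0, lhs, parse_term(p));
    }
}

/* Walk the tree and compute its value. */
static int eval_node(const struct node *n)
{
    if (n->op == 0)
        return n->value;
    int a = eval_node(n->lhs), b = eval_node(n->rhs);
    switch (n->op) {
    case '+': return a + b;
    case '-': return a - b;
    default:  return a * b;
    }
}

static void free_node(struct node *n)
{
    if (n == NULL)
        return;
    free_node(n->lhs);
    free_node(n->rhs);
    free(n);
}

/* Parse `text`, evaluate the resulting tree, free it, return the value. */
int eval_expr(const char *text)
{
    struct node *root = parse_expr(&text);
    int result = eval_node(root);
    free_node(root);
    return result;
}
```

Even this toy needs one function per grammar rule; declarations, types, statements, and scoping are what make a real C front end so much larger.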

In short, writing an implementation for eval in C is possible, but it is a huge amount of work that requires a lot of expertise and knowledge in several fields of computer science. The complexity of writing a compiler is the precise reason why most programming languages are either interpreted or use a virtual machine with a custom bytecode. Tools like Clang and LLVM make this a lot easier, but those are written in C++, not C.

DeftlyHacked
  • There's still the issue of calling the function. You might need to look at the Foreign Function Interface library ([`libffi`](https://sourceware.org/libffi/) or on Github [`libffi`](https://github.com/libffi/libffi)). Or there might be another way to do that work. – Jonathan Leffler Aug 23 '16 at 06:47
  • @JonathanLeffler I considered mentioning libffi, but I figured he could probably read between the lines that I was saying that what he's trying to do is just completely impractical. He would be far better off using Lua or Python like everyone else. – DeftlyHacked Aug 23 '16 at 08:15
0

Bearing in mind some restrictions, OpenCL is one possible way to implement eval in C/C++. If your OpenCL implementation can compile kernels and execute them, no matter whether on a CPU, a GPU, or some other 'accelerator' device, you can generate kernel code strings at runtime in your C/C++ application, compile them, and enqueue them for execution. The OpenCL APIs also let you check for kernel compilation, linking, and execution errors. So, please take a look at OpenCL.

Mykyta Kozlov