I wanted to practice C, so I decided to write a C interpreter in the spirit of the Python interpreter. I have some C knowledge, but I've always been a learn-by-doing type of programmer.

What I have so far is very simple: it parses the user's input one line at a time and distinguishes declarations such as:

int x = 10;
char c = 'a';

for which I create a struct representing the variable's type, its name, and an ivalue (for int values) or cvalue (for char values). There's a lot more to do there, but one step at a time.
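A minimal sketch of such a struct, assuming only the two declaration forms shown above. The names `var_type`, `variable`, and `parse_simple_declaration` are hypothetical (the asker's real function is called parse_declaration, per the comments), and `sscanf` is used only because it is short, not because it scales to real C syntax.

```c
#include <stdio.h>

/* Hypothetical names: var_type, variable, parse_simple_declaration. */
typedef enum { VAR_INT, VAR_CHAR } var_type;

/* One interpreter variable: its type tag, its name, and one value field
   per supported type, mirroring the ivalue/cvalue idea described above. */
typedef struct {
    var_type type;
    char name[32];
    int ivalue;   /* meaningful when type == VAR_INT  */
    char cvalue;  /* meaningful when type == VAR_CHAR */
} variable;

/* Recognize exactly "int NAME = NUMBER;" or "char NAME = 'C';".
   Returns 1 and fills *out on success, 0 otherwise.
   %31[A-Za-z0-9_] reads an identifier of at most 31 characters;
   %n records how far parsing got, so end > 0 proves the ';' matched. */
int parse_simple_declaration(const char *line, variable *out)
{
    int n, end = 0;
    char c;

    if (sscanf(line, " int %31[A-Za-z0-9_] = %d ;%n", out->name, &n, &end) == 2
        && end > 0) {
        out->type = VAR_INT;
        out->ivalue = n;
        return 1;
    }
    end = 0;
    if (sscanf(line, " char %31[A-Za-z0-9_] = '%c' ;%n", out->name, &c, &end) == 2
        && end > 0) {
        out->type = VAR_CHAR;
        out->cvalue = c;
        return 1;
    }
    return 0;
}
```

The `%n` check matters because `sscanf` only counts conversions, so a missing trailing `;` would otherwise go unnoticed. A tagged union would save space once more types are added, but separate fields keep the sketch close to the description.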

I can also parse function calls, as such:

printf("value of x = %d\n", x);

where I extract the name of the function and store the arguments in a char** args.
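One way that extraction might look, as a sketch only: it assumes the call fits on one line and has a single outer pair of parentheses. The helpers `split_call`, `free_call`, and `dup_trimmed` are hypothetical. The main point is that commas inside string or character literals (and nested calls) must not split arguments, which a naive split on `,` gets wrong.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Copy [start, end) into a new heap string, trimming surrounding spaces. */
static char *dup_trimmed(const char *start, const char *end)
{
    while (start < end && isspace((unsigned char)*start)) start++;
    while (end > start && isspace((unsigned char)end[-1])) end--;
    size_t len = (size_t)(end - start);
    char *s = malloc(len + 1);
    if (s) {
        memcpy(s, start, len);
        s[len] = '\0';
    }
    return s;
}

/* Free what split_call() allocated; args must be NULL-terminated. */
void free_call(char *name, char **args)
{
    if (args)
        for (char **a = args; *a; a++) free(*a);
    free(args);
    free(name);
}

/* Split e.g.  printf("x = %d\n", x);  into a function name and a
   NULL-terminated char ** of argument texts. Returns the argument count,
   or -1 on malformed input. Allocation failures are not handled. */
int split_call(const char *line, char **name, char ***args)
{
    const char *open = strchr(line, '(');
    const char *close = strrchr(line, ')');
    *name = NULL;
    *args = NULL;
    if (!open || !close || close < open) return -1;

    *name = dup_trimmed(line, open);
    /* At most (close - open) arguments fit between the parentheses,
       plus one slot for the NULL terminator; calloc zero-fills. */
    *args = calloc((size_t)(close - open) + 1, sizeof **args);

    int count = 0, depth = 0;
    char quote = 0;                      /* active literal delimiter, or 0 */
    const char *arg_start = open + 1;
    for (const char *p = open + 1; p < close; p++) {
        if (quote) {
            if (*p == '\\' && p + 1 < close) p++;  /* skip escaped char */
            else if (*p == quote) quote = 0;
        } else if (*p == '"' || *p == '\'') {
            quote = *p;
        } else if (*p == '(') {
            depth++;
        } else if (*p == ')') {
            depth--;
        } else if (*p == ',' && depth == 0) {
            (*args)[count++] = dup_trimmed(arg_start, p);
            arg_start = p + 1;
        }
    }
    if (quote) {                         /* unterminated literal */
        free_call(*name, *args);
        *name = NULL;
        *args = NULL;
        return -1;
    }
    char *last = dup_trimmed(arg_start, close);
    if (*last || count > 0) (*args)[count++] = last;  /* f() has no args */
    else free(last);
    return count;
}
```

Note that each argument is still raw source text (for example `"value of x = %d\n"` keeps its quotes and escape), so it has to be evaluated before it can be passed to a real function.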

It sounds silly, but I would like to avoid writing a mapper for each standard C library function just to execute a call to something like printf, strstr, or strcpy. Is there any way to dynamically call a standard C function without this approach?

Also, suggestions on the design of this thing are very welcome.

user2357112
Sam Hammamy
  • Other than using `ctypes`? – Ignacio Vazquez-Abrams Aug 04 '13 at 22:12
  • @IgnacioVazquez-Abrams I'm writing the interpreter in C not python. – Sam Hammamy Aug 04 '13 at 22:16
  • Quibble: `int x = 10;` and `char c = 'a';` are declarations with initializations, not assignment statements. If you want to implement C, you'll need to understand its syntax. – Keith Thompson Aug 04 '13 at 22:16
  • @KeithThompson Just an error in writing. The function in my code is actually called parse_declaration – Sam Hammamy Aug 04 '13 at 22:19
  • If this program is written in C, and interpreting C, it shouldn't be tagged with Python. – user2357112 Aug 04 '13 at 22:20
  • @andrewcooke Fair point. Python tag dropped – Sam Hammamy Aug 04 '13 at 22:20
  • http://dyncall.org/ ? – andrew cooke Aug 04 '13 at 22:21
  • What are you using to parse the input? (I hope you know about things like the typedef name-identifier problem and the [lexer hack](https://en.wikipedia.org/wiki/The_lexer_hack). At least it's not something like C++ or Perl.) – user2357112 Aug 04 '13 at 22:26
  • I am using POSIX extended regex, which I'm hoping to grow as I continue working on the program – Sam Hammamy Aug 04 '13 at 22:27
  • That's not going to cut it for general C. Interpreters are big programs, and parsing is not a trivial problem. You may want to look into [lexers](http://en.wikipedia.org/wiki/Lexical_analysis), [parsers](http://en.wikipedia.org/wiki/Parser_generator), and [context-free grammars](http://en.wikipedia.org/wiki/Context-free_grammar), and this is all before you get to any part of the interpreter involved in actually running the program. – user2357112 Aug 04 '13 at 22:35
  • This is only intended as practice for learning C, and I'm hoping to grow it as time goes on. For now, I am fine with limited features, like simple statements and function calls. – Sam Hammamy Aug 04 '13 at 22:38
  • regex? For parsing?!? Do not ever do it again!!! – SK-logic Aug 06 '13 at 09:58

2 Answers

You'll have what I guess is a really hard time writing a C interpreter; you'll probably have to write a compiler.
Of course, you can "dissect" the language on the fly, parsing the code as you progress.
The real issue (as I see it) would be handling external references.

In Python you handle external references using the import keyword.
As you know, some libs may have conflicting methods (e.g. lxml and libxml2).
This conflict is resolved by importing the correct library.
You can, of course, think of some mechanism that effectively "links" or imports all the needed external references.
This will probably have certain very-specific assumptions.
In this way, when you encounter #include <stdlib.h> you actually import it.
For that matter, importing it would probably mean loading the DLL that implements the standard library, using something like LoadLibrary() or LoadLibraryEx() on Windows.
After loading all the #includes you encounter, if you don't find a definition for a reference, you'll probably have to search the local directory for additional C files until you find the sought reference, at which point I'm not sure what should be done.
That's regarding the linkage problem (which I honestly don't see how you'll overcome without proper compilation).

The other part is very hard as well. You need to write a lexer.
That's the little devil that chops up all those lines of C code.
I assume you've fiddled with writing a Scheme/Lisp interpreter, or perhaps an even more complex parser.
BEWARE! C is not Scheme!
It is a highly complex language to parse. Its specification spans hundreds of pages.
Writing a C lexer is not a mere exercise in writing interpreters.
C has some nasty context-sensitive parsing, which basically means it's not a clean CFL (context-free language): a plain context-free parser can't tell whether an identifier names a type or a variable without knowing which declarations are currently in scope.

I'll end with an example taken from the wonderful blog of Eli Bendersky.

typedef int AA;
void foo()
{
    AA aa;       /* OK - define variable aa of type AA */
    float AA;    /* OK - define variable AA of type float */
}

This just goes to show how tricky a context-sensitive grammar can be.
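The usual workaround is the "lexer hack" linked in the question's comments: the lexer asks a scoped symbol table whether an identifier is currently a typedef name. Below is a minimal sketch of that table, assuming fixed-size arrays for brevity; `declare`, `enter_scope`, `leave_scope`, and `classify_identifier` are hypothetical names. The parser must also accept a typedef name as the declarator in `float AA;`, which is not shown here.

```c
#include <string.h>

enum token_kind { TOK_IDENTIFIER, TOK_TYPE_NAME };

#define MAX_SYMBOLS 256

struct symbol {
    char name[32];
    int is_typedef;   /* 1 for typedef names, 0 for ordinary identifiers */
    int scope;        /* block nesting depth where it was declared */
};

static struct symbol symbols[MAX_SYMBOLS];
static int n_symbols = 0;
static int current_scope = 0;

/* Called by the parser for every declaration it sees. */
void declare(const char *name, int is_typedef)
{
    if (n_symbols == MAX_SYMBOLS) return;   /* sketch: table full, drop it */
    struct symbol *s = &symbols[n_symbols++];
    strncpy(s->name, name, sizeof s->name - 1);
    s->name[sizeof s->name - 1] = '\0';
    s->is_typedef = is_typedef;
    s->scope = current_scope;
}

void enter_scope(void) { current_scope++; }

/* Leaving a block forgets everything declared inside it. */
void leave_scope(void)
{
    while (n_symbols > 0 && symbols[n_symbols - 1].scope == current_scope)
        n_symbols--;
    current_scope--;
}

/* Called by the lexer: the most recent (innermost) declaration wins. */
enum token_kind classify_identifier(const char *name)
{
    for (int i = n_symbols - 1; i >= 0; i--)
        if (strcmp(symbols[i].name, name) == 0)
            return symbols[i].is_typedef ? TOK_TYPE_NAME : TOK_IDENTIFIER;
    return TOK_IDENTIFIER;
}
```

Walking through the example: `typedef int AA;` declares AA as a typedef, `float AA;` inside foo shadows it with an ordinary variable, and leaving foo's block makes AA a type name again.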

Trevor
  • Thanks for the explanation. I started out by using [tcc](http://bellard.org/tcc/): reading some lines of input, compiling them inside a main function, and calling that. However, I thought there would be a better approach. Is there a way I can compile one line at a time, such that its execution succeeds, while still resolving dependencies? For example, int x was defined three or four lines earlier, and the next line is int z = x + 10. – Sam Hammamy Aug 05 '13 at 00:35
  • What you're talking about, would usually mean interpreting rather than compilation (I think). It seems like going via an intermediate byte-code and trying to interpret that, is the logical thing to do. – Trevor Aug 05 '13 at 11:37
  • Wrong. It's perfectly possible to write an interpreter (although the difference between compiler and interpreter is pretty vague). Take a look at this, for example: https://code.google.com/p/c-semantics/ – SK-logic Aug 06 '13 at 09:55
  • I didn't say you _can't_ write an interpreter; all I said is that this is a most formidable task, while pointing out a couple of the obvious difficulties. – Trevor Aug 06 '13 at 23:26
  • @Trevor, there is only one way your sentence "You can't write interpreter ... You must write a compiler" can be parsed. Please correct it. – SK-logic Aug 07 '13 at 07:59
  • True. Honestly, I know that every professional "decoder" for C is a compiler, and a hand-written one at that, not even a YACC-generated one. C grammar is hard! It's miles and miles away from a tiny "bootleg" home project for someone wishing to experiment with interpreters or even compilers, which is the setting I assumed. Sorry for not being clear enough. By the way, I still think this is what the original author meant. Maybe LLVM can be considered an interpreter, but I don't think that's the context of the question. – Trevor Aug 07 '13 at 22:10
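Trevor's byte-code suggestion also answers the question about resolving earlier declarations: if variables live in a table that outlives each line, a later line such as int z = x + 10 can simply load x. The following is a minimal sketch of the execution side only, assuming a hypothetical stack-machine instruction set (`OP_PUSH`, `OP_LOAD`, `OP_ADD`, `OP_STORE`, `OP_HALT`) invented for illustration. The step that compiles a source line into these instructions is omitted, and there are no bounds checks.

```c
#include <string.h>

/* Hypothetical instruction set for a tiny stack machine. */
enum opcode { OP_PUSH, OP_LOAD, OP_ADD, OP_STORE, OP_HALT };

struct instr { enum opcode op; int arg; };   /* arg: constant or var slot */

/* Global variables persist across lines; each line is run separately. */
#define MAX_VARS 64
static char var_names[MAX_VARS][32];
static int var_values[MAX_VARS];
static int n_vars = 0;

/* Return the slot for name, creating it if create is nonzero;
   -1 if the name is unknown (or the table is full). */
int var_slot(const char *name, int create)
{
    for (int i = 0; i < n_vars; i++)
        if (strcmp(var_names[i], name) == 0) return i;
    if (!create || n_vars == MAX_VARS) return -1;
    strncpy(var_names[n_vars], name, 31);
    var_names[n_vars][31] = '\0';
    var_values[n_vars] = 0;
    return n_vars++;
}

/* Execute one compiled line on a small operand stack. */
void run(const struct instr *code)
{
    int stack[16], sp = 0;
    for (const struct instr *ip = code; ip->op != OP_HALT; ip++) {
        switch (ip->op) {
        case OP_PUSH:  stack[sp++] = ip->arg; break;
        case OP_LOAD:  stack[sp++] = var_values[ip->arg]; break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
        case OP_STORE: var_values[ip->arg] = stack[--sp]; break;
        case OP_HALT:  break;
        }
    }
}
```

Compiled by hand, int x = 10; becomes PUSH 10, STORE x, HALT, and int z = x + 10; becomes LOAD x, PUSH 10, ADD, STORE z, HALT. Because x's slot survives between the two calls to `run`, the second line sees the value stored by the first.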

Aside from the hassle of parsing C's grammar, it's actually pretty well-suited to interpretation because it was designed to be processed in a single pass to account for the memory constraints of 1970s equipment.

In fact, part 6 (pp. 713-787) of C: The Complete Reference, Fourth Edition by Herbert Schildt is dedicated to tying together everything the rest of the book taught by walking you through writing such an interpreter. (It's also in the third edition, which is available in The Internet Archive's lending library.)

I haven't checked whether the version in the book has any revisions to it, but his "Little C" was first presented to the world in Dr. Dobb's Journal, August, 1989.

Schildt also guides readers through writing a simple C interpreter in chapter 1 of "The Craft of C: Take-Charge Programming", which is also available via The Internet Archive's lending library.

Also, this question has a bunch of answers listing existing C interpreters.

As for dynamically calling C functions, it's easier on POSIX platforms than Windows because you may need .lib files to resolve the symbols in your .dlls, while .so files have no equivalent.

With .so files, you just dlopen the path to the library and use dlsym to retrieve a function pointer by name.
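A minimal POSIX sketch of that lookup, assuming Linux or a similar system where `dlopen(NULL, ...)` gives access to the C library the interpreter itself is linked against (with glibc older than 2.34 you also need to link with `-ldl`). The helper names `lookup_function`, `call_strlen`, and `call_printf_int` are hypothetical. Note that `dlsym` only returns an address: the interpreter still has to know each function's signature to call it, so calling arbitrary functions with argument lists built at run time is where a library such as dyncall (linked in the question's comments) or libffi comes in.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Find a function by name in the running process: the interpreter itself
   plus the shared libraries it was started with, which include libc.
   Returns NULL (after printing dlerror()'s message) if it is not found. */
void *lookup_function(const char *name)
{
    static void *self = NULL;
    if (!self) {
        self = dlopen(NULL, RTLD_LAZY);
        if (!self) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
            return NULL;
        }
    }
    dlerror();                          /* clear any earlier error */
    void *sym = dlsym(self, name);
    const char *err = dlerror();
    if (err) {
        fprintf(stderr, "dlsym: %s\n", err);
        return NULL;
    }
    return sym;
}

/* The caller must still supply the signature. POSIX requires this
   void * to function pointer cast to work, though ISO C leaves it undefined. */
typedef size_t (*strlen_fn)(const char *);
typedef int (*printf_fn)(const char *, ...);

size_t call_strlen(const char *s)
{
    strlen_fn f = (strlen_fn)lookup_function("strlen");
    return f ? f(s) : (size_t)-1;
}

int call_printf_int(const char *fmt, int value)
{
    printf_fn f = (printf_fn)lookup_function("printf");
    return f ? f(fmt, value) : -1;
}
```

The function is found by name alone, so no per-function mapper is needed for the lookup itself; only the calling convention (here, one typedef per signature) still has to be known.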

An example of doing so can be found in the Linux manpage for dlopen by running man dlopen.

ssokolow