22

I'd love some references, or tips, possibly an e-book or two. I'm not looking to write a compiler, just looking for a tutorial I could follow along and modify as I go. Thank you for being understanding!

BTW: It must be C.

Any more replies would be appreciated.

tekknolagi
  • 10,663
  • 24
  • 75
  • 119
  • 2
    Diving into already existent projects can be overwhelming. Great question. – Lime Jul 31 '11 at 04:03
  • 1
    Thank you! I've tried doing that, and I find it's completely easier to write the code ground up, so you know what's what. – tekknolagi Jul 31 '11 at 04:06
  • 1
    Regarding your edit: a) Please don't put two unrelated questions into the same question, and b) use the GNU readline (or BSD editline) library. – Chris Lutz Jul 31 '11 at 04:09
  • 1
    Be sure to check the other interpreter/compiler book/tutorial questions! With such a multifaceted topic, often you need to explore the material until you discover an "angle" on it. Then you can start incorporating information from any source. – luser droog Aug 01 '11 at 19:01
  • 1
    I recommend a small BASIC interpreter in C with few lines of code, yet easy to extend: [my basic](https://github.com/paladin-t/my_basic). – pipipi Feb 19 '16 at 05:14
  • **this is off-topic now because recommendations are not accepted and it is too broad otherwise and should not be used as an example of what you can or should ask about now.** –  Jun 23 '17 at 19:11
  • @JarrodRoberson I agree -- but I think this makes for a great wiki type deal. – tekknolagi Jun 26 '17 at 17:29

3 Answers3

26

A great way to get started writing an interpreter is to write a simple machine simulator. Here's a simple language you can write an interpreter for:

The language has a stack and 6 instructions:

push <num> # push a number on to the stack

pop # pop off the first number on the stack

add # pop off the top 2 items on the stack and push their sum on to the stack. (remember you can add negative numbers, so you have subtraction covered too). You can also get multiplication my creating a loop using some of the other instructions with this one.

ifeq <address> # examine the top of the stack, if it's 0, continue, else, jump to <address> where <address> is a line number

jump <address> # jump to a line number

print # print the value at the top of the stack

dup # push a copy of what's at the top of the stack back onto the stack.

Once you've written a program that can take these instructions and execute them, you've essentially created a very simple stack based virtual machine. Since this is a very low level language, you won't need to understand what an AST is, how to parse a grammar into an AST, and translate it to machine code, etc. That's too complicated for a tutorial project. Start with this, and once you've created this little VM, you can start thinking about how you can translate some common constructs into this machine. e.g. you might want to think about how you might translate a C if/else statement or while loop into this language.

Edit:

From the comments below, it sounds like you need a bit more experience with C before you can tackle this task.

What I would suggest is to first learn about the following topics:

  • scanf, printf, putchar, getchar - basic C IO functions
  • struct - the basic data structure in C
  • malloc - how to allocate memory, and the difference between stack memory and heap memory
  • linked lists - and how to implement a stack, then perhaps a binary tree (you'll need to understand structs and malloc first)

Then it'll be good to learn a bit more about the string.h library as well - strcmp, strdup - a couple useful string functions that will be useful.

In short, C has a much higher learning curve compared to python, just because it's a lower level language and you have to manage your own memory, so it's good to learn a few basic things about C first before trying to write an interpreter, even if you already know how to write one in python.

Charles Ma
  • 47,141
  • 22
  • 87
  • 101
  • What's `ifeq` ? I don't understand... – tekknolagi Jul 31 '11 at 07:10
  • Also are there compound statements? – tekknolagi Jul 31 '11 at 07:11
  • Lastly, I don't understand how I can interpret word instructions. In python, I can store words in an array, and it doesn't matter if they're of different lengths. However, in C, it doesn't allow you to do so. How do I interpret this in C? – tekknolagi Jul 31 '11 at 07:12
  • It's the kind of thing I'm looking for a tutorial for. – tekknolagi Jul 31 '11 at 07:12
  • ifeq stands for if equals, It basically tells you to look at the top of the stack and jump to a location if the top of the stack is not zero. It's a common assembly language symbol. This is an assembly language, and 0 usually means something is equal. The language only deals with the stack, so there are no compound statements per se, you would have to translate a "compound statement" into this language to execute it. – Charles Ma Jul 31 '11 at 07:16
  • In C, use scanf, since this language is very simple, each instruction is on it's own line with arguments separated by spaces, scanf("%s", &myBuffer); where myBuffer is a pre defined character will do e.g. char myBuffer[100]; as long as you don't exceed 99 characters in your line (which in this language you wont), you'll be fine – Charles Ma Jul 31 '11 at 07:18
  • It sounds like you learned to program in python and are looking to expand your knowledge to C, so I should also ask: are you familiar with linked lists and/or arrays in C? and the concept of a stack? Implementing an interpreter for this language presumes that you know how to write a simple stack in C. – Charles Ma Jul 31 '11 at 07:23
  • For the word processing, what I meant is that I'm not sure how to correctly parse the input. I wrote a postfix calculator, but that only used single character operators. `ifeq` just tests if the top of the stack's *value* == 0, and if not jumps to the specified location? Then does what with the address jumping? What do you mean by line number? – tekknolagi Jul 31 '11 at 07:26
  • You're exactly right about the Python, and I'm not familiar with linked lists in C, but I am familiar with arrays, and stacks. – tekknolagi Jul 31 '11 at 07:27
  • See my latest edit about learning a bit more about C first – Charles Ma Jul 31 '11 at 07:40
  • I know about scanf, printf, everything there. I'd `scanf` the input into a command buffer, but I'm not sure where I'd go from there. – tekknolagi Jul 31 '11 at 17:19
  • Also - what do you mean by `jump`? What does it do? – tekknolagi Jul 31 '11 at 18:39
  • Just implemented it in Python, now translating to C. – tekknolagi Jul 31 '11 at 20:10
  • Alright, I've written this in C. What now? – tekknolagi Aug 02 '11 at 05:36
  • This question was unbelievably helpful! github.com/tekknolagi/StackBased is my implementation! – tekknolagi Aug 15 '11 at 21:35
  • latest version: github.com/tekknolagi/gecho – tekknolagi Sep 22 '11 at 05:31
  • ... and suddenly I got a fantastic task for testing out a new programming language :) – Sebastian Graf Feb 04 '13 at 19:41
  • @tekknolagi ifeq is IF Equal to... – programmer May 29 '15 at 14:36
  • What is the address in `ifeq` and `jump` means? Is it mean the stack index? – 呂殿下-Luidenka Oct 27 '22 at 06:03
15

The only difference between an interpreter and a compiler is that instead of generating code from the AST, you execute it in a VM instead. Once you understand this, almost any compiler book, even the Red Dragon Book (first edition, not second!), is enough.

Ignacio Vazquez-Abrams
  • 776,304
  • 153
  • 1,341
  • 1,358
  • 1: What's an AST? 2: What's the definition of a VM? I have a postfix calculator that interprets the expressions. Is that a VM? 3: Thank you very much :) – tekknolagi Jul 31 '11 at 04:18
  • 1
    1. [Abstract Syntax Tree](http://en.wikipedia.org/wiki/Abstract_syntax_tree) 2. An [automaton](http://en.wikipedia.org/wiki/Automata-based_programming) that interprets code operation by operation. The postfix calculator could be seen as a simple form of a VM. – Ignacio Vazquez-Abrams Jul 31 '11 at 04:24
  • While it's the way you should be thinking if designing a real language, I think this answer may be jumping a bit too far for a beginner. It's possible to have an interpreted language (e.g. a made-up assembly language) that's completely imperative and does not need significant parsing. Actually I would consider a VM or emulator for a simple real-world cpu a great first exercise for someone interested in writing interpreters. – R.. GitHub STOP HELPING ICE Jul 31 '11 at 05:43
  • What I'm really looking for is a really simple, tiny language that I can either implement or come up with by myself. – tekknolagi Jul 31 '11 at 06:31
  • 1
    [Forth](http://en.wikipedia.org/wiki/Forth_%28programming_language%29). – Ignacio Vazquez-Abrams Jul 31 '11 at 06:32
  • 2
    Shameless plug: this is a [lexer, parser and interpreter](https://github.com/bbu/simple-interpreter) for a minimalistic imperative language, all in 1.4k LoC. It could be of good educational value. – Blagovest Buyukliev Aug 19 '15 at 09:01
  • The 'only difference' between most compilers and most interpreters is that most compilers generate machine code and most interpreters interpret intermediate code. Your answer is based on the fallacy that interpreters interpret source code. Most of them don't. – user207421 May 28 '17 at 10:59
  • @EJP: I'm curious to know how you came to that interpretation of my answer. – Ignacio Vazquez-Abrams May 28 '17 at 12:35
  • I have to ask: What's wrong with the second edition? (I genuinely don't know) – Chuck Le Butt May 13 '20 at 12:16
13

I see this is a bit of a late reply, however since this thread showed up at second place in the result list when I did a search for writing an interpreter and no one have mentioned anything very concrete I will provide the following example:

Disclaimer: This is just some simple code I wrote in a hurry in order to have a foundation for the explanation below and are therefore not perfect, but it compiles and runs, and seems to give the expected answers.

Read the following C-code from bottom to top:

#include <stdio.h>
#include <stdlib.h>

double expression(void);

double vars[26]; // variables

char get(void) { char c = getchar(); return c; } // get one byte
char peek(void) { char c = getchar(); ungetc(c, stdin); return c; } // peek at next byte
double number(void) { double d; scanf("%lf", &d); return d; } // read one double

void expect(char c) { // expect char c from stream
    char d = get();
    if (c != d) {
        fprintf(stderr, "Error: Expected %c but got %c.\n", c, d);
    }
}

double factor(void) { // read a factor
    double f;
    char c = peek();
    if (c == '(') { // an expression inside parantesis?
        expect('(');
        f = expression();
        expect(')');
    } else if (c >= 'A' && c <= 'Z') { // a variable ?
        expect(c);
        f = vars[c - 'A'];
    } else { // or, a number?
        f = number();
    }
    return f;
}

double term(void) { // read a term
    double t = factor();
    while (peek() == '*' || peek() == '/') { // * or / more factors
        char c = get();
        if (c == '*') {
            t = t * factor();
        } else {
            t = t / factor();
        }
    }
    return t;
}

double expression(void) { // read an expression
    double e = term();
    while (peek() == '+' || peek() == '-') { // + or - more terms
        char c = get();
        if (c == '+') {
            e = e + term();
        } else {
            e = e - term();
        }
    }
    return e;
}

double statement(void) { // read a statement
    double ret;
    char c = peek();
    if (c >= 'A' && c <= 'Z') { // variable ?
        expect(c);
        if (peek() == '=') { // assignment ?
            expect('=');
            double val = expression();
            vars[c - 'A'] = val;
            ret = val;
        } else {
            ungetc(c, stdin);
            ret = expression();
        }
    } else {
        ret = expression();
    }
    expect('\n');
    return ret;
}

int main(void) {
    printf("> "); fflush(stdout);

    for (;;) {
        double v = statement();
        printf(" = %lf\n> ", v); fflush(stdout);
    }
    return EXIT_SUCCESS;
}

This is an simple recursive descend parser for basic mathematical expressions supporting one letter variables. Running it and typing some statements yields the following results:

> (1+2)*3
 = 9.000000
> A=1
 = 1.000000
> B=2
 = 2.000000
> C=3
 = 3.000000
> (A+B)*C
 = 9.000000

You can alter the get(), peek() and number() to read from a file or list of code lines. Also you should make a function to read identifiers (basically words). Then you expand the statement() function to be able to alter which line it runs next in order to do branching. Last you add the branch operations you want to the statement function, like

if "condition" then 
    "statements" 
else 
    "statements" 
endif. 

while "condition" do
    "statements"
endwhile

function fac(x)
   if x = 0 then
      return 1
   else
      return x*fac(x-1) 
   endif
endfunction

Obviously you can decide the syntax to be as you like. You need to think about ways of define functions and how to handle arguments/parameter variables, local variables and global variables. If preferable arrays and data structures. References∕pointers. Input/output? In order to handle recursive function calls you probably need to use a stack.

In my opinion this would be easier to do all this with C++ and STL. Where for example one std::map could be used to hold local variables, and another map could be used for globals...

It is of course possible to write a compiler that build an abstract syntax tree out of the code. Then travels this tree in order to produce either machine code or some kind of byte code which executed on a virtual machine (like Java and .Net). This gives better performance than naively parse line by line and executing them, but in my opinion that is NOT writing an interpreter. That is writing both a compiler and its targeted virtual machine.

If someone wants to learn to write an interpreter, they should try making the most basic simple and practical working interpreter.

  • Hi @Visitor Iterator. I hope you will read me. This example is the shortest one I found, clear and working. Pls is there any online reference or book you suggest to read? – Massimo Jun 03 '20 at 15:00
  • 1
    Hi @Massimo Lucky for you I did :) Yes. I'm aware that this link https://compilers.iecc.com/crenshaw/ and the beginning of the book https://www.amazon.com/Crafting-Compiler-Charles-N-Fischer/dp/0805321667 takes a similar approach, but for writing a compiler. (I've only read the first few chapters). For an interpreter you just have to evaluate the different tokens you encounter immediately. That actually makes a lot of things easier in my opinion. (Don't need to store and traverse an abstract syntax tree and emit machine code). Hope that can be useful. – Visitor Iterator Jun 05 '20 at 08:57