I'm working on a language interpreter and nearly have the first half complete. I have created my tokens and linked them together into the stream that I am after.

Specifically, a token is:

enum token_type {
    BRACKET_CURLY_LEFT, BRACKET_CURLY_RIGHT,
    BRACKET_ROUND_LEFT, BRACKET_ROUND_RIGHT,
    ARROW_LEFT, ARROW_RIGHT,

    BACKSLASH, FORWARDSLASH,
    INTEGER, DECIMAL, INVALID_NUM,
    ALLOC, CALL,
    FUNCTION, VARIABLE, WORD,
    UNDERSCORE,
    NULL_SYMB
};

class token {
public:
    token(token_type type, const char* value);
    virtual ~token();

    token_type type;
    char* value;
};

I previously did some interpreter work in Python, where I followed the lis.py and lispy.py tutorials on making a Lisp interpreter. On the lis.py page, one of the first things Norvig writes is:

def parse(program):
    "Read a Scheme expression from a string."
    return read_from_tokens(tokenize(program))

def read_from_tokens(tokens):
    "Read an expression from a sequence of tokens."
    if len(tokens) == 0:
        raise SyntaxError('unexpected EOF while reading')
    token = tokens.pop(0)
    if '(' == token:
        L = []
        while tokens[0] != ')':
            L.append(read_from_tokens(tokens))
        tokens.pop(0) # pop off ')'
        return L
    elif ')' == token:
        raise SyntaxError('unexpected )')
    else:
        return atom(token)

def atom(token):
    "Numbers become numbers; every other token is a symbol."
    try: return int(token)
    except ValueError:
        try: return float(token)
        except ValueError:
            return Symbol(token)

If we look specifically at the read_from_tokens() function, it crawls through the list of tokens and returns both tokens and arrays, which can in turn contain more tokens and more arrays, and so on. Each list essentially models one ( ... ) block.

In my C++ program, I'm trying to emulate this by sorting my tokens into arrays that model the { ... } block. The Python list, L = [], can store practically any data type. A C++ std::vector cannot, since it holds a single element type. It can, however, store void*.
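For comparison, a typed alternative to void* is a recursive node type, where one element is either a single token or a nested group. The sketch below is only an illustration under assumptions: the names node, make_leaf, make_group, child, and render are hypothetical and not part of my code, tokens are reduced to std::string for brevity, and a std::vector of the enclosing (still incomplete) type as a member relies on C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type: either one token (a leaf) or a group that
// stands for one { ... } block and owns its children by value.
struct node {
    bool is_group = false;
    std::string text;            // meaningful when is_group is false
    std::vector<node> children;  // meaningful when is_group is true
};

// Build a leaf holding one token's text.
node make_leaf(const std::string& text) {
    node n;
    n.text = text;
    return n;
}

// Build a group from already-built child nodes.
node make_group(std::vector<node> children) {
    node n;
    n.is_group = true;
    n.children = std::move(children);
    return n;
}

// Checked access to the i-th child of a group.
const node& child(const node& n, std::size_t i) {
    assert(n.is_group && i < n.children.size());
    return n.children[i];
}

// Render a node in the bracketed style Python uses when printing lists.
std::string render(const node& n) {
    if (!n.is_group) return n.text;
    std::string out = "[";
    for (std::size_t i = 0; i < n.children.size(); ++i) {
        if (i > 0) out += ", ";
        out += render(n.children[i]);
    }
    return out + "]";
}
```

With this type, a group built from the leaf a, a nested group of b and c, and the leaf d renders as [a, [b, c], d]. Everything is owned by value, so no casts or raw pointers are involved.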

So I tried a translation in which the function's return type is a void pointer, so that it can return both tokens and std::vector<token>s. However, I ended up getting an allocation error, probably due to the casting I do.

This is perhaps the dirtiest piece of code I will ever write...

void* group_tokens(std::vector<token> tokens) {
    token tok = tokens.at(0);
    tokens.erase(tokens.begin());
    if (tok.value == "{") {
        std::vector<token> group_arr;
        int i = 0;
        while ((tokens.at(i).value != ")")) {
            std::vector<token>* crawled_arr = (std::vector<token>*) group_tokens(tokens);
            group_arr.insert(group_arr.end(), crawled_arr->begin(), crawled_arr->end());
            i++;
        }
        tokens.erase(tokens.begin());
        return static_cast<void*>(&group_arr);
    } else {
        return static_cast<void*>(&tok);
    }
}

I skipped translating the error handling, because I wrote a "perfect" program that should not produce any errors: { test }. Even so, this results in allocation errors.
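For reference, here is a minimal sketch of how read_from_tokens() might map to C++ without void*. It rests on several assumptions that differ from the code above: tokens are plain std::string values rather than my token class, the hypothetical node struct is the return type, and a position index passed by reference replaces tokens.pop(0). Two details of the void* version are worth contrasting with it: comparing a char* to a string literal with == compares addresses rather than characters, and a pointer to a local variable dangles once the function returns. Neither applies here, because std::string compares by content and nodes are returned by value.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical node type: a single token, or a group for one { ... } block.
struct node {
    bool is_group = false;
    std::string text;            // meaningful when is_group is false
    std::vector<node> children;  // meaningful when is_group is true
};

// Reads one expression starting at tokens[pos] and advances pos past it.
node read_from_tokens(const std::vector<std::string>& tokens, std::size_t& pos) {
    if (pos >= tokens.size())
        throw std::runtime_error("unexpected EOF while reading");
    const std::string& tok = tokens[pos++];
    if (tok == "{") {
        node group;
        group.is_group = true;
        // Collect children until the matching closing brace.
        while (pos < tokens.size() && tokens[pos] != "}")
            group.children.push_back(read_from_tokens(tokens, pos));
        if (pos >= tokens.size())
            throw std::runtime_error("unexpected EOF while reading");
        assert(tokens[pos] == "}");
        ++pos;  // skip the closing brace
        return group;
    }
    if (tok == "}")
        throw std::runtime_error("unexpected }");
    node leaf;
    leaf.text = tok;
    return leaf;
}
```

Called with the tokens {, test, } and pos starting at 0, this returns a group with one leaf child whose text is test, and leaves pos equal to the number of tokens.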

So, to sum up: is there a way to translate these "any-value-taking" Python lists into a C++ container from which I can retrieve both std::vector<token>s and single tokens? It needs to work at run time and not merely compile, even if I end up swimming in warnings.

If the question is too narrow, it can be broadened to: "How can I write a Python list as an array in C++ that takes both vectors of objects and single objects?"

  • Disclaimer: I am not familiar with C++ container types. Python lists are not mere arrays; they are heterogeneous, resizable lists that hold object references. In CPython they are implemented as ArrayLists, that is true, but in [this question](http://stackoverflow.com/questions/7804955/heterogeneous-containers-in-c) about heterogeneous containers in C++ the accepted answer uses C++ List, which is implemented as a doubly-linked list. This might be even better for your use case than a vector because you will be removing from the beginning (expensive for arrays) and appending to the end anyway. – juanpa.arrivillaga Sep 24 '16 at 00:36
  • Pure Python does not have arrays, just lists - which generally contain pointers to other objects. `arrays`, that is objects with a contiguous buffer of bytes, are produced with the `array` module or a third party package like `numpy`. So be careful about terminology. A python `list` is not an `array`. – hpaulj Sep 24 '16 at 00:58
  • @hpaulj Thanks for the heads up, been a while since I've Python'ed. ;) –  Sep 24 '16 at 06:29
