I'm working on a language interpreter and nearly have the first half complete. I have created my tokens and linked them together into the stream that I am after.
Specifically, a token is:
enum token_type {
    BRACKET_CURLY_LEFT, BRACKET_CURLY_RIGHT,
    BRACKET_ROUND_LEFT, BRACKET_ROUND_RIGHT,
    ARROW_LEFT, ARROW_RIGHT,
    BACKSLASH, FORWARDSLASH,
    INTEGER, DECIMAL, INVALID_NUM,
    ALLOC, CALL,
    FUNCTION, VARIABLE, WORD,
    UNDERSCORE,
    NULL_SYMB
};

class token {
public:
    token(token_type type, const char* value);
    virtual ~token();
    token_type type;
    char* value;
};
I previously did some interpreter work in Python, where I followed the lis.py and lispy.py tutorials on making a Lisp interpreter. On the lis.py page, one of the first things Norvig writes is:
def parse(program):
    "Read a Scheme expression from a string."
    return read_from_tokens(tokenize(program))

def read_from_tokens(tokens):
    "Read an expression from a sequence of tokens."
    if len(tokens) == 0:
        raise SyntaxError('unexpected EOF while reading')
    token = tokens.pop(0)
    if '(' == token:
        L = []
        while tokens[0] != ')':
            L.append(read_from_tokens(tokens))
        tokens.pop(0) # pop off ')'
        return L
    elif ')' == token:
        raise SyntaxError('unexpected )')
    else:
        return atom(token)

def atom(token):
    "Numbers become numbers; every other token is a symbol."
    try: return int(token)
    except ValueError:
        try: return float(token)
        except ValueError:
            return Symbol(token)
If we look specifically at the read_from_tokens() function, it crawls through the list of tokens and returns either a single token or an array, which can in turn contain more tokens and more arrays, and so on. Each list essentially simulates a ( ... ) block.
In my C++ program, I'm trying to emulate this by sorting my tokens into arrays which simulate the { ... } block. Now if we look at the Python array, L = [], it can store practically any data type. A C++ std::vector cannot: all of its elements must have a single type. However, it can store (void*).
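To make the target structure concrete, here is a minimal sketch of one element type that can be either a single token or a list of further elements, so that a plain std::vector of it can nest like the Python list. The names simple_token, node, make_atom, make_list and count_atoms are hypothetical and do not appear in the code above; simple_token is a simplified stand-in for the token class, using std::string instead of char*.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's token class (std::string instead of char*).
struct simple_token {
    std::string value;
};

// A node is either a single token (an atom) or a list of child nodes.
// The list case plays the role of Python's nested L = [] for a { ... } block.
struct node {
    bool is_list = false;
    simple_token tok;            // meaningful only when is_list is false
    std::vector<node> children;  // meaningful only when is_list is true

    static node make_atom(simple_token t) {
        node n;
        n.tok = std::move(t);
        return n;
    }

    static node make_list(std::vector<node> kids) {
        node n;
        n.is_list = true;
        n.children = std::move(kids);
        return n;
    }

    // Access the token of an atom; asserts that this node really is an atom.
    const simple_token& atom() const {
        assert(!is_list);
        return tok;
    }
};

// Count the atoms in a tree, recursing into nested lists.
std::size_t count_atoms(const node& n) {
    if (!n.is_list) return 1;
    std::size_t total = 0;
    for (const node& child : n.children) total += count_atoms(child);
    return total;
}
```

With a type like this, every element of the vector has the same static type (node), and the "token or list" distinction moves into a runtime flag. In C++17, std::variant could express the same either/or choice without the manual flag.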
So I went off and tried creating a translation by making the function's return type a void pointer, and then returning both tokens and std::vector<token>s. However, I ended up getting an allocation error, probably due to the casting I do.
This is perhaps the dirtiest piece of code I will ever write...
void* group_tokens(std::vector<token> tokens) {
    token tok = tokens.at(0);
    tokens.erase(tokens.begin());
    if (tok.value == "{") {
        std::vector<token> group_arr;
        int i = 0;
        while ((tokens.at(i).value != ")")) {
            std::vector<token>* crawled_arr = (std::vector<token>*) group_tokens(tokens);
            group_arr.insert(group_arr.end(), crawled_arr->begin(), crawled_arr->end());
            i++;
        }
        tokens.erase(tokens.begin());
        return static_cast<void*>(&group_arr);
    } else {
        return static_cast<void*>(&tok);
    }
}
I skipped over translating the errors, because I feed it a "perfect" program that should not create any errors: { test }. This then results in allocation errors.
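For comparison, here is a hedged sketch of how read_from_tokens might be carried over to C++ without void*. It returns the result by value, since a pointer to a local variable dangles once the function returns. It compares std::string values, since == on a char* compares addresses rather than characters. It also advances a shared position taken by reference, since a vector passed by value is copied and the caller never sees what the recursive call consumed. The types simple_token and node are hypothetical simplifications, not the token class above, and "{" and "}" are assumed to be the block delimiters.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical simplified types, not the token class from the question.
struct simple_token {
    std::string value;
};

struct node {
    bool is_list = false;
    simple_token tok;            // used when is_list is false
    std::vector<node> children;  // used when is_list is true
};

// Reads one node starting at tokens[pos] and advances pos past it.
// Mirrors the Python read_from_tokens, with "{" and "}" as delimiters.
node read_from_tokens(const std::vector<simple_token>& tokens, std::size_t& pos) {
    if (pos >= tokens.size()) {
        throw std::runtime_error("unexpected EOF while reading");
    }
    const simple_token& tok = tokens[pos++];
    if (tok.value == "{") {
        node list;
        list.is_list = true;
        // Keep reading child nodes until the matching "}" is the next token.
        while (pos < tokens.size() && tokens[pos].value != "}") {
            list.children.push_back(read_from_tokens(tokens, pos));
        }
        if (pos >= tokens.size()) {
            throw std::runtime_error("unexpected EOF while reading");
        }
        ++pos;  // skip the closing "}"
        return list;
    }
    if (tok.value == "}") {
        throw std::runtime_error("unexpected }");
    }
    node atom;
    atom.tok = tok;
    return atom;
}
```

Called with the tokens of { test } and a position starting at 0, this sketch yields one list node holding a single atom, and leaves the position at the end of the input.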
So to sum up, is there a way I can translate these "any value-taking" Python lists into a C++ array from which I can grab both std::vector<token>s and tokens, and that works at run time as well as at compile time, even if I am swimming in warnings?
If the question is viewed as too closed and narrow, it can be broadened to: "How can I write a Python list as an array in C++ that takes both vectors of objects and single objects?"