A simple classic way to organize these tasks is by setting the various parts up as subroutines passing datastructures to one another. Forgive my sloppy syntax.
First you need a definitino of a token, produced by the lexer. This is almost always a struct with an enum to indicate which token type, and a union type to carry any value the token type may have:
struct Token {
enum // distinguishes token types
{ EndOfFile, Integer, String, Float, Identifier
Semicolon, Colon, Plus, Minus, Times, Divide, LefParen, RightParen,
KeywordBegin, KeywordDeclare, KeywordEnd, ...
} tokentype
union {
long numeric_value; // holds numeric value of integer-valued token
char* string_value; // holds ptr to string body of identifiers or string literals
float float_value; // holds numeric value of floating-point valued token
} tokenvalue
}
You'll want to build an abstract syntax tree. For that you'll need a TreeNode type.
Like Tokens, these are almost always an enum to indicate which node type, and a union to hold various properties of the node type, and finally a list of pointers to children:
struct TreeNode {
enum // distiguishes tree node types
{ Goal, Function, StatementList, Statement, LeftHandSide, Expression,
Add, Subtract, Times, Divide, Identifier, FunctionCall, ...
} nodetype
children* TreeNode[4]; // a small constant here is almost always enough
union // hold values specific to node type, often includes a copy of lexer union
{ long numeric_value; // holds:
// numeric value of integer-valued token
// index of built-in function number
// actual number of children if it varies
// ...
char* string_value; // holds ptr to string body of identifiers or string literals
float float_value; // holds numeric value of floating-point valued token
} nodevalue
}
MyCompiler.c contains the following code:
int main() {
filehandle Handle = OpenSourceFile(&FileName);
ASTrootnode TreeNode = Parser(Handle);
CodeGenerator(ASTrootnode);
exit();
}
Parser.c contains the following code:
TreeNode Parser(filehandle Handle) {
<parsingmachinery>
nexttoken Token=Lexer(filehandle);
<moreparsingmachinery to build tree nodes>
return toplevel_TreeNode;
}
Lexer.c contains the following code:
Token Lexer(filehandle Handle) {
token Token;
<process characters in buffer>
if bufferp=end_of_buffer
fileread(filehandle,&bufferbody,bufferlength);
<process characters in buffer>
token.tokentype=<typeofrecognizedtoken>
token.tokenvalue.long=<valueofnumerictoken>
...
return Token;
}
You'll obviously want to put the Token and TreeNode declarations into header files that can be shared across your compiler source files.
If you build a high-performance compiler, you'll want to optimize these routines. One trivial example: the FileHandle can become a global variable, and thus need not be passed between the pieces as an explicit argument. One not so trivial example: you'll want to a high-performance lexer generator, or hand-code the lexer, to maximize its character processing rate, especially at skipping blanks and comments.
If you want to see specific details on how to construct a parser that builds ASTs, see my SO answer on building recursive descent parsers: https://stackoverflow.com/a/2336769/120163