I asked Mr. Sweeney via email and received this answer:
The C++ code is using template classes I wrote that implement parser combinators. The idea is to start with some basic parsers, such as literals where PLit<'A'> parses a literal character 'A', PAny<> parses any character but fails if at the end of fail, PEof fails unless we're not at the end of the file, etc. And then we combine them into arbitrary trees using combinators like PAnd which parses a then b, and succeeds only if both succeed -- else it fails with the parse point unmoved. And if it succeeds, the result is a struct containing two fields, one for the result of a, and one for the result of b. And so on.
The implementation is messy in C++ for a number of reasons, including that templates don't support arbitrary variadic parameters, and without first-class lambdas we can't easily process the results inline with the parser.
Here are some snippets of the template code, from which you can probably figure out the details of the framework.
// Parses a literal item.
UBOOL LiteralEvaluate (UClass* C,class FParseInBase& In,class FParseOutBase& Out)
{
FParseInMark M(In);
NNode* e = In.GetNextSource();
if (e && e->IsA(C))
{
Out.Callback(e);
return 1;
}
M.Restore(In);
return 0;
}
// Optional item of the specified type.
template <class U> struct Optional
{
static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)
{
U::Evaluate(In,Out);
return 1;
}
};
// Ignore items by absorbing them; retains boolean logic but not callback.
template <class T> struct Ignore
{
static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)
{
return T::Evaluate(In,GIgnore);
}
};
// Zero or more items of the specified type.
template <class T> struct ZeroOrMore
{
static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)
{
while (T::Evaluate(In,Out));
return 1;
}
};
// One or more items of the specified type.
template <class T> struct OneOrMore
{
static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)
{
for( INT i=0; T::Evaluate(In,Out); i++ );
return i>0;
}
};
// Always fails.
struct RFalse
{
static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)
{
return 0;
}
};
// Always succeeds.
struct RTrue
{
static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)
{
return 1;
}
};
// Parses the first matching items of the specified subtypes of T.
template <class A,class B=RFalse,class C=RFalse,class D=RFalse > struct Or
{
static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)
{
return A::Evaluate(In,Out) || B::Evaluate(In,Out) || C::Evaluate(In,Out) || D::Evaluate(In,Out);
}
};
// Parses all the specified items.
template <class A,class B=RTrue,class C=RTrue,class D=RTrue> struct And
{
static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)
{
FParseInMark Mark(In);
Conjunction<NNode> Q;
if( A::Evaluate(In,Q) && B::Evaluate(In,Q) && C::Evaluate(In,Q) && D::Evaluate(In,Q) )
{
Q.Forward(Out);
return 1;
}
Mark.Restore(In);
return 0;
}
};
// A separated list.
template <class A,class B> class SeparatedList : public Or<And<A,B,SeparatedList>,A> {};
// Integer comparison.
template <INT A,INT B> struct IsAtLeast
{
static UBOOL Evaluate (class FParseInBase& In,FParseOutBase& Out)
{
return A>=B;
}
};
That Test.ae was an experimental scripting language I was implementing in 1999-2001 -- that color scheme was trendy back then, I swear. :-)
The code shown defines metadata for the language constructs. The language went a long way down the Smalltalk "everything is an object" path, dealing with first-class metaclasses and related issues, but I ultimately abandoned it when I became familiar with advanced type systems work in Haskell, Cayenne, Coq, and other languages.
Nowadays -
I'm not a fan of implementing parsers or compilers in C++, since the code tends to be bloated by 70-80% compared to a similar implementation in a modern functional language like Haskell. Read up on Haskell parser combinators for more detail -- the resulting simplicity and directness is exemplary and it's done in a rigorous, type-safe way.