I'm building a grammar to parse Newick trees with ParseKit for a project I'm working on. It's based on the grammar here: http://en.wikipedia.org/wiki/Newick_format. I'd like to use a grammar for this rather than the clunky recursive code I have working now.
However, I'm unsure how to specify the name and length rules so that name accepts either an empty string or a general string, and length accepts either nothing or a colon followed by a number. I've gotten this far from the examples on the ParseKit site and from skimming the Building Parsers with Java book, but I've missed something. Can someone point me in the right direction, please?
Current grammar:
@start = tree+;
tree = subtree ';' | branch ';';
subtree = leaf | internal;
leaf = name;
internal = '(' branchset ')' name;
branchset = branch | branchset ',' branch;
branch = subtree length;
name = *;
length = * | ':' *;
Thanks!
--Possible answer:
Maybe these name and length nodes would work. Could anyone confirm?
name = Word | QuotedString;
length = ':' Number;
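To pin down what I want the name and length rules to accept, here is a minimal Python sketch of the same grammar as a hand-written recursive-descent parser. This is not ParseKit code: the tokenizer, the Parser class, and parse_newick are hypothetical names made up for illustration, and names are assumed to be either bare words or single-quoted strings. The loop in subtree is the iterative form of branchset, i.e. branch (',' branch)*.

```python
import re

PUNCT = set("();,:")
# Assumed token shapes: one punctuation char, a 'quoted string', or a bare run.
TOKEN = re.compile(r"\s*([();,:]|'[^']*'|[^();,:\s]+)")

def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def tree(self):
        # tree = branch ';'  (branch covers subtree because length may be empty)
        node = self.branch()
        self.take(";")
        return node

    def branch(self):
        # branch = subtree length
        node = self.subtree()
        node["length"] = self.length()
        return node

    def subtree(self):
        # subtree = leaf | internal, where internal = '(' branchset ')' name
        children = []
        if self.peek() == "(":
            self.take("(")
            children.append(self.branch())
            while self.peek() == ",":
                self.take(",")
                children.append(self.branch())
            self.take(")")
        return {"name": self.name(), "children": children, "length": None}

    def name(self):
        # name = empty | word | quoted string
        tok = self.peek()
        if tok is None or tok in PUNCT:
            return ""  # the empty alternative consumes nothing
        self.pos += 1
        return tok.strip("'")

    def length(self):
        # length = empty | ':' number
        if self.peek() != ":":
            return None  # the empty alternative consumes nothing
        self.take(":")
        return float(self.take())

def parse_newick(text):
    return Parser(tokenize(text)).tree()
```

For example, parse_newick("(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;") gives a root named F with three children, and parse_newick("(,,(,));") parses as well, with every name empty and every length missing. The point is that both name and length need an alternative that matches nothing, which is the part the grammar rules above do not yet express.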