I'm building a grammar to parse Newick trees using ParseKit for a project I'm working on, and I've gotten this far. It's based on the grammar here: http://en.wikipedia.org/wiki/Newick_format. I'd like to use a grammar for this rather than the existing clunky recursive code I have working now.

However, I'm unsure how to specify the name and length nodes so that they accept either empty strings or general strings and numbers. I've gotten this far from the examples on the ParseKit site and some skimming of the Building Parsers with Java book, but I've missed something. Can someone point me in the right direction, please?

Current grammar:

@start = tree+;
tree = subtree ';' | branch ';';
subtree = leaf | internal;
leaf = name;
internal = '(' branchset ')' name;
branchset = branch | branchset ',' branch;
branch = subtree length;
name = *;
length = * | ':' *

Thanks!

--Possible answer:

Maybe these name and length nodes would work. Could anyone confirm?

name = Word | Quoted String;
length = ':' Number;
Chris F.

1 Answer


Developer of ParseKit here. Your proposed solution at the end is basically correct, with one small fix: QuotedString is one word:

name = Word | QuotedString;
length = ':' Number;
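A rough illustration of what these two rules accept, written in Python rather than ParseKit. It assumes the input has already been split into typed tokens, as ParseKit's tokenizer does before the grammar runs; the token lists and the helper names `match_name` and `match_length` are hypothetical. Each rule either consumes its tokens and returns the next position, or fails with `None`.

```python
# Tokens as (type, text) pairs, standing in for what a tokenizer would
# produce for the inputs  A:0.1  and  'Homo sapiens':2  (illustrative only).
TOKENS_A = [("Word", "A"), ("Symbol", ":"), ("Number", "0.1")]
TOKENS_Q = [("QuotedString", "'Homo sapiens'"), ("Symbol", ":"), ("Number", "2")]


def match_name(tokens, i):
    """name = Word | QuotedString;  -> (text, next_index) or None."""
    if i < len(tokens) and tokens[i][0] in ("Word", "QuotedString"):
        return tokens[i][1], i + 1
    return None


def match_length(tokens, i):
    """length = ':' Number;  -> (value, next_index) or None."""
    if (i + 1 < len(tokens)
            and tokens[i] == ("Symbol", ":")
            and tokens[i + 1][0] == "Number"):
        return float(tokens[i + 1][1]), i + 2
    return None


for toks in (TOKENS_A, TOKENS_Q):
    name, i = match_name(toks, 0)
    length, i = match_length(toks, i)
    print(name, length)
# A 0.1
# 'Homo sapiens' 2.0
```

Both rules require at least one token. The Wikipedia grammar also allows an empty name and an empty length (for example `(,,(,));`), so wherever name and length are used they would still need to be made optional.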

Also for future reference: if you would like a 'Wildcard' matcher (what you are trying to do with * above), you can use the built-in parser: Any. That will match any token.

In ParseKit, * is a modifier meaning zero or more.
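The difference between a wildcard and `*` can be sketched in a few lines of Python. `any_token` plays the role of Any (exactly one token, of any type), and `star` plays the role of the `*` modifier, which needs an operand to repeat; a bare `*`, as in `name = *;` in the question, has nothing to repeat. Both helper names are hypothetical, and this version is greedy with no backtracking, so it says nothing about how ParseKit resolves repetition internally.

```python
TOKENS = [("Word", "A"), ("Symbol", ":"), ("Number", "0.1")]


def any_token(tokens, i):
    """Wildcard terminal (the role of Any): exactly one token of any type."""
    return (tokens[i], i + 1) if i < len(tokens) else None


def star(matcher):
    """The '*' modifier: apply `matcher` zero or more times (greedy sketch)."""
    def run(tokens, i):
        values = []
        while True:
            result = matcher(tokens, i)
            # Stop on failure, or if the matcher succeeded without consuming input.
            if result is None or result[1] == i:
                return values, i
            value, i = result
            values.append(value)
    return run


print(any_token(TOKENS, 0))        # (('Word', 'A'), 1)
print(star(any_token)(TOKENS, 0))  # ([('Word', 'A'), ('Symbol', ':'), ('Number', '0.1')], 3)
print(star(any_token)([], 0))      # ([], 0): zero matches still succeeds
```

Note that `star(any_token)` happily swallows the ':' and the number too, which is why a wildcard repetition is rarely what you want for a field like name.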

Todd Ditchendorf
  • Excellent, thanks! Now, if only I could get ParseKit to work in Xcode 4.2 on Lion. I have a ton of issues setting the dependencies and linking. Basically, I do this: (1) check out the 1.5 tagged release, (2) drag the project into my Frameworks folder, (3) add ParseKit as a target dependency, (4) add ParseKit.framework as a linked library, (5) compile, which fails: http://pastie.org/2805285 – Chris F. Nov 03 '11 at 16:31
  • Looks like there's an issue with a missing format string in the dependency sub-library RegexKitLite. Check this file/line: RegexKitLite.m:894. Also, can you please upvote my answer above since it seems to have solved that issue? Thx. – Todd Ditchendorf Nov 03 '11 at 16:52
  • Yeah, I commented that line out and it compiled. I tried to upvote, but my reputation is not high enough. – Chris F. Nov 03 '11 at 17:20
  • Sorry, I guess I meant 'select as answer'. Thx for doing that. – Todd Ditchendorf Nov 03 '11 at 17:23
  • No problem. I have the parser set up and parsing the Newick string I'm sending; however, I'm now getting a warning "Unable to restore previously selected frame". More details here: http://pastie.org/2805591. The code is so simple that I'm surely doing something dumb. – Chris F. Nov 03 '11 at 17:33
  • I added some output to that pastie, but the input string is (A:0.1,B:0.2,(C:0.3,D:0.4):0.5); – Chris F. Nov 04 '11 at 12:44
  • I found the problem in your grammar; however, even after fixing it, I don't think your grammar is what you want. Full details here: http://pastie.org/2812186. Sorry, ParseKit does not have good error reporting on this stuff. The Metsker book has full details on the error you had in your grammar; look for 'left recursion' in the book, I think. – Todd Ditchendorf Nov 04 '11 at 21:35
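The left recursion in the question's grammar is the rule branchset = branchset ',' branch: a top-down parser expanding branchset immediately calls branchset again without consuming a token, so it recurses until the stack overflows. The usual fix is to rewrite the rule as iteration, i.e. one branch followed by zero or more ',' branch pairs. The Python sketch below is a hand-written recursive-descent parser, not ParseKit and not the contents of the linked pastie; the tokenizer, the node dictionaries and the names `parse_newick` and `NewickParser` are hypothetical simplifications. Name and length are optional here, following the empty alternatives in the Wikipedia grammar.

```python
import re

# Hypothetical, simplified tokenizer: numbers, bare or 'single-quoted'
# labels, and Newick punctuation. Real Newick labels allow more characters.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<num>\d+(?:\.\d+)?)"
    r"|(?P<label>'[^']*'|[A-Za-z_]\w*)"
    r"|(?P<sym>[(),:;]))"
)


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at {pos}: {text[pos:pos + 10]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class NewickParser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def expect(self, sym):
        if self.peek() != ("sym", sym):
            raise SyntaxError(f"expected {sym!r} at token {self.i}, got {self.peek()[1]!r}")
        self.i += 1

    # tree = branch ';'   (branch already covers subtree, since length may be empty)
    def tree(self):
        node = self.branch()
        self.expect(";")
        return node

    # branch = subtree length
    def branch(self):
        node = self.subtree()
        node["length"] = self.length()
        return node

    # subtree = internal | leaf; an internal node starts with '('
    def subtree(self):
        if self.peek() == ("sym", "("):
            self.i += 1
            children = self.branchset()
            self.expect(")")
            return {"name": self.name(), "children": children}
        return {"name": self.name(), "children": []}

    # branchset = branch, then zero or more ',' branch (iteration, not left recursion)
    def branchset(self):
        branches = [self.branch()]
        while self.peek() == ("sym", ","):
            self.i += 1
            branches.append(self.branch())
        return branches

    # name = Word | QuotedString | empty
    def name(self):
        kind, value = self.peek()
        if kind == "label":
            self.i += 1
            return value[1:-1] if value.startswith("'") else value
        return ""

    # length = ':' Number | empty
    def length(self):
        if self.peek() != ("sym", ":"):
            return None
        self.i += 1
        kind, value = self.peek()
        if kind != "num":
            raise SyntaxError(f"expected a number after ':' at token {self.i}")
        self.i += 1
        return float(value)


def parse_newick(text):
    parser = NewickParser(text)
    tree = parser.tree()
    if parser.i != len(parser.tokens):
        raise SyntaxError("trailing input after ';'")
    return tree
```

With parse_newick("(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);") the root has three children: A, B, and an unnamed internal node of length 0.5 holding C and D.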