I have the following very simple (test) grammar file:

@start = expression+;
expression = keyword | otherWord;
otherWord = Word;
keyword = a | the;
a = 'a';
the = 'the';

Then I run the following code:

// Grammar contains the contents of the above grammar file.
PKParser *parser = [[PKParserFactory factory] parserFromGrammar:grammar assembler:self];
NSString *s = @"The parrot";
[parser parse:s];
PKReleaseSubparserTree(parser);

And the following methods:

- (void)didMatchA:(PKAssembly *)a{
    [self log:a type:@"didMatchA          "];
}
- (void)didMatchThe:(PKAssembly *)a{
    [self log:a type:@"didMatchThe        "];
}
- (void)didMatchKeyword:(PKAssembly *)a{
    [self log:a type:@"didMatchKeyword    "];
}
- (void)didMatchExpression:(PKAssembly *)a{
    [self log:a type:@"didMatchExpression "];
}
- (void)didMatchOtherWord:(PKAssembly *)a{
    [self log:a type:@"didMatchOtherWord  "];
}

- (void)log:(PKAssembly *)assembly type:(NSString *)type {
    PKToken *token = [assembly top];
    NSLog(@"Method: [%@], token: %@, assembly: %@", type, token, assembly);
}

And finally I get these messages in the log:

[1] Method: [didMatchThe        ], token: The, assembly: [The]The^parrot
[2] Method: [didMatchKeyword    ], token: The, assembly: [The]The^parrot
[3] Method: [didMatchOtherWord  ], token: The, assembly: [The]The^parrot
[4] Method: [didMatchExpression ], token: The, assembly: [The]The^parrot
[5] Method: [didMatchExpression ], token: The, assembly: [The]The^parrot
[6] Method: [didMatchOtherWord  ], token: parrot, assembly: [The, parrot]The/parrot^
[7] Method: [didMatchExpression ], token: parrot, assembly: [The, parrot]The/parrot^

This sort of makes sense, but I cannot see why [5] occurs. I'd really like to be able to remove the double matching so that keywords such as "The" trigger only didMatchThe and not didMatchKeyword.

Unfortunately, documentation on ParseKit's grammar syntax, and on how it decides which methods to trigger, seems to be non-existent. Yes, I've trawled the source code too :-)

Does anyone have experience with ParseKit who can shed some light on this?

broomba
drekka

1 Answer

I'm the developer of ParseKit, and this is actually correct behavior. Here are a few items to help clear this up:

  1. The best way to learn about how ParseKit works is to buy "Building Parsers with Java" by Steven John Metsker. ParseKit is based almost entirely on the designs laid out there.

  2. ParseKit's parser component is extremely dynamic and features infinite look-ahead. This makes it ideal for quick development or for easily parsing small input, but it also means ParseKit exhibits extremely poor performance when parsing large documents.

  3. Due to ParseKit's infinite look-ahead, the assembler methods you implement will be called many times. In fact, they will appear to be called too many times, as you've described above. This is normal: ParseKit explores every possible parse path available to it at any time, so you get "too many" callbacks.

  4. The answer is to never work on ivars in your assembler callback methods. Instead, always keep the state of what you are working on in the current PKAssembly's target ivar.

    a.target

The current PKAssembly is the one passed into your callback method.
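
A rough way to picture items 3 and 4 is the toy model below. It is Python rather than Objective-C and is not ParseKit code: `Assembly`, `ToyAssembler`, `ivar_keywords`, `match_expression` and `parse` are all hypothetical names, and the matching strategy is only an assumption in the spirit of "explore every path". It models the question's grammar, records each keyword both in a shared ivar-like list and in the per-path target, and shows that only the target stays consistent.

```python
from dataclasses import dataclass, field

KEYWORDS = {"a", "the"}  # mirrors the grammar rule: keyword = a | the;


@dataclass
class Assembly:
    """Toy stand-in for PKAssembly: a read position plus a per-path target."""
    tokens: tuple
    index: int = 0
    target: list = field(default_factory=list)

    def advance(self):
        # Fork the path: the copy gets its own target, so paths never share state.
        return Assembly(self.tokens, self.index + 1, list(self.target))


class ToyAssembler:
    def __init__(self):
        self.ivar_keywords = []  # shared "ivar" state, touched by every path

    def did_match_keyword(self, assembly, word):
        self.ivar_keywords.append(word)  # wrong place: also sees abandoned paths
        assembly.target.append(word)     # right place: belongs to this path only


def match_expression(assembly, assembler):
    """expression = keyword | otherWord; every alternative that fits is kept."""
    word = assembly.tokens[assembly.index]
    results = []
    if word.lower() in KEYWORDS:  # the question's log shows 'the' matching "The"
        path = assembly.advance()
        assembler.did_match_keyword(path, word)
        results.append(path)
    results.append(assembly.advance())  # otherWord = Word; matches any word
    return results


def parse(words, assembler):
    """@start = expression+; extend every live path one token at a time."""
    paths = [Assembly(tuple(words))]
    for _ in words:
        paths = [p for a in paths for p in match_expression(a, assembler)]
    return paths


assembler = ToyAssembler()
paths = parse("The parrot saw a cat".split(), assembler)
print(assembler.ivar_keywords)    # ['The', 'a', 'a']: 'a' is reported once per live path
print([p.target for p in paths])  # [['The', 'a'], ['The'], ['a'], []]
```

In this sketch the shared list ends up with one keyword too many, because the callback for "a" fires on both paths that are still alive at that point. Each path's target holds only what that path matched, so whichever path is finally chosen carries a consistent result.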

Hope that helps.

Todd Ditchendorf
  • Thanks for getting back so fast, Todd. Luckily the documents I want to parse are quite small. In the ParseKit test code I noticed a lot of popping and pushing on the stack, but I could not see why. By keeping state, do you mean setting the target as a way for various callbacks to tell each other that they have occurred? I.e., both didMatchWord and didMatchThe check target to see if the other has already populated it and act accordingly? – drekka Jun 05 '11 at 02:14
  • PS. Is there a way I could optimize this grammar so that there are fewer callbacks? I'd rather not buy a book in order to achieve something which appears to be a simple case. Although the book would be an interesting read, it's not where I want to spend my energy right now. I was hoping that with a little work I could inject ParseKit into the app. :-) – drekka Jun 05 '11 at 02:17
  • Been doing some more playing with this. I've been attempting to use the test code you wrote as a model. So I've been popping and pushing values. In addition I've been using target as a means to track state and know when other methods have been called. This appears to be working. Not sure at this stage where this is going. – drekka Jun 05 '11 at 12:58
  • Hi Derek, glad that helped. 1. No, you cannot practically adjust the grammar to force fewer callbacks. This is just a tradeoff you must accept with ParseKit: extreme dynamism but poor performance. :( 2. I meant that you should actually store the data you are building/working on as the assembly's target. – Todd Ditchendorf Jun 05 '11 at 19:38