1

I have a pretty standard recursive descent parser. The algorithm is straightforward: each function reads characters from a stream and either returns FAIL or calls subsequent parse functions (specified by a corresponding grammar rule):

function parseFoo() {
  var ret = parseBar(stream);
  if (ret == FAIL) {
    ret = parseQuax(stream);
    ...
  }
  ...
  return ret;
}

I would like to solve a situation where I don't have the full stream at once - I get its pieces asynchronously. Therefore what I need is a feature of "interruptability" - I must be able to save a state of the parser and later continue from that point on.

Sadly, this is one thing that is impossible to do with nested function calls, because all the state of a parser is "stored" on the call stack (in the order of functions and in their local variables).

So I have to construct the parse* functions differently.

The language I use is JavaScript.

Can someone point me o any ideas how to proceed?

EDITs:

  • It seems clear to me that I need some sort of state machine(s). I can't wrap my head around generators or continuations-passing-style, it looks to me that there are so many glitches in the saving of a state and resuming. To me most clear pathway I can think of is to convert the nested calls into some sort of stack, rewrite the local variables to some hashmap stored at the stack item and construct the parsing code in a different, linear fashion so I can easily "goto" to some state.

  • One of the sub-problems I see is might be this: Since I don't have the full stream I think I have to try multiple paths and store all the partially-parsed attempts. For example if the grammar says a = b | c then b might be so long that it is not fully in the stream. So I can't "hang" in parsing b, I have to try both and save the partial computations. Am I right?

tillda
  • 18,150
  • 16
  • 51
  • 70
  • 1
    Actually, this is straightforward in some languages. Which language are you using? –  Apr 17 '14 at 10:53
  • 1
    If you can rely on relatively recent language additions, you could try using [generators](http://stackoverflow.com/q/2282140/395760) to build a sort of coroutine. –  Apr 17 '14 at 10:57
  • Shouldn't `if (ret == FAIL)` actually be `if (ret != FAIL)`? – JimmyB Apr 17 '14 at 11:01
  • 1
    Depends on the grammar rule. It might be `foo = bar quax` or `foo = bar | quax`. Anyway, the code isn't some precise example at all. – tillda Apr 17 '14 at 11:03

1 Answers1

1

It depends upon the programming language used.

You basically need some support of continuations (which you could understand as the abstraction of the call stack). In other words, you may want to code in continuation passing style (CPS).

If coding in C, you might be interested by CPC.

If coding in Javascript, you'll probably need to hand-code in CPS. See also this, this, this question, this pages etc....

Community
  • 1
  • 1
Basile Starynkevitch
  • 223,805
  • 18
  • 296
  • 547
  • I may be biased against coroutines, but IMHO full on continuation passing style is overkill and too complex. One only needs to resume at a single point, not an arbitrary number as e.g. when backtracking. In other words, a generator, not even a coroutine, let alone a full continuation. For that, a simple-ish state machine should work just as well, with less code and indirection. –  Apr 17 '14 at 11:03
  • But CPS is easy to code in. Of course, you'll need to think of at at the start. – Basile Starynkevitch Apr 17 '14 at 11:04
  • You see, that's the part I'm biased about. I'd have to think really hard about how to map all that control flow to CPS, whereas for a state machine, I'd write generator pseudo-`yield`s and simply slap some mechanical transformations (explicit state argument, switch to find resume position, move locals into state argument) around that. Of course, someone who often writes CPS code and rarely state machines would be in the reverse position. –  Apr 17 '14 at 11:16