6

Is it possible to feed an OCamlYacc-generated parser an explicit token list for analysis?

I'd like to use OCamlLex to explicitly generate a token list which I then analyze using a Yacc-generated parser later. However, the standard use case generates a parser that calls a lexer implicitly for the next token. Here tokens are computed during the yacc analysis rather than before. Conceptually a parser should only work on tokens but a Yacc-generated parser provides an interface that relies on a lexer which in my case I don't need.

Christian Lindig
  • 1,216
  • 1
  • 9
  • 24

3 Answers3

7

As already mentioned by Jeffrey, Menhir specifically offers, as part of its runtime library, a module to the parsers with any kind of token stream (it just asks for a unit -> token function): MenhirLib.Convert.

(You could even use this code without using Menhir, with ocamlyacc instead. In practice the conversion is not terribly complicated so you could even re-implement it yourself.)

gasche
  • 31,259
  • 3
  • 78
  • 100
  • That's very useful. I never used Menhir but the manual looks very convincing (I had overlooked the `Convert` module, though) and its author is well respected in the OCaml community. – Christian Lindig Jun 05 '12 at 20:00
5

If you already have a list of tokens, you can just go the ugly way and ignore the lexing buffer altogether. After all, the parse-from-lexbuf function that your parser expects is a non-pure function :

let my_tokens = ref [ (* WHATEVER *) ]
let token lexbuf = 
  match !my_tokens with 
    | []     -> EOF 
    | h :: t -> my_tokens := t ; h 

let ast = Parser.parse token (Lexbuf.from_string "")

On the other hand, it looks from your comments that you actually have a function of type Lexing.lexbuf -> token list that you're trying to fit into the Lexing.lexbuf -> token signature of your parser. If that is the case, you can easily use a queue to write a converter between the two types:

let deflate token = 
  let q = Queue.create () in
  fun lexbuf -> 
    if not (Queue.is_empty q) then Queue.pop q else   
      match token lexbuf with 
        | [   ] -> EOF 
        | [tok] -> tok
        | hd::t -> List.iter (fun tok -> Queue.add tok q) t ; hd 

let ast = Parser.parse (deflate my_lexer) lexbuf
Victor Nicollet
  • 24,361
  • 4
  • 58
  • 89
  • 1
    A yacc-generated parser provides semantic actions access to position information hidden in the `lexbuf` state. I am therefore not sure that `deflate` will work but it gives me an idea. – Christian Lindig Jun 05 '12 at 18:49
1

The OCamlYacc interface does look pretty complicated; it seems to require a Lexing.lexbuf. Maybe you could consider using Lexing.from_string to feed a fixed string rather than a fixed sequence of tokens. You could also look at Menhir. I haven't used it, but it gets excellent reviews here whenever anybody mentions OCaml parser generators. It might have a more flexible lexing interface.

Jeffrey Scofield
  • 65,646
  • 2
  • 72
  • 108
  • Menhir looks like a good alternative to OCamlYacc for many reasons. However, it also seems pretty closely tied to the lexer. `Lexing.from_string` is not an alternative because the essential problem is that some lexer actions generated two tokens rather than one because I can only recognize the token following an arbitrary string and end up with a string token and the one following it. Hence I had planned on building up a list of tokens first. Maybe I have to introduce weird hybrid tokens to work around the limitation. – Christian Lindig Jun 05 '12 at 17:00