
In bison, it is sufficient to add

%error-verbose

to the grammar file to make the parser's error messages more verbose. Is there any way to get similar functionality with ocamlyacc?

I found an answer to a similar question, but I could not make anything of it. This is how I call the lexer and parser functions:

let rec foo () =
  try
    let line = input_line stdin in
    (try
       let _ = Parser.latexstatement lexer_token_safe (Lexing.from_string line) in
       print_string "SUCCESS\n"
     with
     | LexerException _ -> print_string ("$L" ^ line ^ "\n")
     | Parsing.Parse_error -> print_string ("$P" ^ line ^ "\n")
     | _ -> print_string ("$S " ^ line ^ "\n"));
    flush stdout;
    foo ()
  with End_of_file -> ()
;;
foo ();;
  • FWIW, whenever somebody asks an ocamlyacc question, a knowledgeable respondent always suggests they use [Menhir](http://gallium.inria.fr/~fpottier/menhir/) instead. I just looked through the error-handling section of the Menhir manual and don't see anything like what you're asking for. But it does seem to be nicer than ocamlyacc. – Jeffrey Scofield Dec 26 '12 at 21:42
  • I suppose it is a design philosophy. ocamlyacc can't be a full LR(1) parser generator, since yacc is not one (yacc is LALR(1)). It is, however, the default in the OCaml distribution, and sometimes convenience wins over functionality. – osolmaz Dec 26 '12 at 23:06

2 Answers


I don't think there's an option in ocamlyacc to do what you want automatically, so let me try to give a thorough description of what can be done to handle syntax errors and produce more useful messages. It may go beyond what you asked.

Errors fall into two categories, lexical errors and parse errors, depending on the stage of the parsing process in which they occur.

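  • in .mll files (ocamllex), a Failure exception (with the message "lexing: empty token") is raised when the input matches none of the patterns
  • in .mly files (ocamlyacc), a Parsing.Parse_error exception is raised when the parser hits a syntax error it cannot recover from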

So you have several options:

  • let the lexer and parser code raise their exceptions, and catch them in the code calling them
  • handle specific error cases in either of them, with
    • a catch-all rule in the lexer (or more specific patterns if necessary)
    • the special error terminal in the parser rules, to catch errors at specific places

In either case, you will need functions that report the position of the error in the source. Lexing and Parsing both describe positions with the record type Lexing.position, which has the following fields:

  • pos_fname : the name of the file being processed
  • pos_lnum : the line number (only kept up to date if the lexer calls Lexing.new_line on each newline)
  • pos_bol : the offset, in characters from the start of the input, of the beginning of the current line
  • pos_cnum : the offset, in characters from the start of the input, of the current position
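A minimal sketch of how these fields combine, using only the standard Lexing module. The helpers column, describe and after_first_line are illustrative names, not part of any library. Since no ocamllex-generated lexer is involved here, the sketch fakes having consumed "foo\n" by setting pos_cnum directly, then calls Lexing.new_line the way a newline rule would.

```ocaml
(* 0-based column of a position: offset of the position minus the
   offset of the beginning of its line. *)
let column (p : Lexing.position) = p.Lexing.pos_cnum - p.Lexing.pos_bol

let describe (p : Lexing.position) =
  Printf.sprintf "line %d, column %d" p.Lexing.pos_lnum (column p)

(* The ocamllex engine only advances pos_cnum; updating pos_lnum and
   pos_bol is the job of the lexer's newline rule, via Lexing.new_line.
   Here we pretend "foo\n" (4 characters) has just been consumed. *)
let after_first_line () =
  let lexbuf = Lexing.from_string "foo\nbar" in
  lexbuf.Lexing.lex_curr_p <-
    { lexbuf.Lexing.lex_curr_p with Lexing.pos_cnum = 4 };
  Lexing.new_line lexbuf; (* pos_lnum becomes 2, pos_bol becomes 4 *)
  lexbuf.Lexing.lex_curr_p

let () =
  let p = after_first_line () in
  (* The "r" of "bar" is at offset 6, i.e. column 2 of line 2. *)
  print_endline (describe { p with Lexing.pos_cnum = 6 })
```

In a real .mll file, the equivalent would be a rule along the lines of `| '\n' { Lexing.new_line lexbuf; token lexbuf }` (the rule name token is hypothetical); without it, every error is reported on line 1.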

The lexbuf used by the lexer holds two such positions to track the token currently being lexed (its lex_start_p and lex_curr_p fields, which Lexing.lexeme_start_p and Lexing.lexeme_end_p return). The parser tracks the positions of the nonterminal about to be synthesized and of each item on the right-hand side of the current rule; they can be retrieved with Parsing.symbol_start_pos and Parsing.symbol_end_pos, and with Parsing.rhs_start_pos and Parsing.rhs_end_pos respectively.

Here are a few functions that raise more detailed exceptions:

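(* Shared by the lexer and the parser, e.g. in a module of your own
   opened in the headers of both the .mll and the .mly files. *)
exception LexErr of string
exception ParseErr of string

(* "(line L: char S..E): msg", where S and E are columns,
   i.e. offsets from the beginning of the line. *)
let error msg (start : Lexing.position) (finish : Lexing.position) =
  Printf.sprintf "(line %d: char %d..%d): %s" start.Lexing.pos_lnum
    (start.Lexing.pos_cnum - start.Lexing.pos_bol)
    (finish.Lexing.pos_cnum - finish.Lexing.pos_bol) msg

(* For the lexer: report the offending lexeme and its position. *)
let lex_error lexbuf =
  raise (LexErr (error (Lexing.lexeme lexbuf)
                   (Lexing.lexeme_start_p lexbuf) (Lexing.lexeme_end_p lexbuf)))

(* For parser actions: report the position of the nth right-hand-side item.
   Deliberately not called parse_error: ocamlyacc uses a function of that
   name (of type string -> unit) as its error callback, so redefining it
   with another type in the .mly header would not compile. *)
let syntax_error msg nth =
  raise (ParseErr (error msg (Parsing.rhs_start_pos nth) (Parsing.rhs_end_pos nth)))

and a basic use case, in the parser (.mly):

%token <string> WS WORD   /* assuming both tokens carry the matched text */

/* ... */

wsorword:
    WS                 { $1 }
  | WORD               { $1 }
  | error              { syntax_error "wsorword" 1 } /* raises, so no dummy value is needed to typecheck */
;

and in the lexer (.mll):

rule lexer = parse
(*  ... *)
(* catch-all pattern; keep it last so that other one-character patterns take precedence *)
| _                    { lex_error lexbuf }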

All that is left to do is to modify your top-level function to catch these exceptions and process them.
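A minimal sketch of such a top-level function, adapted from the loop in the question. To keep it self-contained, it takes the parsing function as a parameter instead of calling Parser.latexstatement directly, and redeclares the LexErr and ParseErr exceptions; check_line and loop are illustrative names. It also uses match ... with exception, so that the recursive call is no longer inside a try block.

```ocaml
(* Same exceptions as in the answer; redeclared so this compiles alone. *)
exception LexErr of string
exception ParseErr of string

(* Parse one line with [parse] and turn the outcome into a report string.
   [parse] stands for something like
   [Parser.latexstatement lexer_token_safe] from the question. *)
let check_line (parse : Lexing.lexbuf -> 'a) (line : string) : string =
  let lexbuf = Lexing.from_string line in
  match parse lexbuf with
  | _ -> "SUCCESS"
  | exception LexErr msg -> "$L " ^ msg
  | exception ParseErr msg -> "$P " ^ msg
  | exception Parsing.Parse_error ->
      (* Unrecovered syntax error: fall back to the start position
         of the last lexeme the lexer matched. *)
      let p = Lexing.lexeme_start_p lexbuf in
      Printf.sprintf "$P (char %d): %s" (p.Lexing.pos_cnum - p.Lexing.pos_bol) line
  | exception Failure msg ->
      (* ocamllex raises Failure "lexing: empty token" when no rule matches;
         note this also catches any other Failure raised while parsing. *)
      "$L " ^ msg

(* Read stdin line by line until end of file. *)
let rec loop parse =
  match input_line stdin with
  | line ->
      print_endline (check_line parse line);
      loop parse
  | exception End_of_file -> ()

(* With the question's names (not defined here):
   let () = loop (Parser.latexstatement lexer_token_safe) *)
```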

Finally, for debugging purposes, Parsing provides a set_trace function which enables the display of messages from the state machine used by the parsing engine: it traces all the internal state changes of the automaton.
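A small sketch of a helper (with_parser_trace is an illustrative name) that switches the trace on for a single parse and restores the previous setting afterwards, even if the parse raises. It relies on Parsing.set_trace returning the previous value of the flag, and on Fun.protect, which needs OCaml 4.08 or later. The trace itself is printed by the runtime on stderr.

```ocaml
(* Run [f] with the ocamlyacc parser trace enabled, then restore the
   previous setting, whether [f] returns normally or raises. *)
let with_parser_trace f =
  let previous = Parsing.set_trace true in
  Fun.protect ~finally:(fun () -> ignore (Parsing.set_trace previous)) f

(* Possible usage with the question's names (not defined here):
   with_parser_trace (fun () ->
     Parser.latexstatement lexer_token_safe (Lexing.from_string line)) *)
```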

    This is an **awesome** answer. Thank you so much for going into this detail. Quick question - Does Menhir have "better" support for catching parser errors? Is it worth upgrading from OCamlyacc just for this? Thanks! – Prakhar Feb 15 '16 at 18:42

The Parsing module provides the function Parsing.set_trace, which will do just that. You can enable it with Parsing.set_trace true. You can also run ocamlyacc with the -v option, and it will write a .output file (e.g. parser.output for parser.mly) listing all states and transitions.
