
I have a frontend written in Menhir that parses an expression, turning a string into an expression AST. The entry point of the frontend, Parser_e.main, is called in several different places in my OCaml code, so I would like to catch possible errors inside the frontend rather than outside. When catching an error, one particularly important piece of information I want to show is the entire input string that the frontend cannot parse. (Errors from the lexer are very rare, because the frontend can read almost everything.)

So I tried to follow this thread and print more information when there is an error. In parser_e.mly, I have added:

exception LexErr of string
exception ParseErr of string

let error msg start finish  = 
  Printf.sprintf "(line %d: char %d..%d): %s" start.pos_lnum 
       (start.pos_cnum - start.pos_bol) (finish.pos_cnum - finish.pos_bol) msg

let parse_error msg nterm =
  raise (ParseErr (error msg (rhs_start_pos nterm) (rhs_end_pos nterm)))

e_expression:
/* empty */ { EE_empty }
| INTEGER { EE_integer $1 }
| DOUBLE { EE_double $1 }
...
| error { parse_error "e_expression" 1; ERR "" }

But the error message still does not include the input string. Does anyone know if there is a function I am missing that would give me that?

SoftTimur

1 Answer

In the context of an error, you can extract the location of the failed lexeme as a pair of positions, using the Parsing.symbol_start_pos and Parsing.symbol_end_pos functions. Unfortunately, the Parsing module doesn't provide access to the lexeme as a string, but if the input is stored in a file, it is possible to extract it manually, or to print the error in a compiler style that a decent IDE will understand and highlight for you.

The Parser_error module is below. It defines a function Parser_error.throw that raises a Parser_error.T exception. The exception carries a diagnostic message and the position of the failed lexeme. Several handy functions are provided to extract the line containing this lexeme from a file, or to generate a file-position message. If your input is not stored in a file, you can use the string_of_exn function, which accepts the input as a string together with the payload of the Parser_error.T exception, and marks the offending substring in it. This is an example of a parser that uses this exception for error reporting.

open Lexing

(** [T (message, start, finish)]: the parser failed with [message] on the
    input delimited by the [start] and [finish] positions. *)
exception T of (string * position * position)

(** [throw msg] raises a [Parser_error.T] exception with the corresponding
    message. Must be called in a semantic action of a production rule. *)
let throw my_unique_msg =
  let check_pos f = try f () with _ -> dummy_pos in
  Printexc.(print_raw_backtrace stderr (get_raw_backtrace ()));
  let sp = check_pos Parsing.symbol_start_pos in
  let ep = check_pos Parsing.symbol_end_pos  in
  raise (T (my_unique_msg,sp,ep))

(** [fileposition start finish] creates a string describing the position
    of a lexeme specified by the [start] and [finish] file positions. The
    message follows the format used by the OCaml compiler, so it is
    recognized by most IDEs, e.g., Emacs. *)
let fileposition err_s err_e =
  (* report columns relative to the beginning of the starting line *)
  Printf.sprintf
    "\nFile \"%s\", line %d, characters %d-%d:\n"
    err_s.pos_fname err_s.pos_lnum
    (err_s.pos_cnum - err_s.pos_bol) (err_e.pos_cnum - err_s.pos_bol)

(** [string_of_exn line exn] given a [line] of input, locates the failed
    lexeme using the positions carried by [exn] and creates a string
    describing the parsing error in a format similar to the one used by
    the OCaml compiler, i.e., with fancy underlining. *)
let string_of_exn line (msg,err_s,err_e) =
  let b = Buffer.create 42 in
  if err_s.pos_fname <> "" then
    Buffer.add_string b (fileposition err_s err_e);
  Buffer.add_string b
    (Printf.sprintf "Parse error: %s\n%s\n" msg line);
  (* pad up to the column where the lexeme starts *)
  let start = max 0 (err_s.pos_cnum - err_s.pos_bol) in
  Buffer.add_string b (String.make start ' ');
  (* underline the lexeme with at least one caret *)
  let diff = max 1 (err_e.pos_cnum - err_s.pos_cnum) in
  Buffer.add_string b (String.make diff '^');
  Buffer.contents b

(** [extract_line err] a helper function that extracts the line designated
    by the position [err] from the file named in that position. Returns
    the last line read (or "") if the file is missing or too short. *)
let extract_line err =
  let line = ref "" in
  try
    let ic = open_in err.pos_fname in
    (* read up to line [pos_lnum]; stop quietly at end of file *)
    (try
       for _i = 1 to max 1 err.pos_lnum do
         line := input_line ic
       done
     with End_of_file -> ());
    close_in ic;
    !line
  with Sys_error _ -> !line

(** [to_string exn] converts the payload of a [T] exception to a string *)
let to_string ((_,err,_) as exn) =
  let line = extract_line err in
  string_of_exn line exn
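
To make the position arithmetic concrete, here is a minimal sketch that pulls the text between two positions out of an in-memory string. The names `lexeme_between` and `pos_at` are hypothetical helpers, not part of the module above, and the positions are built by hand purely for illustration; in practice they would come from the lexer.

```ocaml
open Lexing

(* Hypothetical helper: returns the text between the absolute offsets of
   [start] and [finish], clamped to the bounds of [input]. *)
let lexeme_between input start finish =
  let len = String.length input in
  let s = max 0 (min start.pos_cnum len) in
  let e = max s (min finish.pos_cnum len) in
  String.sub input s (e - s)

(* Hand-built position on line 1 at absolute offset [cnum]. *)
let pos_at cnum =
  { pos_fname = ""; pos_lnum = 1; pos_bol = 0; pos_cnum = cnum }

let () =
  (* the '*' sits at offset 4 of this input *)
  print_endline (lexeme_between "1 + * 2" (pos_at 4) (pos_at 5))
```

Clamping means that a dummy position (such as Lexing.dummy_pos) yields an empty string rather than an exception.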

Here is an example that shows how to use it when there is no file and the input comes from a stream or an interactive (shell-like) source:

let parse_command line =
  try
    let lbuf = Lexing.from_string line in
    `Ok (Parser.statement Lexer.tokens lbuf)
  with
  | Parsing.Parse_error -> `Fail "Parse error"
  | Parser_error.T exn -> `Fail (Parser_error.string_of_exn line exn)
ivg
  • Given a string line as input, your functions return the precise **substring** that raises errors, whereas I was asking how to show the entire input string. But I think my initial question is quite easy: we could just wrap error handling around `Parser_e.main` or `Parser.statement` and always call the wrapper... Your example and module, which are more precise, are good to know. – SoftTimur Jul 22 '16 at 12:38
  • This wouldn't be possible, as the parser doesn't know the input by itself. At the moment of failure, it finds itself in a state where there are no more transitions. The history of how it ended up in this state is not stored. You can enable debugging mode and print this history, but this is different from making a nice parser error for an end user. – ivg Jul 22 '16 at 17:55
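
The wrapper idea raised in the comments could look roughly like the following sketch. It assumes nothing about the real grammar: `Parse_failure` and `with_input` are hypothetical names, and `toy_parser` merely stands in for a generated entry point such as Parser_e.main already applied to its lexer. Catching every exception is a simplification; a real wrapper would probably match only the exceptions that the lexer and parser actually raise.

```ocaml
(* Hypothetical exception: the entire input string plus a reason. *)
exception Parse_failure of string * string

(* [with_input parse input] builds a lexbuf from [input], runs [parse] on
   it, and re-raises any failure together with the whole input string. *)
let with_input (parse : Lexing.lexbuf -> 'a) (input : string) : 'a =
  let lexbuf = Lexing.from_string input in
  try parse lexbuf with
  | exn -> raise (Parse_failure (input, Printexc.to_string exn))

(* Stand-in for a generated entry point; it always fails. *)
let toy_parser (_ : Lexing.lexbuf) : int = raise Parsing.Parse_error

let () =
  match with_input toy_parser "1 + * 2" with
  | _ -> print_endline "parsed"
  | exception Parse_failure (input, reason) ->
    Printf.printf "cannot parse %S: %s\n" input reason
```

Because the wrapper owns the string from which the lexbuf is created, every call site that goes through it gets the full input in the error without the parser having to know about it.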