I'm in the middle of learning how to parse simple programs.

This is my lexer.

{
open Parser
  exception SyntaxError of string
}

let white = [' ' '\t']+

let blank = ' '


let identifier = ['a'-'z']


rule token = parse
  | white {token lexbuf} (* skip whitespace *)
  | '-' { HYPHEN }
  | identifier {
    let buf = Buffer.create 64 in
    Buffer.add_string buf (Lexing.lexeme lexbuf);
    scan_string buf lexbuf;
    let content = (Buffer.contents  buf) in
    STRING(content)
  }
  | _ { raise (SyntaxError "Unknown stuff here") }

and scan_string buf = parse
  | ['a'-'z']+ {
    Buffer.add_string buf (Lexing.lexeme lexbuf);
    scan_string buf lexbuf
  }
  | eof { () }

My "ast":

type t =
    String of string
  | Array of t list

My parser:

%token <string> STRING
%token HYPHEN

%start <Ast.t> yaml
%%

yaml:
  | scalar { $1 }
  | sequence {$1} 
  ;

sequence:
  | sequence_items {
    Ast.Array (List.rev $1)
  }
  ;

sequence_items:
   (* empty *) { [] }
  | sequence_items HYPHEN scalar {
    $3::$1

  };

scalar:
  | STRING { Ast.String $1 }  
  ;

I'm currently at the point where I want to parse either plain 'strings' (i.e. some text) or 'arrays' of 'strings' (i.e. - item1 - item2).
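To make the goal concrete, here is a small sketch of the values I would expect for those two kinds of input, assuming the `t` type from my "ast" above. The names `plain` and `items` are only for illustration, and the second value is built the same way the `sequence_items` rule does it: prepend each item, then reverse once at the end.

```ocaml
(* Same type as in the "ast" above. *)
type t =
    String of string
  | Array of t list

(* Expected result for a plain string. *)
let plain = String "text"

(* Expected result for "- item1 - item2": items are prepended as they
   are seen, so the list is reversed once at the end. *)
let items = Array (List.rev (String "item2" :: String "item1" :: []))
```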

When I compile the parser with Menhir I get:

Warning: production sequence -> sequence_items is never reduced.
Warning: in total, 1 productions are never reduced.

I'm pretty new to parsing. Why is this never reduced?

Seneca

1 Answer

You declare that the entry point to your parser is called main:

%start <Ast.t> main

But I can't see a main production in your code. Maybe the entry point is supposed to be yaml? If you change that, does the error still persist?


Also, try adding an EOF token to your lexer and to the entry-level production, like this:

parse_yaml: yaml EOF { $1 }

See here for example: https://github.com/Virum/compiler/blob/28e807b842bab5dcf11460c8193dd5b16674951f/grammar.mly#L56

The Real World OCaml link below also discusses how to use EOF. I think this will solve your problem.
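Applied to the grammar in the question, the EOF suggestion might look like the sketch below. The token name `EOF` and the entry point name `parse_yaml` follow the snippet above; everything else is copied from the question. Treat it as a sketch under those assumptions, not a tested build.

```ocaml
(* parser.mly: the question's grammar with an explicit end-of-input token *)
%token <string> STRING
%token HYPHEN
%token EOF

%start <Ast.t> parse_yaml
%%

parse_yaml:
  | yaml EOF { $1 }
  ;

yaml:
  | scalar { $1 }
  | sequence { $1 }
  ;

sequence:
  | sequence_items { Ast.Array (List.rev $1) }
  ;

sequence_items:
    (* empty *) { [] }
  | sequence_items HYPHEN scalar { $3 :: $1 }
  ;

scalar:
  | STRING { Ast.String $1 }
  ;
```

The lexer then needs a matching case in its `token` rule, for example `| eof { EOF }`, so that the parser actually receives the token at the end of the input.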


By the way, it's really cool that you are writing a YAML parser in OCaml. If you open-source it, it will be really useful to the community. Note that YAML is indentation-sensitive, so to parse it with Menhir your lexer will need to produce some kind of INDENT and DEDENT tokens. Also, YAML is a strict superset of JSON, which means it might (or might not) make sense to start with a JSON subset and then expand it. Real World OCaml shows how to write a JSON parser using Menhir:

https://dev.realworldocaml.org/16-parsing-with-ocamllex-and-menhir.html
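The INDENT and DEDENT idea mentioned above can be sketched in plain OCaml, independently of ocamllex. The type `layout` and the functions `indent_width` and `layout_tokens` are hypothetical names made up for this illustration, and the sketch assumes spaces only, no blank lines, and consistent dedents. It is one possible approach, not how any particular YAML library does it.

```ocaml
(* Hypothetical layout tokens; these names are assumptions for this sketch. *)
type layout = Indent | Dedent | Line of string

(* Number of leading spaces on a line. *)
let indent_width s =
  let n = String.length s in
  let rec go i = if i < n && s.[i] = ' ' then go (i + 1) else i in
  go 0

(* Turn a list of lines into layout tokens, keeping a stack of the
   indentation levels that are currently open (innermost first). *)
let layout_tokens lines =
  (* Pop every level deeper than [w], emitting one Dedent per level. *)
  let rec close stack w acc =
    match stack with
    | top :: rest when w < top -> close rest w (Dedent :: acc)
    | _ -> (stack, acc)
  in
  let step (stack, acc) line =
    let w = indent_width line in
    let text = String.sub line w (String.length line - w) in
    let top = match stack with t :: _ -> t | [] -> 0 in
    if w > top then
      (* Deeper than the current level: open a new one. *)
      (w :: stack, Line text :: Indent :: acc)
    else
      let stack, acc = close stack w acc in
      (stack, Line text :: acc)
  in
  let stack, acc = List.fold_left step ([0], []) lines in
  (* Close every level still open at the end of the input. *)
  let _, acc = close stack 0 acc in
  List.rev acc
```

For example, `layout_tokens ["a"; "  b"; "  c"; "d"]` yields `[Line "a"; Indent; Line "b"; Line "c"; Dedent; Line "d"]`. In a real lexer the same stack would live alongside the ocamllex rules, and the parser would declare INDENT and DEDENT as ordinary tokens.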

Vladimir Keleshev
  • I renamed `yaml` to `main`, because this is the very beginning of a YAML parser and I'm already struggling with the very basics. :D The reduce problem originates from the `yaml` variant. I'll try and study Real World OCaml. – Seneca Sep 29 '17 at 09:19
  • Thanks, it was indeed the EOF that did it! – Seneca Sep 29 '17 at 20:12
  • Years later and I got caught by the EOF problem... Thanks for the clarification! – David Vale Jun 16 '22 at 19:35