I'm looking to create a lexer and parser for a simple DSL that we use with internal tools. There will be a couple of built-in symbols (is this the correct term?) that take anywhere from 1 to 10 arguments. For example:

foo(bar1;bar2)

There will also be symbols added at runtime that always take zero parameters. Example:

testy1()

These will be strung together and read from a CSV file. An assembled line would look like:

foo(bar1:bar2)testy1()

I've had a hard time finding resources online that easily explain lexing and parsing function calls like these. Could someone point me in a good direction, or offer advice?

Ryan
  • You may think it is overkill, but I'd download ANTLR and use its tools for this; it would make your job a lot easier once you get past the learning curve. – Ron Beyer Jul 28 '15 at 14:52
  • See my SO answer on how to build a recursive descent parser: http://stackoverflow.com/questions/2245962/is-there-an-alternative-for-flex-bison-that-is-usable-on-8-bit-embedded-systems/2336769#2336769 – Ira Baxter Jul 28 '15 at 17:03
  • I'm unable to have any dependencies on third party libraries, otherwise ANTLR sounds like a good option. – Ryan Aug 03 '15 at 14:59
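Since third-party libraries are ruled out, it's worth noting that the lexing half of this is small enough to write by hand. Below is a minimal sketch in JavaScript (the language the PegJS grammar actions in the answer use), not a definitive implementation. It assumes identifiers look like bar1 or testy1 (a letter or underscore, then letters, digits or underscores) and that numbers are plain digit runs. Because the examples above use both ";" and ":" between arguments, it treats ",", ":" and ";" all as separators, which is an assumption. The function name tokenize and the token type names are made up for illustration.

```javascript
// Minimal hand-written lexer sketch for lines like "foo(bar1:bar2)testy1()".
// Assumptions: identifiers are [A-Za-z_][A-Za-z0-9_]*, numbers are [0-9]+,
// and ",", ":" and ";" are all treated as argument separators.
function tokenize(input) {
  const tokens = [];
  let i = 0;
  while (i < input.length) {
    const ch = input[i];
    if (/\s/.test(ch)) {
      // Whitespace only separates tokens; it never becomes a token itself.
      i++;
    } else if (/[A-Za-z_]/.test(ch)) {
      // Identifier: consume the longest run of identifier characters.
      let j = i + 1;
      while (j < input.length && /[A-Za-z0-9_]/.test(input[j])) j++;
      tokens.push({ type: "Ident", value: input.slice(i, j) });
      i = j;
    } else if (/[0-9]/.test(ch)) {
      // Number: consume the longest run of digits.
      let j = i + 1;
      while (j < input.length && /[0-9]/.test(input[j])) j++;
      tokens.push({ type: "Number", value: input.slice(i, j) });
      i = j;
    } else if (ch === "(" || ch === ")") {
      tokens.push({ type: ch === "(" ? "LParen" : "RParen", value: ch });
      i++;
    } else if (ch === "," || ch === ":" || ch === ";") {
      tokens.push({ type: "Sep", value: ch });
      i++;
    } else {
      throw new SyntaxError(`Unexpected character '${ch}' at offset ${i}`);
    }
  }
  return tokens;
}
```

For the assembled line foo(bar1:bar2)testy1() this yields nine tokens: Ident foo, LParen, Ident bar1, Sep ":", Ident bar2, RParen, Ident testy1, LParen, RParen. A parser then only has to look at token types rather than raw characters.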

I've written a small parser with PegJS that can parse simple expressions inside function calls. A PEG can't be ambiguous, because alternatives are tried in order and the first one that matches wins, and that works well for this kind of syntax.

Start
  = Expr

/* Expressions */

Expr
  = FuncCall
  / Literal

RestExpr
  = _ ( "," / ":" ) _ e:Expr {
    return e;
  }

/* Function call */

FuncCall
  = func:Ident _ "(" _ x:Expr? xs:RestExpr* _ ")" {
    return {
      type: "FuncCall",
      func: func.value,
      params: x ? [x].concat(xs) : []
    };
  }

/* Literals */

Literal
  = Number
  / Ident

Number
  = val:[0-9]+ {
    return {
      type: "Number",
      value: parseInt(val.join(""))
    };
  }

/* Identifier */

Ident
  = x:IdentStart xs:IdentRest* {
    return {
      type: "Ident",
      value: [x].concat(xs).join("")
    };
  }

IdentStart
  = [a-z_]i

IdentRest
  = [a-z0-9_]i

/* Whitespace: listed explicitly, since \s is not a whitespace class in PegJS */

_
  = [ \t\r\n]*
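The Start rule above accepts a single expression, and PegJS requires the whole input to be consumed, so an assembled line such as foo(bar1:bar2)testy1() would be rejected. If a generator dependency isn't an option, the same rules translate almost one-to-one into a hand-written recursive descent parser: one function per rule, with the ordered choice in Expr done by trying a call first and backtracking to a plain identifier. The sketch below is an illustration under those assumptions, not a port of PegJS output. parseLine is a hypothetical name; it accepts one or more expressions per line, returns them as an array of nodes in the same shape as the grammar's output, and also allows ";" as a separator (an assumption based on the question's first example).

```javascript
// Recursive descent parser roughly mirroring the PegJS rules: one function per rule.
// Assumptions: ",", ":" and ";" all separate arguments, and a line may hold
// several expressions back to back, e.g. "foo(bar1:bar2)testy1()".
function parseLine(input) {
  let pos = 0;

  // True if a character exists at pos and matches the regex.
  const isAt = (re) => pos < input.length && re.test(input[pos]);
  const fail = (msg) => {
    throw new SyntaxError(`${msg} at offset ${pos}`);
  };

  // _ = [ \t\r\n]*
  function skip() {
    while (isAt(/\s/)) pos++;
  }

  // Ident = IdentStart IdentRest*
  function ident() {
    const start = pos;
    if (!isAt(/[A-Za-z_]/)) fail("Expected identifier");
    pos++;
    while (isAt(/[A-Za-z0-9_]/)) pos++;
    return { type: "Ident", value: input.slice(start, pos) };
  }

  // Number = [0-9]+ (only called when a digit is at pos)
  function number() {
    const start = pos;
    while (isAt(/[0-9]/)) pos++;
    return { type: "Number", value: parseInt(input.slice(start, pos), 10) };
  }

  // Expr = FuncCall / Literal: try a call first, else fall back to a literal.
  function expr() {
    if (isAt(/[0-9]/)) return number();
    const id = ident();
    const save = pos;
    skip();
    if (isAt(/\(/)) return funcCall(id.value);
    pos = save; // no "(": backtrack, as PEG ordered choice would
    return id;
  }

  // FuncCall = Ident _ "(" _ (Expr (_ Sep _ Expr)*)? _ ")"
  function funcCall(name) {
    pos++; // consume "("
    skip();
    const params = [];
    if (!isAt(/\)/)) {
      params.push(expr());
      skip();
      while (isAt(/[,:;]/)) {
        pos++; // consume the separator
        skip();
        params.push(expr());
        skip();
      }
    }
    if (!isAt(/\)/)) fail('Expected ")"');
    pos++; // consume ")"
    return { type: "FuncCall", func: name, params };
  }

  // Start = (_ Expr)+ _
  const exprs = [];
  skip();
  while (pos < input.length) {
    exprs.push(expr());
    skip();
  }
  if (exprs.length === 0) fail("Empty input");
  return exprs;
}
```

For example, parseLine("foo(bar1:bar2)testy1()") returns two FuncCall nodes: foo, whose params are the identifiers bar1 and bar2, and testy1, whose params array is empty.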

You can test the parser here: http://pegjs.org/online

For example, the input foo(1, bar(2), baz(3)) produces this output:

{
   "type": "FuncCall",
   "func": "foo",
   "params": [
      {
         "type": "Number",
         "value": 1
      },
      {
         "type": "FuncCall",
         "func": "bar",
         "params": [
            {
               "type": "Number",
               "value": 2
            }
         ]
      },
      {
         "type": "FuncCall",
         "func": "baz",
         "params": [
            {
               "type": "Number",
               "value": 3
            }
         ]
      }
   ]
}
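Once the input is in this tree form, running it is a matter of walking the tree. Below is a minimal sketch of an evaluator over nodes of exactly this shape (type, func, params, value) that also enforces the arity rules from the question: built-ins take 1 to 10 arguments and symbols registered at runtime take none. SymbolTable, defineBuiltin, defineRuntime and evaluate are hypothetical names, and treating a bare identifier such as bar1 as a string argument is an assumption.

```javascript
// Tree-walking evaluator sketch for the AST shape produced by the grammar.
// Assumptions: built-ins are JavaScript functions with a declared arity range,
// runtime symbols always take zero arguments, bare identifiers are strings.
class SymbolTable {
  constructor() {
    this.symbols = new Map();
  }

  // Register a built-in that accepts between min and max arguments.
  defineBuiltin(name, min, max, fn) {
    this.symbols.set(name, { min, max, fn });
  }

  // Register a symbol added at runtime; these always take zero arguments.
  defineRuntime(name, fn) {
    this.symbols.set(name, { min: 0, max: 0, fn });
  }

  lookup(name) {
    const sym = this.symbols.get(name);
    if (!sym) throw new Error(`Unknown symbol '${name}'`);
    return sym;
  }
}

function evaluate(node, table) {
  switch (node.type) {
    case "Number":
      return node.value;
    case "Ident":
      return node.value; // assumption: a bare identifier is passed as a string
    case "FuncCall": {
      const sym = table.lookup(node.func);
      const n = node.params.length;
      if (n < sym.min || n > sym.max) {
        throw new Error(`${node.func} expects ${sym.min}-${sym.max} arguments, got ${n}`);
      }
      // Evaluate arguments first, so nested calls run inside-out.
      const args = node.params.map((p) => evaluate(p, table));
      return sym.fn(...args);
    }
    default:
      throw new Error(`Unknown node type '${node.type}'`);
  }
}

// Usage: register one built-in and one runtime symbol, then evaluate a call.
const table = new SymbolTable();
table.defineBuiltin("foo", 1, 10, (...args) => args.join("+"));
table.defineRuntime("testy1", () => "testy1 ran");

const call = {
  type: "FuncCall",
  func: "foo",
  params: [
    { type: "Ident", value: "bar1" },
    { type: "Ident", value: "bar2" },
  ],
};
console.log(evaluate(call, table)); // prints bar1+bar2
```

Keeping the arity limits in the symbol table rather than in the grammar means the parser doesn't need to know which symbols exist, so symbols added at runtime require no grammar changes.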

This isn't a direct fit if you're working in C#, since PegJS generates JavaScript, but I believe peg-sharp can do the same job well for C#: https://code.google.com/p/peg-sharp/.

Marcelo Camargo