4

I'm working on a simple C application and I had the idea of creating a DSL to define some behaviors of the application. The idea is to create a very clean language, similar to Ruby, but that is actually run in C. All the functions are defined in C; the DSL is just... well, an alias to "hide" the verbose syntax of C.

I know lex and yacc, but I think they are overkill for what I'm trying to do. Isn't there something simpler? I thought about regular expressions, but I would feel dirty doing that. Maybe there's something better!

An example:

if a = b
    myFunctionInC()

get 'mydata' then
    puts 'Hello!'

Easily translates to:

if (a == b) {
    myFunctionInC();
}

void get(const char *test)
{
    printf("Hello! %s", test);
}
vinnylinux
  • 1
    In case you want to support any remotely complex expression, I don't think there's an easy way to get around a lexer / parser before the "codegen". – Ivo Wetzel May 08 '12 at 18:27
  • If you can define your simple language as a *regular* language, regular expressions will work. – Shiplu Mokaddim May 08 '12 at 18:27
  • You might want to mention what kind of DSL you want and what problem you expect it to solve; maybe there's something similar in existence or better solutions. – Ambroz Bizjak May 08 '12 at 20:18
  • 1
    In engineering, there isn't any "feeling dirty". There's adequate, and inadequate. Regexes are very likely to be inadequate. – Ira Baxter May 08 '12 at 20:32
  • For simple to parse and implement stuff I'd be looking at things like `Lisp/Scheme` and `Forth`. But those languages probably aren't "clean" in your understanding. – Alexey Frunze May 09 '12 at 02:14

4 Answers

3

creating a DSL to define some behaviors of the application. The idea is to create a very clean language, similar to Ruby, but that is actually run in C.

C isn't a good host for embedded languages. It is a good implementation language for language runtimes, so if you want to script your application, consider doing what others do, and link a high level language to your app.

Languages such as Lua are designed for this purpose -- easier to write than C; but with simple embedding in C. You can also call C from Ruby or Python or Haskell or whatever.

Reusing an existing language is a good idea, since someone else has already done the hard work. You can reuse libraries as well.

Don Stewart
1

I think that if you want to create a good language you cannot rely only on regular expressions, because they are not expressive enough.

It will also be difficult to write regular expressions that match complex patterns.

If you just want to hide some of the verbosity of C, you can use C preprocessor macros.
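As a rough sketch of the macro approach, the question's `if a = b` rule could be written like this. The macro names `WHEN`, `END` and `SAY`, and the helper `run_rule`, are made up for illustration; only `myFunctionInC` comes from the question, and its body here is an assumption.

```c
#include <stdio.h>

/* Hypothetical DSL-style macros; the names are illustrative only. */
#define WHEN(cond)  if (cond) {
#define END         }
#define SAY(text)   puts(text)

/* Stand-in for the application's real function (body assumed). */
static void myFunctionInC(void)
{
    SAY("Hello!");
}

/* Runs the rule "if a = b then myFunctionInC()".
 * Returns 1 if the rule fired, 0 otherwise. */
int run_rule(int a, int b)
{
    int fired = 0;

    WHEN (a == b)
        myFunctionInC();
        fired = 1;
    END

    return fired;
}
```

This stays plain C after preprocessing, so there is no parser to write, but error messages refer to the expanded code and the "syntax" can only be as flexible as the preprocessor allows.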

Aslan986
1

Defining a good DSL syntax is hard; you have to understand what problems you want it to solve (and which ones you don't, otherwise it ends up with everything in it including the kitchen sink), and you have to figure out how to translate it to a target language (or interpret it on the fly).

In both cases you need a parser, and interesting DSL syntax isn't generally practical to parse with regexes. So you need a real parser generator. If you are going to tackle something like Ruby, you'll need a strong parser generator!

Then you need to capture the result of the parse as some data structure, typically a tree. Then you need to analyze your DSL code for special cases and optimizations, and figure out how to generate code. What this all means is that a parser typically is not enough. See my extended discussion on Life After Parsing.
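To make the "tree, then code generation" step concrete, here is a minimal sketch assuming a DSL with only two constructs: a call and an equality test guarding one statement. The `Node` layout and the `emit_c` function are hypothetical, invented for this example; a real DSL would need far more node kinds plus the analysis passes mentioned above.

```c
#include <stdio.h>
#include <stddef.h>

typedef enum { NODE_CALL, NODE_IF_EQUAL } NodeKind;

/* Hypothetical tree node for a two-construct DSL. */
typedef struct Node {
    NodeKind kind;
    const char *callee;       /* NODE_CALL: function to call */
    const char *lhs;          /* NODE_IF_EQUAL: left operand */
    const char *rhs;          /* NODE_IF_EQUAL: right operand */
    const struct Node *body;  /* NODE_IF_EQUAL: statement run when equal */
} Node;

/* Writes C source for the tree into out.
 * Returns the number of characters written, or -1 if it does not fit. */
int emit_c(const Node *node, char *out, size_t cap)
{
    int used, inner, tail;

    switch (node->kind) {
    case NODE_CALL:
        used = snprintf(out, cap, "%s();", node->callee);
        return (used < 0 || (size_t)used >= cap) ? -1 : used;

    case NODE_IF_EQUAL:
        used = snprintf(out, cap, "if (%s == %s) { ", node->lhs, node->rhs);
        if (used < 0 || (size_t)used >= cap)
            return -1;
        inner = emit_c(node->body, out + used, cap - (size_t)used);
        if (inner < 0)
            return -1;
        used += inner;
        tail = snprintf(out + used, cap - (size_t)used, " }");
        if (tail < 0 || (size_t)tail >= cap - (size_t)used)
            return -1;
        return used + tail;
    }
    return -1;
}
```

Building a `NODE_IF_EQUAL` node with operands `a` and `b` whose body is a `NODE_CALL` to `myFunctionInC` and passing it to `emit_c` produces the C text for the question's first example on a single line.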

Ira Baxter
  • Do you recommend something simpler than lex/yacc? What about Ragel? – vinnylinux May 08 '12 at 20:18
  • "Simpler" than a parser generator is either "no parser generator" or some kind of regex engine. I think Ragel is a regex engine. Such engines are not powerful enough to process interesting DSLs. One can sometimes get by with a top-down recursive parser for a simple-enough DSL; as a practical matter, simple DSLs tend to get complicated fast as you discover what they really have to say or do. – Ira Baxter May 08 '12 at 20:30
  • 1
    ... if you insist on "no parser generator" but want something stronger than regex, consider a top-down parser. See this http://stackoverflow.com/questions/2245962/is-there-an-alternative-for-flex-bison-that-is-usable-on-8-bit-embedded-systems/2336769#2336769 for details. But, read the provided link as to why that isn't enough. – Ira Baxter May 08 '12 at 20:43
  • 1
    Ragel is a state machine compiler, equivalent to regex with arbitrary actions. It is not PEG. See http://www.complang.org/ragel/ – Ira Baxter May 08 '12 at 23:13
1

I'm working on a simple C application and I had the idea of creating a DSL to define some behaviors of the application. The idea is to create a very clean language... that is actually run in C.

You are not the first to have this idea. John Ousterhout made the idea popular with Tcl/Tk. Unfortunately this language is not very clean.

The cleanest realization of this idea available today is the embedded language Lua. It is very well engineered and I recommend it very highly. The only reason to build your own (instead of going with Lua) is that you want to learn how to implement an embedded programming language. In that case you can still learn a lot by studying Lua's design.

I know lex and yacc but I think they are overkill for what I'm trying to do. Isn't there something simpler?

It is almost always simpler to write a lexer by hand than to use lex.
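For a sense of scale, a handwritten lexer for the tokens in the question's example can look roughly like the sketch below. The token set, the `Token` struct and `next_token` are assumptions made for illustration, and the sketch deliberately ignores indentation, which a Ruby-like block syntax would also have to track.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum {
    TOK_IDENT, TOK_STRING, TOK_EQUALS,
    TOK_LPAREN, TOK_RPAREN, TOK_EOF, TOK_ERROR
} TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;  /* points into the source buffer, not copied */
    size_t length;      /* number of characters in the token text */
} Token;

/* Reads one token starting at *cursor and advances *cursor past it. */
Token next_token(const char **cursor)
{
    const char *p = *cursor;
    Token tok;

    /* Skip whitespace between tokens. */
    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r')
        p++;

    tok.start = p;
    tok.length = 1;

    if (*p == '\0') {
        tok.kind = TOK_EOF;
        tok.length = 0;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        /* Identifier or keyword: letters, digits, underscores. */
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        tok.kind = TOK_IDENT;
        tok.length = (size_t)(p - tok.start);
    } else if (*p == '\'') {
        /* Single-quoted string; the token text excludes the quotes. */
        p++;
        tok.start = p;
        while (*p != '\0' && *p != '\'')
            p++;
        tok.length = (size_t)(p - tok.start);
        if (*p == '\'') {
            tok.kind = TOK_STRING;
            p++;
        } else {
            tok.kind = TOK_ERROR;  /* unterminated string */
        }
    } else {
        switch (*p) {
        case '=': tok.kind = TOK_EQUALS; break;
        case '(': tok.kind = TOK_LPAREN; break;
        case ')': tok.kind = TOK_RPAREN; break;
        default:  tok.kind = TOK_ERROR;  break;
        }
        p++;
    }

    *cursor = p;
    return tok;
}
```

Calling `next_token` repeatedly on `get 'mydata' then` yields an identifier, a string, another identifier, and then `TOK_EOF`; telling keywords such as `get` apart from ordinary names would be a table lookup on the identifier text.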

Yacc is another story: there is not really anything fundamentally simpler underneath, because you really do have to deal with the full power of context-free languages. But you can find this sophisticated technology in other packages. (Lex and yacc are 1970s technologies designed for 1970s hardware constraints, and they present poor human interfaces.)

  • If you know how to design an LL(1) grammar, then a handwritten recursive-descent parser is very simple to write and requires no extra technology. But the knowledge is not so easy to acquire, and coding these things in C is not much fun.

    If you want to learn, there are excellent examples in books by Niklaus Wirth. There may also be tutorials on LL(1) and recursive descent online.

  • You might find it simpler to use a more modern parser generator not limited to LALR(1) grammars. Perhaps the Elkhound parser generator, for example. But this is not simple either.
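
To show what the handwritten recursive-descent option looks like in practice, here is a minimal sketch for an assumed LL(1) grammar of integer arithmetic (not the DSL from the question). Each grammar rule becomes one C function; the names `parse_expr`, `parse_term`, `parse_factor` and `evaluate` are illustrative, and the parser evaluates as it goes instead of building a tree.

```c
#include <ctype.h>

/* Assumed grammar:
 *   expr   := term   ('+' term)*
 *   term   := factor ('*' factor)*
 *   factor := NUMBER | '(' expr ')'
 */

static const char *pos;   /* current position in the input */
static int parse_error;   /* set to 1 on the first syntax error */

static long parse_expr(void);

static void skip_spaces(void)
{
    while (*pos == ' ')
        pos++;
}

static long parse_factor(void)
{
    long value = 0;

    skip_spaces();
    if (*pos == '(') {
        pos++;
        value = parse_expr();
        skip_spaces();
        if (*pos == ')')
            pos++;
        else
            parse_error = 1;  /* missing closing parenthesis */
        return value;
    }
    if (isdigit((unsigned char)*pos)) {
        while (isdigit((unsigned char)*pos))
            value = value * 10 + (*pos++ - '0');
        return value;
    }
    parse_error = 1;          /* expected a number or '(' */
    return 0;
}

static long parse_term(void)
{
    long value = parse_factor();

    for (;;) {
        skip_spaces();
        if (*pos != '*')
            return value;
        pos++;
        value *= parse_factor();
    }
}

static long parse_expr(void)
{
    long value = parse_term();

    for (;;) {
        skip_spaces();
        if (*pos != '+')
            return value;
        pos++;
        value += parse_term();
    }
}

/* Parses and evaluates text. Returns 1 and stores the value in *result
 * on success, or returns 0 on a syntax error or trailing input. */
int evaluate(const char *text, long *result)
{
    long value;

    pos = text;
    parse_error = 0;
    value = parse_expr();
    skip_spaces();
    if (parse_error || *pos != '\0')
        return 0;
    *result = value;
    return 1;
}
```

The one-character lookahead (`*pos`) is enough to pick the next rule, which is what makes the grammar LL(1). The file-scope state keeps the sketch short; passing a parser struct around would be the usual choice once the parser has to be reentrant.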

Norman Ramsey