EDSL with name binding in C++

Question

Is it possible to write an edsl in C++ which binds values to variable names? For example, I can write a edsl in Haskell that allows me to write the following (see also this question):

prog3 :: StackProg Expr
prog3 = do
  push (IntL 3)
  push (IntL 4)
  a <- pop
  b <- pop
  return (Plus a b)

And this produces an AST where a and b are variables. Is something similar possible in C++? I would like to have (in order of importance)

acceptable syntax for the resulting edsl,
a sensible AST
type safety in the object (dsl) language
understandable mechanics.

Sure? Couldn't `pop` just return fresh variable node? You're not going to get any reasonable encoding of HOAS or anything, but I think what you've shown here is entirely within C++'s capabilities. Except your four bullet points, all of which any program in C++ will lack (BURN!) — luqui, Feb 21 '13 at 09:27
@luqui I don't really understand this comment. Yeah, pop returns a fresh variable node. So how about the C++ side? — Paul, Feb 21 '13 at 09:28
you mean how is `pop` implemented? I dunno, increments a counter, allocates an object, and returns it... Maybe I'm misunderstanding your question. What kind of monad is `StackProg`? If it's a combination of the imperative-simulable ones: `Reader`, `State`, `Writer`, `IO`, then this code has a transliteration to imperative code. — luqui, Feb 21 '13 at 09:29
No, I mean how can I write something like that that is valid C++ and produces a syntax tree? I am also quite sure that you could do it, for one you could pretty much translate the Haskell implementation, but I think that would be a shitload of work and it would very likely be ugly (I guess you might not get point 3 that way). So how to best do it? — Paul, Feb 21 '13 at 09:34
`stack.push(new IntL(3)); stack.push(new IntL(4)); a = stack.pop(); b = stack.pop(); return new Plus(a, b);` — luqui, Feb 21 '13 at 09:36
StackProg is a Free monad. The goal is to get an AST **of all the actions** that can be interpreted. Maybe that wasn't clear enough. For more context [see also this question](http://stackoverflow.com/q/14904048/346587). AFAICT your version returns a `Plus` node with the variables, but not the actions that manipulate the stack. — Paul, Feb 21 '13 at 09:37
`stack` doesn't have to be a real stack. Instead of actually doing something, it can log what it would have done. You couldn't use C++ level loops/conditionals anymore, you'd have to do something like `stack.If(...); ...; stack.endif();`. — luqui, Feb 21 '13 at 09:39
Oh! I see. I think I lost the connection between the stack and the node. Probably still salvagable with `return stack.Return(new Plus(a,b))`. Whether that matches your bullet points is a matter of taste. — luqui, Feb 21 '13 at 09:43
@luqui hm, this might even work... the crux is that this is a toy example of course, but if it works for the toy example maybe it works for the real thing also. C++ conditionals should be replaced by the DSL version anyway, as I want to interpret those too, so that's not a real problem. — Paul, Feb 21 '13 at 09:43
Yeah, we're talking about C++ here, after all, so if that works it is definitely good enough in terms of syntax prettiness. For type safety I think you'd need something like GADTs, not sure if that is possible with some template magic. I am still not 100% sure how this would work exactly, but it looks like a promising approach. — Paul, Feb 21 '13 at 09:51
The closest thing C++ has is placeholders, which work by convention. E.g. with Boost.Phoenix an expression of the EDSL like `arg1 + arg2` means 'a polymorphic functor yielding the sum of its two operands'. In the particular case of Boost.Phoenix they've gone to great pains to provide `_a`, `_b` etc. placeholders so that the EDSL lets you do e.g. `let(_a = arg1, _b = 42) [_a + _b]`. — Luc Danton, Feb 21 '13 at 17:39
@LucDanton This sounds interesting. I thought before that boost.proto might be the way to go here (I think boost.phoenix is based on that), but I don't know the first thing about it. If you elaborate a little bit I'd accept it as an answer. I've been looking into [this tutorial](http://cpp-next.com/archive/2010/08/expressive-c-introduction/) on boost proto but haven't made it very far yet. — Paul, Feb 21 '13 at 17:48
I've been considering making an actual answer, I may find the time to do so later (on the plus side this means I get to think a little bit longer about it). — Luc Danton, Feb 21 '13 at 17:49
@LucDanton that would be awesome! I understand that answering is always time consuming... On the upside I don't need a worked example, if you could give me a pointer along the lines of *you can try this, this works, this doesn't, these might be drawbacks, look here for more info* then that would be super helpful! — Paul, Feb 21 '13 at 17:54

score 2 · Answer 1 · edited May 23 '17 at 10:32

2

No, you can't directly model that with C++. However, in C++ you can embed other engines, such as Lua, which is often used as an extension engine, and is suitable for DSL's.

Check out this answer, also in StackOVerflow.

Steps to embed Lua in C++:

edited May 23 '17 at 10:32

Community

1
1

answered Feb 21 '13 at 09:07

Baltasarq

12,014
3
38
57

Thank you for answering! I am not sure if that is what I want ... does it allow me to get and process the AST on the C++ side? Also, afaik Lua is dynamically typed, so no type safety in the DSL, right? – Paul Feb 21 '13 at 09:26
Yes, Lua is dynamic, and yes, it has not special abilities to read/process an AST (afaik). Python has them, though it is also dynamic. You can embed Python in C++ very easily. http://docs.python.org/3.3/extending/embedding.html – Baltasarq Feb 22 '13 at 08:48
Well, that is not really what I want. My primary goal is to obtain a AST as a datastructure in C++. I think the easiest way to do this is to write it in C++ in some way. Not saying that it is easy, but easiest in the sense that I don't need to write a parser and typechecker. – Paul Feb 22 '13 at 09:34
Using just C++ you will have to write a parser and a typechecker, being the language dynamic or static won't help you on that. There is not a lexer/parser in the standard C++ library, yet it does exist in e.g. Python and you can embed it. No, pure C++ won't make your life easier... probably, it'll make thing worse. – Baltasarq Feb 22 '13 at 11:38

score 2 · Accepted Answer · edited May 23 '17 at 11:49

If you want to produce syntaxes of valid C++ expressions (possibly as a subset of all C++ expressions, much like do-notation limits itself to monad operations), validate them statically and consume them, then your best bet is Boost.Proto. To describe it succinctly, it is itself an EDSL to write and describe EDSLs.

I won't go in any more detail on how to use it. Although it may be hard to learn to use, especially so if you're not accustomed to C++ metaprogramming, the documentation is great and if you've ever written a grammar I believe you will find your marks. In another of my answers I walked someone on how to write an EDSL with a grammar that only accepts simple arithmetic expressions and that consumes them to compute their derivative, so you may want to check that.

As for your exact question, I'm afraid the answer has to be either a short "no, you can't do that", or a long "you can do it to an extent as Boost.Phoenix shows, but it's probably not worth your time implementing it considering the cryptic errors and/or extra compilation time for your EDSL users". My reasoning for this is what you want to do fits on two levels: do-notation is a Haskell specific feature, while consuming a syntax tree and giving semantics to it in on the level of the EDSL itself.

As it so happens, typical Proto-style EDSL are valid C++ expressions, and the language doesn't offer scoping at that level, variables are declared in separate statements. For instance, _a + _b is a valid C++ fragment because _a and _b are declared C++ variables provided by Phoenix, but not a valid program in the EDSL because _a and _b are left unbound. Yes, the error will be caught, but you have to implement that yourself. In comparison do-notation is part of Haskell so any EDSL inherits it for free. That is to say, return (a + b) is never valid on its own -- there needs to be some a and some b.

There are some things to keep in mind though. C++11 offers lambda expressions, so you can in fact get some scoping here -- but those will be opaque in the EDSL, the syntax tree will only show a variable. Some introspection might reveal that that variable is callable for some types but that's it. Even if you require that the lambdas return a value in the EDSL there's no telling what else they could do. That's not always worth worrying about and I'd say that it's perfectly appropriate for some EDSLs.

Similarly C++11 makes it much more easy to 'factor out' parts of an EDSL expression. That's not so much equivalent to the a <- foo sugar of do-notation but it is equivalent to let a = foo. So you could in fact without much difficulty makes the following do 'the right thing':

auto double_pop = make_tuple(pop(), pop());
auto program = (push(3), push(4), consume(double_pop));

Which could be equivalent to the unidiomatic and contrived following:

program = do
  let a = pop
  return consume `ap` a `ap` a

(Since Boost.Proto started as a C++03 library make sure to pore over the docs before using C++11's auto with it, IIRC there is a caveat.)

Hi, thank you very much for your answer. I'll have to check out boost proto to see if it really fits the bill... my initial impression is yes. Note that I am not looking for an EDSL that non-programmers would have to be able to use, it is more for finding something that makes what I want to do as expressive as possible, but compromises are ok (cryptic type errors for example would probably be acceptable). Anyway thanks a lot for your answer, I am curious what boost proto has to offer. — Paul, Feb 23 '13 at 11:59

EDSL with name binding in C++

2 Answers2