Extending a datatype in Haskell

Question

Haskell newbie here.

I wrote an evaluator for a minimal assembly-like language.

Now, I want to extend that language to support some syntactic sugar which, I will then compile back to use only the primitive operators. The ideia is that I do not want to touch the evaluator module again.

In the OO way of doing things, I think, one could extend the original module so to support the syntactic sugar operators, providing here the translation rules.

Other than that, I can only think of rewriting the datatype constructors in both modules so that they would not name-collide, and proceed from there, as if they were complete different things, but that implies some redundancy, for I would have to repeat (just with other names) the operators in common. Again, I think the keyword here is extend.

Is there a functional way of accomplishing this?

Thanks for taking the time to read this question.

Federico Squartini · Accepted Answer · 2011-08-01T11:20:00.707

This problem was named "the expression problem" by Phil Wadler, in his words:

The goal is to define a data type by cases, where one can add new cases to the data type and new functions over the data type, without recompiling existing code, and while retaining static type safety.

One solution to have extensible data type is to use type classes.

As an example let's assume we have a simple language for arithmetics:

data Expr = Add Expr Expr | Mult Expr Expr | Const Int

run (Const x) = x
run (Add exp1 exp2)  = run exp1 + run exp2
run (Mult exp1 exp2) = run exp1 * run exp2

e.g.

ghci> run (Add (Mult (Const 1) (Const 3)) (Const 2))
5

If we wanted to implement it in an extensible way, we should switch to type classes:

class Expr a where
    run :: a -> Int


data Const = Const Int

instance Expr Const where
    run (Const x) = x


data Add a b = Add a b

instance (Expr a,Expr b) => Expr (Add a b) where
    run (Add expr1 expr2) = run expr1 + run expr2


data Mult a b = Mult a b

instance (Expr a, Expr b) => Expr (Mult a b) where
    run (Mult expr1 expr2) = run expr1 * run expr2

Now let's extend the language adding subtractions:

data Sub a b = Sub a b

instance (Expr a, Expr b) => Expr (Sub a b) where
    run (Sub expr1 expr2) = run expr1 - run expr2

e.g.

ghci> run (Add (Sub (Const 1) (Const 4)) (Const 2))
-1

For more info on this approach, and in general on the expression problem, check Ralf Laemmel's videos 1 and 2 on Channel 9.

However, as noticed in the comments, this solution changes the semantics. For example lists of expressions are no longer legal:

[Add (Const 1) (Const 5), Const 6] -- does not typecheck

A more general solution using coproducts of type signatures is presented in the functional pearl "Data types a la carte". See also Wadler's comment on the paper.

This changes the meaning of the program, ie, in that [Expr] is not the same as Expr a => [a] — alternative, Jul 31 '11 at 18:44
@monadic: Too late. At first I didn't understand what you're saying and so went on tinkering with it. When the time came to evaluate a list of instructions, and it didn't typecheck, your comment immediately made sense to me. Am I back to square one? I can only remember using a list of Either's, but that is not practical at all! Anyway, thanks for the anticipated insight. I simply failed to understand it on time. — Seymour Kooze, Aug 01 '11 at 00:21
@Seymour Kooze use an existential type. `data ExprE = forall a. Expr a => a` and then `[ExprE]` exhibits the old behavior — alternative, Aug 01 '11 at 12:04
@monadic: I was trying your last suggestion but I end up needing to specify a constructor for ExprE. In particular, I can't run your snippet unless I write it as `data ExprE = forall a. Expr a => E a`. Can it be avoided, as in your example? — Seymour Kooze, Aug 01 '11 at 15:40
@Seymour that was just a typo on my part... But you can encode to a rank 2 type somehow, I think. — alternative, Aug 01 '11 at 19:59

score 7 · Answer 2 · answered Aug 01 '11 at 09:19

You could do something a bit more OOP-like using existential types:

-- We need to enable the ExistentialQuantification extension.
{-# LANGUAGE ExistentialQuantification #-}

-- I want to use const as a term in the language, so let's hide Prelude.const.
import Prelude hiding (const)

-- First we need a type class to represent an expression we can evaluate
class Eval a where
  eval :: a -> Int

-- Then we create an existential type that represents every member of Eval
data Exp = forall t. Eval t => Exp t

-- We want to be able to evaluate all expressions, so make Exp a member of Eval.
-- Since the Exp type is just a wrapper around "any value that can be evaluated,"
-- we simply unwrap that value and call eval on it.
instance Eval Exp where
  eval (Exp e) = eval e

-- Then we define our base language; constants, addition and multiplication.
data BaseExp = Const Int | Add Exp Exp | Mul Exp Exp

-- We make sure we can evaluate the language by making it a member of Eval.
instance Eval BaseExp where
  eval (Const n) = n
  eval (Add a b) = eval a + eval b
  eval (Mul a b) = eval a * eval b

-- In order to avoid having to clutter our expressions with Exp everywhere,
-- let's define a few smart constructors.
add x y = Exp $ Add x y
mul x y = Exp $ Mul x y
const   = Exp . Const

-- However, now we want subtraction too, so we create another type for those
-- expressions.
data SubExp = Sub Exp Exp

-- Then we make sure that we know how to evaluate subtraction.
instance Eval SubExp where
  eval (Sub a b) = eval a - eval b

-- Finally, create a smart constructor for sub too.
sub x y = Exp $ Sub x y

By doing this, we actually get a single extendable type so you could, for example, mix extended and base values in a list:

> map eval [sub (const 10) (const 3), add (const 1) (const 1)]
[7, 2]

However, since the only thing we now can know about Exp values is that they are somehow members of Eval, we can't pattern match or do anything else that isn't specified in the type class. In OOP terms, think of Exp an exp value as an object that implements the Eval interface. If you have an object of type ISomethingThatCanBeEvaluated, obviously you can't safely cast it into something more specific; the same applies to Exp.

and that actually (+/-) solves the issue monadic noticed when using Federico's strategy. Thanks! — Seymour Kooze, Aug 01 '11 at 11:12
One thing to note is that since the only thing you can do to an `Exp` is `eval` it, you might as well just deal with `Int` directly. In general, an existential can be replaced with a record of the available operations (partially) applied to the value in question. — hammar, Aug 01 '11 at 13:48
@hammar I wouldn't say "in general". What would you do, e.g., for an existential of `Num`? — augustss, Aug 01 '11 at 16:35
@augustss: Ah, of course. It does not work when the type class contains functions that return the same type, as you could then perform arbitrary combinations of additional operations on it from the type class. It works as long as all the class functions take a single argument of the existential type and return some non-existential type. Thanks for the correction. — hammar, Aug 01 '11 at 17:14

score 3 · Answer 3 · answered Jul 31 '11 at 13:47

3

Syntactic sugar is usually handled by a parser; you'd extend (not in the sense of OO inheritance) the parser to detect the new constructs and translate them to the kind of structures that your evaluator can handle.

answered Jul 31 '11 at 13:47

Fred Foo

355,277
75
744
836

I see. In that case, I will bite the bullet and restructure what I've done so far. Thanks for the insight. – Seymour Kooze Jul 31 '11 at 13:52
4

Syntax is normally represented with abstract syntax trees. It should be perfectly possible to define a "sugared" AST in terms of the "desugared" AST (the former as an extension of the latter). If I want to add just a new kind of string literal, I should be able to keep a _new_ kind of string literal in the AST (not a translated version!) without copying or changing my data structures much. I want it e.g. to produce semantic errors in terms of a sugared syntax. – n. m. could be an AI Jul 31 '11 at 14:07

Pierre Carbonnelle · Answer 4 · 2018-03-02T14:08:10.570

A (simpler) option is to add a type to your AST, to distinguish Core from Extended:

data Core = Core
data Extended = Extended

data Expr t 
  = Add (Expr t) (Expr t)
  | Mult (Expr t) (Expr t)
  | Const Int 
  | Sugar t (Expr t) (Expr t)

An expression is either Core or Extended: the compiler will ensure that it contains only sub-expressions of the same type.

The function signatures in your original module would need to use Expr Core (instead of just Expr)

A Desugar function would have the following type signature:

Desugar :: Expr Extended -> Expr Core

You may also be interested in the more sophisticated approach described in the paper 'Trees that grow'.

Extending a datatype in Haskell

4 Answers4

Linked