92

I've been thinking about this question for a long time, but I couldn't find an answer on Google, nor a similar question on Stack Overflow. If there is a duplicate, I'm sorry for that.

A lot of people seem to say that writing compilers and other language tools in functional languages such as OCaml and Haskell is much more efficient and easier than writing them in imperative languages.

Is this true? And if so, why is it so efficient and easy to write them in functional languages rather than in an imperative language like C? Also, isn't a language tool written in a functional language slower than one written in some low-level language like C?

Matt Fenwick
wvd
  • I wouldn't say it's easier. But the functional nature of compilation tasks such as parsing probably lends itself quite naturally to functional programming. Functional languages like OCaml can be extremely fast, rivaling the speed of C. – Robert Harvey May 25 '10 at 15:28
  • Folks, is this really argumentative? Surely someone has some insight. I'd like to know myself. – Robert Harvey May 25 '10 at 15:30
  • I think there should at least be some good reasons to use a functional language over an imperative one. I found an article which basically came down to functional languages having no side effects and such, but it wasn't really clear at all. However, if this is argumentative, then it might be better to close it or reformulate the question. – wvd May 25 '10 at 15:33
  • Can we change it to “…efficient and easy”? That makes it less argumentative. – Donal Fellows May 25 '10 at 15:44
  • Is it really argumentative to acknowledge that some niches are better suited to a particular style of language? "Why is C better than Javascript for writing device drivers" wouldn't be controversial, I'd think... – C. A. McCann May 25 '10 at 15:51
  • I thought it would be the opposite. I am reading "super tiny compiler" and it uses variable mutation all over the place. – Rolf Sep 25 '17 at 02:46

7 Answers

107

Oftentimes a compiler works a lot with trees. The source code is parsed into a syntax tree. That tree might then be transformed into another tree with type annotations, to perform type checking. Then you might convert that tree into a tree containing only core language elements (converting syntactic-sugar notations into an unsugared form). Then you might perform various optimizations that are basically transformations of the tree. After that, you would probably convert the tree into some normal form and then iterate over it to create the target (assembly) code.

Functional languages have features like pattern matching and good support for efficient recursion, which make it easy to work with trees; that's why they're generally considered good languages for writing compilers.
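For illustration, here is a minimal sketch in Haskell of one such tree-to-tree pass (the AST and all names are made up for the example, not taken from any real compiler): a desugaring step that rewrites subtraction in terms of addition and negation.

    -- Hypothetical toy AST; Sub is "syntactic sugar" for Add plus Neg.
    data Expr
      = Num Int
      | Var String
      | Neg Expr
      | Add Expr Expr
      | Sub Expr Expr          -- sugar: a - b means a + (-b)
      deriving Show

    -- Desugaring pass: one pattern per node shape, recursing down the tree.
    desugar :: Expr -> Expr
    desugar (Sub a b) = Add (desugar a) (Neg (desugar b))
    desugar (Add a b) = Add (desugar a) (desugar b)
    desugar (Neg a)   = Neg (desugar a)
    desugar e         = e      -- leaves (Num, Var) pass through unchanged

Each case of the transformation is one line, and GHC's pattern-match exhaustiveness checking can warn you if you forget a node type.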

sepp2k
  • Most complete answer so far; I'll mark this as the accepted answer, however I think Pete Kirkham's answer is also good. – wvd May 25 '10 at 15:48
  • What about "proving correctness"? Since correctness of a compiler is an important attribute, I have often heard that fans of functional languages incorporate a "proof" of correctness into their workflow somehow. I have no idea what that really means in practical terms, but as compiler reliability is important, this seems worthwhile. – Warren P May 25 '10 at 15:56
  • @WarrenP: The "proof-carrying code" concept comes from statically-typed functional languages. The idea is that you use the type system in such a way that a function can only typecheck if it's correct, so the fact that the code compiles is the proof of correctness. Of course this isn't fully possible while keeping the language Turing-complete and typechecking decidable. But the stronger the type system, the closer you can get to that goal. – sepp2k May 25 '10 at 16:06
  • The reason that this concept is mainly popular in the functional community is that in languages with mutable state, you'd also have to encode information about when and where state changes occur in the types. In languages where you know that the result of a function depends only on its arguments, it's much easier to encode a proof in the types (it's also much easier to manually prove the code's correctness, because you don't have to consider which global states are possible and how they would affect the behavior of the function). However, none of this is specifically related to compilers. – sepp2k May 25 '10 at 16:10
  • @Warren P: Typically that means a mathematical specification of what the program should do, and a proof that the program in fact does that. In compiler terms, this would mean something like "given this definition of the source language, the output code will perform the same computations, and any optimizations will alter only the time/space usage of the program, not its behavior". – C. A. McCann May 25 '10 at 16:11
  • The single most important feature is pattern matching in my opinion. Optimizing an abstract syntax tree with pattern matching is stupidly easy. Doing it without pattern matching is often frustratingly hard. – Bob Aman May 25 '10 at 17:11
  • So you could generalise the answer to: a compiler is a function from a sentence in one language into a sentence in another language (preserving some kind of "semantic mapping" across meanings of both languages). If you view your compiler as a function it's reasonable to look to functional programming for support implementing it... – Dafydd Rees Apr 23 '11 at 10:44
44

A lot of compiler tasks are pattern matching on tree structures.

Both OCaml and Haskell have powerful and concise pattern matching capabilities.

It's harder to add pattern matching to imperative languages, as whatever value is being evaluated or extracted to match the pattern against must be side-effect free.
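As a minimal sketch (in Haskell, over a made-up toy tree, not from the answer), here is what a constant-folding pass looks like when the tree shapes are matched directly:

    data Expr = Lit Int | Add Expr Expr | Mul Expr Expr
      deriving Show

    -- Constant folding: fold the children first, then collapse known operands.
    fold :: Expr -> Expr
    fold (Add a b) = case (fold a, fold b) of
      (Lit x, Lit y) -> Lit (x + y)     -- both sides constant: compute now
      (a', b')       -> Add a' b'
    fold (Mul a b) = case (fold a, fold b) of
      (Lit x, Lit y) -> Lit (x * y)
      (a', b')       -> Mul a' b'
    fold e = e                          -- anything else is left alone

The pattern (Lit x, Lit y) expresses "both operands are already constants" in one line; in an imperative language that typically becomes a chain of type tests and getter calls.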

Chris Conway
Pete Kirkham
  • Sounds like a reasonable answer, but is this the only thing? e.g. would things such as tail recursion also play a role? – wvd May 25 '10 at 15:40
  • That would seem to indicate that it is more of an issue of the type system than of the actual execution model. Something based on imperative programming with immutable values over structural types might be fine. – Donal Fellows May 25 '10 at 15:43
  • @wvd: Tail recursion optimization is an implementation detail, not a language feature as such, that makes linear recursive functions equivalent to an iterative loop. A recursive function to walk a linked list in C would benefit from it just as much as recursing on a list in Scheme does. – C. A. McCann May 25 '10 at 15:44
  • @wvd gcc C has tail call elimination, as do other mutable state languages – Pete Kirkham May 25 '10 at 15:44
  • @wvd: I would say no on the tail recursion, in the end it is only an optimization and lots of lisps are great for writing compilers in, and only a subset of those promise TCO. – Ukko May 25 '10 at 15:45
  • As the JMatch project (http://www.cs.cornell.edu/Projects/jmatch/) shows, it's not impossible to add powerful pattern matching to a completely non-functional language like java. However it is a fact that currently the only common languages that have pattern matching are (at least partly) functional in nature. – sepp2k May 25 '10 at 15:46
  • @camccann: If the language standard guarantees TCO (or at least guarantees that recursive functions of a certain form will never cause a stack overflow or linear growth of memory consumption), I'd consider that a language feature. If the standard doesn't guarantee it, but the compiler does it anyway, it's a compiler feature. – sepp2k May 25 '10 at 15:49
  • @sepp2k: It's an odd case because it's an "optimization" that's all but required for a functional language compiler to produce useful output. Language specifications do sometimes dictate implementation details or performance bounds for various reasons, but I have a hard time calling such things features of a language itself. Naught but quibbling over terminology, though, so disregard me at will! – C. A. McCann May 25 '10 at 16:04
16

One important factor to consider is that a big milestone of any compiler project is reaching the point where you can self-host the compiler and "eat your own dog food." For this reason, when you look at languages like OCaml that were designed for language research, they tend to have great features for compiler-type problems.

In my last compiler-esque job we used OCaml for exactly this reason while manipulating C code; it was just the best tool around for the task. If the folks at INRIA had built OCaml with different priorities, it might not have been such a good fit.

That said, functional languages are the best tool for solving any problem, so it logically follows that they are the best tool for solving this particular problem. QED.

/me: crawls back to my Java tasks a little less joyfully...

jrouquie
Ukko
10

Basically, a compiler is a transformation from one set of code to another — from source to IR, from IR to optimized IR, from IR to assembly, etc. This is precisely the sort of thing functional languages are designed for — a pure function is just a transformation from one thing to another. Imperative functions don't have this quality. Although you can write this kind of code in an imperative language, functional languages are specialized for it.
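A minimal Haskell sketch of that view (the stage names and the "IR" here are toy placeholders, not a real compiler): each stage is a pure function, and the compiler is literally their composition.

    type Source   = String
    type AST      = [String]            -- stand-in: tokenized source
    type IR       = [String]
    type Assembly = String

    parse :: Source -> AST
    parse = words                       -- toy "parser": split into tokens

    lower :: AST -> IR
    lower = map ("ir:" ++)              -- toy lowering into an "IR"

    optimize :: IR -> IR
    optimize = filter (/= "ir:nop")     -- toy optimization: drop no-ops

    emit :: IR -> Assembly
    emit = unlines                      -- toy code generator

    compile :: Source -> Assembly
    compile = emit . optimize . lower . parse

Because each stage is side-effect free, you can test, reorder, or insert passes without worrying about hidden state shared between them.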

Chuck
7

One possibility is that a compiler tends to have to deal very carefully with a whole host of corner cases. Getting the code right is often made easier by structuring the implementation in a way that parallels the rules it implements. Often that ends up being a declarative design (pattern matching, "where") rather than an imperative one (sequencing, "when"), and thus easier to implement in a declarative language (and most such languages are functional).
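For example (a hypothetical sketch, not from the answer), each clause of a little type checker in Haskell can parallel one rule of the language specification, so the corner cases in the spec map one-to-one onto clauses in the code:

    data Ty   = TInt | TBool deriving (Eq, Show)
    data Expr = IntLit Int | BoolLit Bool | Add Expr Expr | If Expr Expr Expr

    typeOf :: Expr -> Maybe Ty
    typeOf (IntLit _)  = Just TInt       -- rule: integer literals have type Int
    typeOf (BoolLit _) = Just TBool      -- rule: boolean literals have type Bool
    typeOf (Add a b)   =                 -- rule: Int + Int : Int
      case (typeOf a, typeOf b) of
        (Just TInt, Just TInt) -> Just TInt
        _                      -> Nothing
    typeOf (If c t e)  =                 -- rule: condition Bool, branches agree
      case (typeOf c, typeOf t, typeOf e) of
        (Just TBool, Just ty1, Just ty2) | ty1 == ty2 -> Just ty1
        _                                             -> Nothing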

BCS
6

See also

F# design pattern

FP groups things 'by operation', whereas OO groups things 'by type', and 'by operation' is more natural for a compiler/interpreter.
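A small illustration of the "by operation" grouping in Haskell (the tree is a made-up toy): all the code for one operation lives in a single function, with one clause per node type, rather than being spread across one class per node as in OO.

    data Expr = Lit Int | Add Expr Expr | Mul Expr Expr

    -- All of evaluation in one place, one clause per node type.
    eval :: Expr -> Int
    eval (Lit n)   = n
    eval (Add a b) = eval a + eval b
    eval (Mul a b) = eval a * eval b

    -- A new operation is just one more function; no existing type changes.
    pretty :: Expr -> String
    pretty (Lit n)   = show n
    pretty (Add a b) = "(" ++ pretty a ++ " + " ++ pretty b ++ ")"
    pretty (Mul a b) = "(" ++ pretty a ++ " * " ++ pretty b ++ ")"

A compiler keeps gaining new operations (new passes) over a fairly fixed set of node types, which is exactly the axis of extension FP makes cheap.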

Brian
  • This relates to what is called, in some Programming Language Theory circles, the "expression problem". For example, see [this question](http://stackoverflow.com/questions/2807629/), wherein I demonstrate some truly horrible Haskell code that does things the "extensible types" way. Contrariwise, forcing an OOP language into the "extensible operations" style tends to motivate the Visitor Pattern. – C. A. McCann May 25 '10 at 19:41
4

It seems like everyone missed another important reason. It's quite easy to write an embedded domain-specific language (EDSL) for parsers which looks a lot like (E)BNF in normal code. Parser combinators like Parsec are quite easy to write in functional languages using higher-order functions and function composition; not only easily, but very elegantly.

Basically you represent the simplest generic parsers as plain functions, and you have special operations (typically higher-order functions) which let you compose these primitive parsers into more complicated, more specific parsers for your grammar.
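To make that concrete, here is a from-scratch toy sketch in Haskell of the combinator idea itself (this is not Parsec's actual API; all names here are invented for the example):

    import Data.Char (isDigit)

    -- A parser is just a function from input to a result plus leftover input.
    newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

    -- Primitive parser: consume one character satisfying a predicate.
    satisfy :: (Char -> Bool) -> Parser Char
    satisfy p = Parser $ \s -> case s of
      (c:cs) | p c -> Just (c, cs)
      _            -> Nothing

    -- Combinators (higher-order functions) build big parsers from small ones.
    orElse :: Parser a -> Parser a -> Parser a        -- choice, like | in BNF
    orElse pa pb = Parser $ \s -> case runParser pa s of
      Nothing -> runParser pb s
      ok      -> ok

    many1 :: Parser a -> Parser [a]                   -- repetition, like + in EBNF
    many1 p = Parser $ \s -> do
      (x, s') <- runParser p s
      Just $ case runParser (many1 p) s' of
        Just (xs, s'') -> (x : xs, s'')
        Nothing        -> ([x], s')

    -- The grammar rule  number ::= digit+  reads almost like its BNF:
    number :: Parser Int
    number = Parser $ \s -> do
      (ds, s') <- runParser (many1 (satisfy isDigit)) s
      Just (read ds, s')

    -- runParser number "42abc"  ==>  Just (42, "abc")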

This is not the only way to build parser frameworks, of course.

snk_kid