Are functional languages inherently slow?

Question

Why are functional languages always trailing behind C in benchmarks? If you have a statically typed functional language, it seems to me it could be compiled to the same code as C, or to even more optimized code since more semantics are available to the compiler. Why does it seem like all functional languages are slower than C, and why do they always need garbage collection and excessive use of the heap?

Does anyone know of a functional language appropriate for embedded / real-time applications, where memory allocation is kept to a minimum and the produced machine code is lean and fast?

Will a multithreaded functional language app be faster or slower than a single threaded C app? I think that is the most important question. — tuinstoel, Feb 05 '09 at 15:22
@tuinstoel It depends of course (although you might be implying it will be faster). Parallelization has its own costs with launching threads and all that. Generally, there is a point where parallelization pays off. Before that point, a single thread is faster. After it, parallelization is faster. As an example of this, take a look at SlavaVendenin's answer here: https://stackoverflow.com/questions/309424/how-do-i-read-convert-an-inputstream-into-a-string-in-java. As you can see, Java's parallelized stream is much slower than a for loop with small enough cycles. — user904963, Dec 11 '21 at 18:13

J D · Answer 1 · 2016-02-12T01:05:28.963

Are functional languages inherently slow?

In some sense, yes. They require infrastructure that inevitably adds overhead compared with what can theoretically be attained with hand-written assembler. In particular, first-class lexical closures only work well with garbage collection because they allow values to be carried out of scope.
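A minimal sketch of the closure point, written in Python rather than any language discussed in this thread; `make_counter` and `increment` are hypothetical names. The local variable `count` has to stay alive after `make_counter` has returned, so it cannot simply live in that call's stack frame. Implementations typically move such captured variables to the heap and leave it to a collector to reclaim them.

```python
def make_counter(start):
    # `count` is local to this call, yet it must outlive the call
    # because the returned function still refers to it.
    count = start

    def increment():
        nonlocal count
        count += 1
        return count

    return increment


counter = make_counter(10)   # make_counter has returned by now
first = counter()            # still able to read and update `count`
second = counter()
print(first, second)
```

Nothing in the calling code says when `count` can be freed; that only becomes known once `counter` itself is unreachable.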

Why are functional languages always trailing behind C in benchmarks?

Firstly, beware of selection bias. C acts as a lowest common denominator in benchmark suites, limiting what can be accomplished. If you have a benchmark comparing C with a functional language then it is almost certainly an extremely simple program. Arguably so simple that it is of little practical relevance today. It is not practically feasible to solve more complicated problems using C for a mere benchmark.

The most obvious example of this is parallelism. Today, we all have multicores. Even my phone is a multicore. Multicore parallelism is notoriously difficult in C but can be easy in functional languages (I like F#). Other examples include anything that benefits from persistent data structures, e.g. undo buffers are trivial with purely functional data structures but can be a huge amount of work in imperative languages like C.
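A rough sketch of the undo-buffer claim, in Python and under the assumption that a document is just a sequence of lines; `push`, `to_list` and the version names are hypothetical. Each version is an immutable cons cell pointing at the previous version, so keeping every old version costs one small cell per edit and undo is only a matter of going back to an older reference.

```python
EMPTY = None


def push(doc, line):
    # A new version is one new cell that points at the old version.
    # The old version is not copied and not modified.
    return (line, doc)


def to_list(doc):
    # Walk the cells from newest to oldest, then reverse for reading order.
    lines = []
    while doc is not EMPTY:
        line, doc = doc
        lines.append(line)
    return lines[::-1]


v0 = EMPTY
v1 = push(v0, "hello")
v2 = push(v1, "world")

undo_stack = [v0, v1, v2]
current = undo_stack[-1]
before_undo = to_list(current)

undo_stack.pop()             # undo: drop the newest version
current = undo_stack[-1]
after_undo = to_list(current)
print(before_undo, after_undo)
```

With a mutable buffer the same feature would need either a full copy per edit or a log of inverse operations.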

Why does it seem like all functional languages are slower than C, and why do they always need garbage collection and excessive use of the heap?

Functional languages will seem slower because you'll only ever see benchmarks comparing code that is easy enough to write well in C and you'll never see benchmarks comparing meatier tasks where functional languages start to excel.

However, you've correctly identified what is probably the single biggest bottleneck in functional languages today: their excessive allocation rates. Nice work!

The reasons why functional languages allocate so heavily can be split into historical and inherent reasons.

Historically, Lisp implementations have been doing a lot of boxing for 50 years now. This characteristic spread to many other languages which use Lisp-like intermediate representations. Over the years, language implementers have continually resorted to boxing as a quick fix for complications in language implementation. In object oriented languages, the default has been to always heap allocate every object even when it can obviously be stack allocated. The burden of efficiency was then pushed onto the garbage collector and a huge amount of effort has been put into building garbage collectors that can attain performance close to that of stack allocation, typically by using a bump-allocating nursery generation. I think that a lot more effort should be put into researching functional language designs that minimize boxing and garbage collector designs that are optimized for different requirements.

Generational garbage collectors are great for languages that heap allocate a lot because they can be almost as fast as stack allocation. But they add substantial overheads elsewhere. Today's programs are increasingly using data structures like queues (e.g. for concurrent programming) and these give pathological behaviour for generational garbage collectors. If the items in the queue outlive the first generation then they all get marked, then they all get copied ("evacuated"), then all of the references to their old locations get updated and then they become eligible for collection. This is about 3× slower than it needs to be (e.g. compared to C). Mark region collectors like Beltway (2002) and Immix (2008) have the potential to solve this problem because the nursery is replaced with a region that can either be collected as if it were a nursery or, if it contains mostly reachable values, it can be replaced with another region and left to age until it contains mostly unreachable values.

Despite the pre-existence of C++, the creators of Java made the mistake of adopting type erasure for generics, leading to unnecessary boxing. For example, I benchmarked a simple hash table running 17× faster on .NET than the JVM partly because .NET did not make this mistake (it uses reified generics) and also because .NET has value types. I actually blame Lisp for making Java slow.

All modern functional language implementations continue to box excessively. JVM-based languages like Clojure and Scala have little choice because the VM they target cannot even express value types. OCaml sheds type information early in its compilation process and resorts to tagged integers and boxing at run-time to handle polymorphism. Consequently, OCaml will often box individual floating point numbers and always boxes tuples. For example, a triple of bytes in OCaml is represented by a pointer (with an implicit 1-bit tag embedded in it that gets checked repeatedly at run-time) to a heap-allocated block with a 64 bit header and 192 bit body containing three tagged 63-bit integers (where the 3 tags are, again, repeatedly examined at run time!). This is clearly insane.
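The tagging scheme described above can be sketched as follows. This is an illustration in Python, with Python integers standing in for 64-bit machine words; the function names are hypothetical and this is not OCaml's actual runtime code. An integer n is stored as 2n + 1, so the low bit distinguishes an immediate integer from a pointer, and arithmetic has to compensate for the tag.

```python
def tag(n):
    # Store n as 2n + 1: the low bit set to 1 marks an immediate integer.
    return (n << 1) | 1


def untag(word):
    # An arithmetic shift right drops the tag bit again.
    return word >> 1


def is_immediate(word):
    return word & 1 == 1


def tagged_add(a, b):
    # (2x + 1) + (2y + 1) - 1 == 2(x + y) + 1, so one extra subtraction
    # keeps the result correctly tagged.
    return a + b - 1


x, y = tag(20), tag(22)
total = untag(tagged_add(x, y))

# Size of the triple of bytes from the paragraph above:
boxed_bits = 64 + 3 * 64     # header word plus three word-sized fields
raw_bits = 3 * 8             # the payload that is actually needed
print(total, boxed_bits, raw_bits)
```

The size figures count only the heap block; the pointer to it takes another word.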

Some work has been done on unboxing optimizations in functional languages but it never really gained traction. For example, the MLton compiler for Standard ML was a whole-program optimizing compiler that did sophisticated unboxing optimizations. Sadly, it was before its time and the "long" compilation times (probably under 1s on a modern machine!) deterred people from using it.

The only major platform to have broken this trend is .NET but, amazingly, it appears to have been an accident. Despite having a Dictionary implementation very heavily optimized for keys and values that are of value types (because they are unboxed) Microsoft employees like Eric Lippert continue to claim that the important thing about value types is their pass-by-value semantics and not the performance characteristics that stem from their unboxed internal representation. Eric seems to have been proven wrong: more .NET developers seem to care more about unboxing than pass-by-value. Indeed, most structs are immutable and, therefore, referentially transparent so there is no semantic difference between pass-by-value and pass-by-reference. Performance is visible and structs can offer massive performance improvements. The performance of structs even saved Stack Overflow and structs are used to avoid GC latency in commercial software like Rapid Addition's!

The other reason for heavy allocation by functional languages is inherent. Imperative data structures like hash tables use huge monolithic arrays internally. If these were persistent then the huge internal arrays would need to be copied every time an update was made. So purely functional data structures like balanced binary trees are fragmented into many little heap-allocated blocks in order to facilitate reuse from one version of the collection to the next.
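As a sketch of why persistence leads to many small blocks, here is a persistent binary search tree in Python. Balancing is left out to keep it short, and `Node`, `insert` and `keys` are hypothetical names. An insert allocates new nodes only along the path from the root to the insertion point; every subtree off that path is shared between the old and new versions.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    key: int
    left: Optional["Node"]
    right: Optional["Node"]


def insert(node, key):
    # Returns a new tree; the tree passed in is left untouched.
    if node is None:
        return Node(key, None, None)
    if key < node.key:
        return Node(node.key, insert(node.left, key), node.right)
    if key > node.key:
        return Node(node.key, node.left, insert(node.right, key))
    return node  # key already present: reuse the whole subtree


def keys(node):
    if node is None:
        return []
    return keys(node.left) + [node.key] + keys(node.right)


old = None
for k in (50, 30, 70, 20, 40):
    old = insert(old, k)

new = insert(old, 80)        # copies only the path 50 -> 70
print(keys(old), keys(new))
```

Both versions remain usable, and `new.left` is the very same object as `old.left`. A hash table backed by one large array has no such pieces to share, so a persistent update would have to copy the whole array.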

Clojure uses a neat trick to alleviate this problem when collections like dictionaries are only written to during initialization and are then read from a lot. In this case, the initialization can use mutation to build the structure "behind the scenes". However, this does not help with incremental updates and the resulting collections are still substantially slower to read than their imperative equivalents. On the up-side, purely functional data structures offer persistence whereas imperative ones do not. However, few practical applications benefit from persistence in practice so this is often not advantageous. Hence the desire for impure functional languages where you can drop to imperative style effortlessly and reap the benefits.
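The build-then-freeze idea can be imitated in Python as below. This is only loosely analogous to what Clojure does and says nothing about its implementation; `build_index` is a hypothetical helper. The table is mutated freely while it is private to the builder and is handed out behind a read-only view.

```python
from types import MappingProxyType


def build_index(words):
    table = {}                          # private and mutable while building
    for position, word in enumerate(words):
        table[word] = position
    return MappingProxyType(table)      # callers get a read-only view


index = build_index(["alpha", "beta", "gamma"])

try:
    index["delta"] = 3                  # writes through the view are rejected
    writable = True
except TypeError:
    writable = False

print(index["beta"], writable)
```

As the paragraph above notes, this helps only when all the writes happen up front; a later update still needs a new structure.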

Does anyone know of a functional language appropriate for embedded / real-time applications, where memory allocation is kept to a minimum and the produced machine code is lean and fast?

Take a look at Erlang and OCaml if you haven't already. Both are reasonable for memory-constrained systems but neither generates particularly great machine code.

"All modern functional language implementations continue to box excessively" But why? Closures is one thing, more reasons? Where can I learn more about this? — Christian, Sep 01 '12 at 11:41
"If these were persistent then the huge internal arrays would need to be copied every time an update was made". Isn't the idea with persistent data structures that they needn't be copied, that updates can refer to the original structure instead? — Christian, Sep 01 '12 at 11:44
Why is that "few practical applications benefit from persistence in practice"? — Christian, Sep 01 '12 at 11:45
"But why? Closures is one thing, more reasons? Where can I learn more about this?". I believe they do it primarily because their predecessors did it. I cannot find any references on this. — J D, Sep 01 '12 at 17:54
"Isn't the idea with persistent data structures that they needn't be copied, that updates can refer to the original structure instead". Yes, the idea is that new versions can refer back to parts of old versions but that requires large arrays to be fragmented into many smaller data structures (parts) in order to make it possible to reference those parts. — J D, Sep 01 '12 at 18:22
"Why is that few practical applications benefit from persistence in practice". Practical applications rarely need to keep more than one version of a data structure around at a time. So the main benefit is clarity. — J D, Sep 01 '12 at 18:34
Great post! I would really be interested to read a post from you on the 2020 state of functional language compilers. Has OCaml improved regarding boxing? — fviktor, Sep 04 '20 at 23:43

Craig Stuntz · Answer 2 · 2016-02-12T04:12:37.573

Nothing is inherently anything. Here is an example where interpreted OCaml runs faster than equivalent C code, because the OCaml optimizer has different information available to it, due to differences in the language. Of course, it would be foolish to make a general claim that OCaml is categorically faster than C. The point is, it depends upon what you're doing, and how you do it.

That said, OCaml is an example of a (mostly) functional language which is actually designed for performance, in contrast to purity.

answered Feb 05 '09 at 15:37


Another benchmark showing OCaml as fast as C : http://www.timestretch.com/FractalBenchmark.html – Guillaume Feb 05 '09 at 15:41
Let's be clear, the OCaml compiler (even opt) is a _native code compiler_, NOT an optimizer. – nlucaroni Mar 24 '09 at 13:21
I don't understand how that C code can be slower than OCaml bytecode interpreted by a C program because the latter must surely suffer from exactly the same aliasing problems?! – J D Feb 12 '16 at 01:20
@JonHarrop: This answer is 7 years old(!), but I did fix up the first link, which wasn't working anymore. – Craig Stuntz Feb 12 '16 at 04:13
@CraigStuntz, the age of an answer doesn't make it immune to criticism or comment. If the answer exists, right or wrong it's there influencing others. Further, if an answer "ages out", that needs to be commented on as well. – alife Feb 01 '22 at 15:00

score 15 · Answer 3 · answered Feb 05 '09 at 15:21

Functional languages require the elimination of mutable state that is visible at the level of the language abstraction. Therefore, data that would be mutated in place by an imperative language needs to be copied instead, with the mutation taking place on the copy. For a simple example, see a quick sort in Haskell vs. C.
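The contrast can be sketched in Python instead of Haskell and C; both function names are hypothetical and neither version is tuned. The first version never mutates its input and allocates fresh lists at every level of recursion, while the second rearranges a single array in place.

```python
def quicksort_copying(xs):
    # Functional style: the input is never modified, so every level
    # of recursion builds new lists.
    if len(xs) <= 1:
        return list(xs)
    pivot, rest = xs[0], xs[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quicksort_copying(smaller) + [pivot] + quicksort_copying(larger)


def quicksort_in_place(xs, lo=0, hi=None):
    # Imperative style: elements are swapped inside the one array
    # (Lomuto partition around the last element).
    if hi is None:
        hi = len(xs) - 1
    if lo >= hi:
        return
    pivot = xs[hi]
    i = lo
    for j in range(lo, hi):
        if xs[j] < pivot:
            xs[i], xs[j] = xs[j], xs[i]
            i += 1
    xs[i], xs[hi] = xs[hi], xs[i]
    quicksort_in_place(xs, lo, i - 1)
    quicksort_in_place(xs, i + 1, hi)


data = [3, 1, 4, 1, 5, 9, 2, 6]
sorted_copy = quicksort_copying(data)   # `data` is left unchanged
in_place = list(data)
quicksort_in_place(in_place)            # `in_place` is overwritten
print(sorted_copy, in_place)
```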

Furthermore, garbage collection is required because free() is not a pure function, as it has side effects. Therefore, the only way to free memory that does not involve side effects at the level of the language abstraction is with garbage collection.

Of course, in principle, a sufficiently smart compiler could optimize out much of this copying. This is already done to some degree, but making the compiler sufficiently smart to understand the semantics of your code at that level is just plain hard.

dsimcha

"Functional languages require the elimination of mutable state that is visible at the level of the language abstraction". Most functional languages are impure. – J D Jun 04 '12 at 23:51
And those that are pure only contain side effects; they do not remove them. Haskell, for example, performs plenty of side effects, including exceptions, IO, printing and threading, but these operations are wrapped up as a stream of operations, or go through unsafePerformIO. The whole idea is to be mindful of side effects and keep an eye on them. It is okay to have explicit allocations and deallocations within the IO monad. Pure languages can easily launch missiles and still be pure; the functional paradigm is very clever in how it defines purity. – Dmytro Nov 18 '16 at 22:53

score 9 · Answer 4 · answered Feb 05 '09 at 15:20

The short answer: because C is fast. As in, blazingly ridiculously crazy fast. A language simply doesn't have to be 'slow' to get its rear handed to it by C.

The reason why C is fast is that it was created by really great coders, and gcc has been optimized over the course of a couple more decades and by dozens more brilliant coders than 99% of languages out there.

In short, you're not going to beat C except for specialized tasks that require very specific functional programming constructs.

Jens Roland

In my opinion C is fast because it doesn't do a lot (as a language). It makes you code all the details and doesn't do anything that you don't explicitly tell it to do. – Joachim Sauer Feb 05 '09 at 15:22
Not sure why this was marked down. Not exactly an academic answer, but valid nonetheless. – Nick Van Brunt Feb 05 '09 at 15:59
It's telling that the phrase "High Level Assembly" was coined in reference to C. – Onorio Catenacci Jun 05 '12 at 17:22
-1 C is actually fast without optimizations as well. There are simple C compilers (e.g. TCC = Tiny C Compiler) without optimizations by "dozens more brilliant coders" that perform quite well. – peenut Jul 13 '13 at 14:40
C++ is often much faster than C. – user904963 Dec 11 '21 at 19:48

score 8 · Answer 5 · answered Feb 05 '09 at 15:27

The control flow of procedural languages much better matches the actual processing patterns of modern computers.

C maps very closely onto the assembly code its compilation produces, hence the nickname "cross-platform assembly". Computer manufacturers have spent a few decades making assembly code run as fast as possible, so C inherits all of this raw speed.

In comparison, the side-effect-free, inherently parallel nature of functional languages does not map onto a single processor at all well. The arbitrary order in which functions can be invoked needs to be serialised down to the CPU bottleneck: without extremely clever compilation, you're going to be context switching all the time, none of the pre-fetching will work because you're constantly jumping all over the place, ... Basically, all the optimisation work that computer manufacturers have done for nice, predictable procedural languages is pretty much useless.

However! With the move towards lots of less powerful cores (rather than one or two turbo-charged cores), functional languages should begin to close the gap, as they naturally scale horizontally.

James Brady


+1 Why do the cores have to be less powerful? Why not simply more cores of equivalent or greater power than current cores? – AnthonyWJones Feb 05 '09 at 15:29
They don't *have* to be less powerful, it's just that as chip designers focus on getting more cores onto a die, and having cores and CPUs play well in a SMP situation, they focus less on raw speed. They can't focus on everything, unfortunately! :) – James Brady Feb 05 '09 at 17:56
-1 "The control flow of procedural languages much better matches the actual processing patterns of modern computers". Compilers increasingly convert mutation into static single assignment form. – J D Jun 04 '12 at 23:58
"C actually matches assembly better than functional languages out there" – Thomas Apr 03 '18 at 08:16
You should include the fact that a single thread is often faster for simpler, smaller, or (of course) non-parallelizable tasks. Frankly, many, many people are not doing anything near big and parallelizable enough to reap the benefits of parallelization. For example, check out the performance of Java's parallelized stream versus simple for loops here: https://stackoverflow.com/questions/309424/how-do-i-read-convert-an-inputstream-into-a-string-in-java. Someone did an interesting benchmark there. – user904963 Dec 11 '21 at 18:25

score 8 · Answer 6 · answered Feb 05 '09 at 15:29

C is fast because it's basically a set of macros for assembler :) There is no "behind the scenes" when you are writing a program in C. You allocate memory when you decide it's time to do that and you free it in the same fashion. This is a huge advantage when you are writing a real-time application, where predictability is important (more than anything else, actually).

Also, C compilers are generally extremely fast because the language itself is simple. It doesn't even do any type checking :) This also means that it is easier to make hard-to-find errors. An advantage of the lack of type checking is that a function can simply be exported under its name, for example, and this makes C code easy to link with code written in other languages.

Emiliano

-1 "There is no "behind the scene" when you are writing a program in C". Calling conventions, registers and memory models are obvious counter examples. – J D Jun 04 '12 at 23:56
@Jon Harrop: Fair enough. I was thinking about memory management when I wrote that line (as the following sentence shows) but indeed my expression is misleading – Emiliano Jun 05 '12 at 08:14
Memory allocation in C is typically done via the `malloc` and `free` calls to the C runtime which are not idiomatic in assembler. – J D Jun 05 '12 at 10:15
I know, at least on linux a malloc might (or might not) result in a syscall (`sbrk()` if I remember correctly). Still, there is no third party 'runtime' when you write a program in C. No one else keeps track of your pointers, no one decides it's time to free the memory you allocated. You can write your smart pointer / garbage collection implementation if you like but there is no direct support in the language for those features. That is what I meant for "there is no behind the scene". It's a bit misleading and you did well noticing that point. – Emiliano Jun 05 '12 at 10:40
Yes, I think it is worth stressing that `malloc` and `free` are themselves doing a lot behind the scenes. I often hear people dismiss garbage collected languages in favour of C in the context of latency but they never seem to know the latency that `malloc` and `free` incur... – J D Jun 05 '12 at 16:21
@happy_emi: I've never perceived C compilers as "extremely fast". In fact, I usually find C compile times pathetic, and nothing short of abysmal for C++. A major reason for that is their archaic approach to separate compilation. – Andreas Rossberg Jun 05 '12 at 18:12

score 5 · Answer 7 · edited Jun 05 '12 at 17:23

Well, Haskell is only 1.8 times slower than GCC's C++, which is faster than GCC's C implementation for typical benchmark tasks. That makes Haskell very fast, even faster than C# (Mono, that is).

Relative speed by language and implementation (1.0 is the fastest):

1.0 C++ GNU g++
1.1 C GNU gcc
1.2 ATS
1.5 Java 6 -server
1.5 Clean
1.6 Pascal Free Pascal
1.6 Fortran Intel
1.8 Haskell GHC
2.0 C# Mono
2.1 Scala
2.2 Ada 2005 GNAT
2.4 Lisp SBCL
3.9 Lua LuaJIT

source

For the record I use Lua for Games on the iPhone, thus you could easily use Haskell or Lisp if you prefer, since they are faster.

Onorio Catenacci


answered Feb 05 '09 at 15:53

Robert Gould


Ahead of Haskell in that list is another pure functional language - Clean. – igouy Mar 21 '09 at 17:15
Hadn't looked into Clean, but interesting to hear it's another functional language – Robert Gould Mar 22 '09 at 01:59
You forgot to mention PHP ;) Factor 25 or so ;) – ivan_ivanovich_ivanoff Apr 09 '09 at 15:13
-1 Much of that isn't "Haskell" code by any reasonable definition. – J D Jun 05 '12 at 00:00
ATS is also a functional language, which has IMO potential for safety-critical embedded applications, an area where C is often used. – Joh Jun 06 '12 at 11:34
In all cases where I had the opportunity to compare Haskell and .NET (F#), F# was faster than Haskell. The number of cases I actually compared would not even suffice for a medical doctor's thesis, though ;) – BitTickler Jun 01 '16 at 19:14
Hi Robert, link is dead – Matas Vaitkevicius Dec 07 '18 at 09:13

score 4 · Answer 8 · answered Feb 05 '09 at 15:21

As for now, functional languages aren't used heavily for industry projects, so not enough serious work goes into optimizers. Also, optimizing imperative code for an imperative target is probably way easier.

Functional languages have one feat that will let them outdo imperative languages really soon now: trivial parallelization.

Trivial not in the sense that it is easy, but that it can be built into the language environment, without the developer needing to think about it.

The cost of robust multithreading in a thread-agnostic language like C is prohibitive for many projects.

peterchen


+1 for mention that the parallelization is a lot easier (manual and automatic by the compiler) – Quonux May 14 '11 at 16:53
"functional languages aren't used heavily for industry projects". C# has first-class lexical closures. – J D Jun 04 '12 at 23:53
@Jon Harrop: Still, I wouldn't go as far as to call it a functional language, and from my gut statistics the amount of production code that would pass as functional is likely minuscule compared to the remaining code. – peterchen Jun 05 '12 at 08:01
"Functional languages have one feat that will let them outdo imperative languages really soon now". FWIW I don't think that's ever going to happen. Purity makes locality of reference unpredictable and multicore parallelism requires locality in order to scale. – J D Jun 06 '12 at 09:29
Oh, and purity can be extremely inefficient (see dictionaries, for example). – J D Jun 06 '12 at 09:29
@JonHarrop "Purity makes locality of reference unpredictable" Could you elaborate this a bit or provide a link? – User Mar 02 '14 at 00:15
@lxx: Sure, effective multicore parallelism requires efficient use of the machine's cache hierarchy. In an imperative style you use arrays and pay careful attention to explicit locality in order to achieve good cache complexity (see the Cilk papers http://www.fftw.org/~athena/papers/tocs08.pdf). In a purely functional style you have no idea where your values are in memory and cannot predict cache complexity, so parallel scalability is unpredictable: sometimes you get lucky but it is not reliable. – J D Mar 10 '14 at 23:08
"As for now, functional languages aren't used heavily for industry projects". this is not true anymore :) – Reza Jan 24 '19 at 22:41

score 3 · Answer 9 · answered Feb 05 '09 at 15:25

I disagree with tuinstoel. The important question is whether the functional language provides a faster development time and results in faster code when it is used for what functional languages were meant to be used for. See the efficiency issues section on Wikipedia for a glimpse of what I mean.

score 1 · Answer 10 · answered Feb 05 '09 at 15:31

One more reason for bigger executable size could be lazy evaluation and non-strictness. The compiler can't figure out at compile-time when certain expressions get evaluated, so some runtime gets stuffed into the executable to handle this (to call upon the evaluation of the so-called thunks). As for performance, laziness can be both good and bad. On one hand it allows for additional potential optimization, on the other hand the code size can be larger and programmers are more likely to make bad decisions, e.g. see Haskell's foldl vs. foldr vs. foldl' vs. foldr'.
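A thunk can be sketched in Python as an object that holds a suspended computation and caches its result on first use. This is an illustration only; `Thunk`, `force` and `expensive` are hypothetical names and this is not how any particular Haskell runtime lays thunks out. The bookkeeping in `force` is the kind of runtime support that has to ship with a lazily evaluated program.

```python
class Thunk:
    """A suspended computation that runs at most once."""

    def __init__(self, compute):
        self._compute = compute
        self._done = False
        self._value = None

    def force(self):
        # Every use has to check whether the value exists yet.
        if not self._done:
            self._value = self._compute()
            self._compute = None    # release the suspended code
            self._done = True
        return self._value


calls = []


def expensive():
    calls.append("ran")
    return 6 * 7


t = Thunk(expensive)
calls_before = len(calls)    # nothing has been evaluated yet
a = t.force()                # evaluated here, on first demand
b = t.force()                # served from the cached value
print(calls_before, a, b, len(calls))
```

Every unevaluated expression costs an allocation like `Thunk(expensive)`, which is one way that laziness adds to both memory use and code size.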
