13

Coming from imperative languages like Python, JavaScript and Java, I very often read about function-call overhead and why to avoid `map` from a performance perspective. Of course these are not functional languages, and foreign concepts usually tend to be less optimised and less idiomatic there. I understand that calling a function means pushing values from the registers back onto the stack, which is expensive.

So with the recent buzz about FP concepts and languages, I'm really wondering: how does Haskell solve this problem? Is it just that the compilers inline a whole lot? In addition, how do FP languages on the JVM (Clojure, Scala) solve it? The JVM not even having decent tail-call optimisation says quite a bit about its capabilities for optimising FP code.

Thank you!

AlessandroEmm
  • 698
  • 7
  • 23
  • Could you point to some references about what performance penalties that functions, `map`, etc. pay in these imperative languages? – amindfv Apr 24 '13 at 19:56
  • 2
    This question seems a bit broad--the answer is that the languages (especially Haskell) compile code in a completely different way from imperative languages. As an amusing illustration, Haskell neatly reverses this trend: functional code is fast and imperative code is often slow because the compiler doesn't know what to do with it. (I mostly base this observation off of a neat [blog post](http://augustss.blogspot.com/2007/08/programming-in-c-ummm-haskell-heres.html).) – Tikhon Jelvis Apr 24 '13 at 20:00
  • pushing registers onto the stack, do you mean pushing frames onto the stack? Registers are a hardware concept. – The Internet Apr 24 '13 at 20:01
  • 3
    Coming from Python and Javascript, there seems to be little valid criticism on the performance front. Compared to Java, Scala can suffer marginally but only rarely is it of a serious magnitude. – Randall Schulz Apr 24 '13 at 20:10
  • @Dave I assume he means that values are pushed from registers onto the stack. – sepp2k Apr 24 '13 at 20:12
You seem to talk about things you don't really understand: "Tail-Call Recursion optimisation". What the JVM is missing is *tail call optimisation*; the special case of *tail recursion* can be trivially handled by any decent compiler by generating a while loop. – Ingo Apr 26 '13 at 13:51
  • http://en.wikipedia.org/wiki/Tail_call calls it "a special case of recursion." So how is the term recursion wrong in this context? And how about this? http://neopythonic.blogspot.ch/2009/04/tail-recursion-elimination.html. Please enlighten me. – AlessandroEmm Apr 26 '13 at 17:07
  • 2
    @AlessandroMeyer a tail call is a call whose return value is immediately returned by the calling function, without further processing. Only when the function calls itself recursively in the tail position do we talk about tail recursion; tail recursions are a subset of tail calls. To optimize tail calls, one can re-use the current position of the call stack, i.e. long cascades of tail calls can run in constant stack space. But this makes e.g. debugging hard (stack traces would miss methods), and doesn't provide much value to non-functional programs. Tail recursions can be optimized without much confusion into simple loops. – amon Apr 26 '13 at 22:44
  • I see, thank you for the clarification! Corrected my question. – AlessandroEmm Apr 26 '13 at 22:47
  • @AlessandroMeyer Guido himself writes there: *After all TRE only addresses recursion that can easily be replaced by a loop.* -- What he fails to notice is that mechanical transformations should be done by the machine. Because this is easily possible, tail recursion is not a problem at all, but stack overflows due to non-recursive tail calls still are (in machines, including virtual ones, that don't support it properly) – Ingo Apr 27 '13 at 18:32
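To make the distinction drawn in the comments concrete, here is a small Scala sketch (the function names are illustrative, not from the discussion above):

```scala
import scala.annotation.tailrec

object TailCalls {
  // Tail *recursion*: the function calls itself in tail position.
  // scalac rewrites this into a plain loop, so it runs in constant stack space.
  @tailrec
  def sumTo(n: Int, acc: Int = 0): Int =
    if (n == 0) acc else sumTo(n - 1, acc + n)

  // Tail *calls* between different functions: each result is returned
  // directly, but the JVM does NOT reuse the stack frame here, so deep
  // mutual recursion like this can overflow the stack.
  def isEven(n: Int): Boolean = if (n == 0) true else isOdd(n - 1)
  def isOdd(n: Int): Boolean = if (n == 0) false else isEven(n - 1)
}
```

`sumTo` is the case a compiler can turn into a loop; `isEven`/`isOdd` is the general tail-call case the JVM leaves unoptimised.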

2 Answers

5

I can't provide a comprehensive answer for Haskell, but for Scala the answer is pretty simple: the compiler performs transformations so that, for example, a (simple) tail call is implemented as a loop over mutable variables instead. This is inherently what any language has to do to achieve good performance, since computers are mutable (RAM, not WORM!).
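As an illustration (the names are mine), the tail-recursive factorial below compiles to essentially the same code as the hand-written loop next to it:

```scala
// Tail-recursive version: scalac rewrites the self tail call into a jump.
def factorial(n: Long, acc: Long = 1): Long =
  if (n <= 1) acc else factorial(n - 1, acc * n)

// Roughly what the compiler emits: a loop over mutable locals,
// with no method calls and no stack growth.
def factorialLoop(n: Long): Long = {
  var i = n
  var acc = 1L
  while (i > 1) { acc *= i; i -= 1 }
  acc
}
```

Both run in constant stack space; the recursive form never actually pushes a frame per step.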

Turning a method into something that can be passed around involves creating an object, but object creation is cheap on the JVM, and both the JVM and Scala have or will have tricks to avoid creating that object at all unless it's really necessary.
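For example, a Scala function value is, at the language level, an object implementing `Function1` (a sketch; modern Scala and the JVM use `invokedynamic`-based lambdas to avoid some of this allocation in practice):

```scala
// A function literal: conceptually a Function1[Int, Int] object.
val inc: Int => Int = _ + 1

// Passing the object around, e.g. to map:
val result = List(1, 2, 3).map(inc)

// Roughly equivalent to writing the object out by hand:
val incExplicit = new Function1[Int, Int] {
  def apply(x: Int): Int = x + 1
}
```

The object itself is tiny and stateless here, which is exactly the kind of allocation the JVM is good at making cheap or eliminating.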

Then there's the issue of re-use of memory vs. use of new memory. That one is hard to get around entirely, but the JVM is very good at rapidly reclaiming memory, so you pay a relatively modest penalty for e.g. recreating your list each time instead of mutating the values in it. (If no references to the old list remain, you could just mutate the values and call it a new list--I don't know whether GHC plays tricks like this.) The worst situation is when you need a local update, where you can have log(n) work instead of constant-time work in some cases.
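A sketch of that trade-off with Scala's immutable collections: `immutable.Vector` is a wide tree, so a local update copies only the O(log n) nodes on the path to the change, and a linked `List` shares its tail past the change.

```scala
val v  = Vector.fill(1000)(0)
val v2 = v.updated(500, 42)  // copies only the path to the changed node;
                             // unchanged subtrees are shared with v

// Linked lists share everything after the "mutation":
val xs = List(1, 2, 3, 4)
val ys = 0 :: xs.tail        // ys and xs share the cells (2, 3, 4)
```

The old values stay untouched, which is why this is safe without any copying discipline, at the cost of that log(n) factor on updates.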

When you add all of these things together, the (single-threaded) performance penalty is often modest or negligible.

Rex Kerr
  • 166,841
  • 26
  • 322
  • 407
  • 4
    "If no references to the old list remain, you could just mutate the values and call it a new list--I don't know whether GHC plays tricks like this." Not in this sense I think, Haskell avoids the notion of mutability on pretty much every level; but GHC does make very good use of rewrite rules, in particular [_stream fusion_](http://stackoverflow.com/questions/578063/what-is-haskells-stream-fusion), which is, in a way, even better: rather than mutating a list because you notice it's not used anywhere else, you compile the program so this intermediate list is _never even actually built up_! – leftaroundabout Apr 24 '13 at 22:34
  • @leftaroundabout - That also sounds useful, but different. You might want to pass a list to a bunch of functions to compute some things with folds or whatever, and then change some values and do it again. – Rex Kerr Apr 24 '13 at 23:03
  • @RexKerr: you're correct that if you want to share the intermediate list fusion won't be of much use. Since Haskell lists are simply linked lists, the tail past the last mutation will be shared, and the rest will be re-created. This may or may not be helpful, depending on what's mutated. But as Haskell programs tend to do *a lot* of allocations, ghc's garbage collector is tuned quite differently than the JVM and is usually quite good at reclaiming memory in these situations. – John L Apr 25 '13 at 00:17
  • @JohnL - The JVM also expects _a lot_ of allocations and is also usually quite good at reclaiming memory. Have you actually measured a major difference between the two GCs? – Rex Kerr Apr 25 '13 at 10:49
  • @RexKerr I haven't measured the JVM myself. It came up in a discussion with Simon Marlow about trying to implement some Java garbage collection strategies in Haskell, and he said that since Haskell programs tend to allocate much more than Java they've had to make different tradeoffs WRT garbage collection. – John L Apr 26 '13 at 00:10
  • 1
    Immutability lets GHC's GC assume that all references only point "forward", which simplifies and speeds it up on immutable data. Allocations are also more on the order of `alloca` than `malloc`, since they just bump a pointer. – Mysterious Dan Sep 09 '13 at 14:35
0

Any JVM-based language benefits from JIT compilation, which already gives a huge advantage to languages like Scala and Clojure.

Furthermore, Scala can benefit from `@tailrec`-checked optimisation, which transforms a recursive call into a loop.
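A small example (my own): note that `@tailrec` does not itself enable the rewrite (scalac turns self tail calls into loops anyway); it makes the build *fail* if the call is not actually in tail position, so you can't lose the guarantee silently.

```scala
import scala.annotation.tailrec

// Compiles to a loop; removing tail position would be a compile error.
@tailrec
def gcd(a: Int, b: Int): Int =
  if (b == 0) a else gcd(b, a % b)
```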

fracca
  • 2,417
  • 1
  • 23
  • 22
  • A "huge advantage" over which languages? [Python has JIT](http://pypy.org/), [modern JavaScript engines have JIT](http://stackoverflow.com/a/7807863/485115), Haskell compiles to machine code to begin with. – Nikita Volkov Apr 24 '13 at 22:37
  • 3
    @NikitaVolkov The most commonly used implementation of Python doesn't have a JIT. Common wisdom about "avoiding `map` for performance reasons" doesn't come from PyPy. – Ben Apr 25 '13 at 00:13
  • 2
    @NikitaVolkov over ANY language, including C++. JIT compilation optimises based on what the runtime is actually doing, as opposed to what the program was expected to do. Both Python and JavaScript are dynamic, so many optimisations just don't work. – fracca Apr 25 '13 at 07:34
  • @fracca Statements like that need proving. All the benefits I know of are theoretical, while in practice it turns out that [Java is unquestionably beaten by C++](http://benchmarksgame.alioth.debian.org/u64q/benchmark.php?test=all&lang=gcc&lang2=java&data=u64q) and [Scala is beaten by Haskell](http://benchmarksgame.alioth.debian.org/u64q/benchmark.php?test=all&lang=ghc&lang2=scala&data=u64q). Oh, and btw, Clojure is dynamic too and is [unquestionably slower than Haskell](http://benchmarksgame.alioth.debian.org/u64q/benchmark.php?test=all&lang=ghc&lang2=clojure&data=u64q). – Nikita Volkov Apr 25 '13 at 10:20
  • @NikitaVolkov - That looks like a tie between Scala and Haskell to me, and it's only because no one has gotten around to updating a couple of the Scala programs for 2.10, leaving only slower variants working. – Rex Kerr Apr 25 '13 at 10:41
  • @RexKerr In terms of time - yes, but not a tie at all in terms of memory consumption. And being a both Haskell and Scala programmer I've seen enough proofs to this in my work. – Nikita Volkov Apr 25 '13 at 10:49
  • 1
    @NikitaVolkov - Oh, well, sure, anything on the JVM will eat up oodles of memory. It's just a design decision: instead of trying to be careful with memory, you decide in advance how much is okay, and then use all of that (and let the OS figure out what to page out); it's not tuned to perform well in a low memory usage regime. But the incremental cost per extra bit of data or whatever is not much worse than elsewhere, so if you're working with 4GB of image data, say, you hardly notice the extra. (You just notice that you had to ask for what you needed in advance.) – Rex Kerr Apr 25 '13 at 10:59
  • I could spend time discussing it, but have other things to do. Instead, I'll refer to a good summary post: http://www.infoq.com/articles/9_Fallacies_Java_Performance – fracca Apr 25 '13 at 11:45
  • @fracca You could and you should have, since it's a professional community and one had better have facts to back up one's arguments. Instead you've chosen to act all arrogant. Well, ok, bye bye then. – Nikita Volkov Apr 25 '13 at 13:08