State hack: why is eta-expanding State# lambdas important for performance?

Question

GHC -O can cause severe pessimisation of some programs because of "state hack". Why the "state hack" (i.e. eta-expanding State# lambdas) is so important for performance of others?

References:

Typically if you write a function like `foo1 = \x -> let expensive = .. in \y -> ..` then you get a performance gain over `foo2 = \x y -> let expensive = .. in ..`. However, if `y :: RealWorld#` then it carries no information (i.e. isn't useful) so the `\y` isn't a "real" lambda (in that pushing a value for `y` on the stack is always useless; it can't possibly be used anyways). The operational semantics of `foo1` is as follows: enter the `\x`; create a thunk for `expensive`; enter the `\y`. For `foo2`: enter the `\x y` *simultaneously*; create a thunk for `expensive`. — user2407038, Jan 15 '18 at 01:15
Basically the state hack saves an extra function call in every `IO` action which is seperated by a `>>=` if that `IO` action is actually constructed by applying a function to a value. For example, the state hack gives no performance gains for `f0 >> f1 >> f2` but for `f0 >>= \x0 -> f1 x0 >>= \x1 -> f2 x0 x1` it may save entering into two function (which has a trivial cost in general, so wouldn't be much of an optimization; but when you *know* the argument can't possibly be used and you have very many such function calls, it adds up!) — user2407038, Jan 15 '18 at 01:18
I encourage posting an answer! With my current understanding, the "state-hack" exists because GHC is not able to distinguish "oneShot" lambdas that are always faster after optimization (one thunk, not two) from other lambdas that always suffer because they can't get hold of a thunk introduced by "let" keyword and thus must repeat this computation. — sevo, Sep 22 '18 at 14:38

State hack: why is eta-expanding State# lambdas important for performance?

0 Answers0