12

Consider this function:

f as = if length as > 100 then length as else 100

Since the function is pure it's obvious that the length will be the same in both calls. My question is does Haskell optimizer turn the code above into equivalent of the following?

f as = 
  let l = length as
  in if l > 100 then l else 100

If it does, then which level setting enables it? If it doesn't, then why? In this scenario a memory waste can't be the reason as explained in this answer, because the introduced variable gets released as soon as the function execution is finished.


Please note that this is not a duplicate of this question because of the local scope, and thus it may get a radically different answer.

Community
  • 1
  • 1
Nikita Volkov
  • 42,792
  • 11
  • 94
  • 169

3 Answers3

16

GHC now does some CSE by default, as the -fcse flag is on.

On by default.. Enables the common-sub-expression elimination optimisation. Switching this off can be useful if you have some unsafePerformIO expressions that you don't want commoned-up.

However, it is conservative, due to the problems with introducing sharing (and thus space leaks). The CSE pass is getting a bit better though (and this).

Finally, note there is a plugin for full CSE.

If you have code that could benefit from that.

Mikhail Glushenkov
  • 14,928
  • 3
  • 52
  • 65
Don Stewart
  • 137,316
  • 36
  • 365
  • 468
13

Even in such a local setting, it is still the case that it is not obvious that the introduction of sharing is always an optimization. Consider this example definition

f = if length [1 .. 1000000] > 0 then head [1 .. 1000000] else 0

vs. this one

f = let xs = [1 .. 1000000] in if length xs > 0 then head xs else 0

and you'll find that in this case, the first behaves much better, as each of the computations performed on the list is cheap, whereas the second version will cause the list to be unfolded completely in memory by length, and it can only be discarded after head has been reduced.

kosmikus
  • 19,549
  • 3
  • 51
  • 66
  • 3
    Despite this problem, ghc could be much more aggressive with CSE. You just have to have a size estimate of the value you are CSEing. A simple estimate is that base types take up negligable space. – augustss Feb 26 '13 at 09:14
  • How is `length [1 .. 1000000] > 0` a cheap operation? Won't "length" have to return before ">" is evaluated? (In ghci, the operation is slowed down noticably when I increase the size of the list) – salty-horse Sep 12 '16 at 11:45
  • 1
    @salty-horse Cheap in terms of space: constant space, linear time. – kosmikus Sep 12 '16 at 13:16
5

The case you are describing has more to do with common subexpression elimination than memoization, however it seems that GHC currently doesn't do that either because unintended sharing might lead to space leaks.

shang
  • 24,642
  • 3
  • 58
  • 86