4

I am studying a bit of functional programming using haskell and I am trying to go through some concepts by re-implementing some library functions.

I have a question though mainly in regards to when it is better to choose recursion over iteration. For example when reimplementing the "all" function I can choose between:

all1 :: (a -> Bool) -> [a] -> Bool
all1 p xs = foldr (\x acc -> acc && p x) True xs 

all2 :: (a -> Bool) -> [a] -> Bool
all2 p (x:xs) 
  | p x == False = False
  | otherwise    = all2 p xs

The way I look at it, the recursive one should be more time efficient since it will stop at the first non-matching entry, while the folding one is more space efficient (I think, still not clear about tail recursion optimisation in haskell) but it will always scan the full list, unless there is some clever optimisation done by looking at the fact that false will always give false once it's there.

So, is this compromise something which is always there? Am I misunderstanding something in the way recursion and folding work?

Stephane Rolland
  • 38,876
  • 35
  • 121
  • 169
Tallmaris
  • 7,605
  • 3
  • 28
  • 58
  • 3
    This area of Haskell (space/time efficiency) isn't my strong point, but I think that if you rewrite the folding function as `(\x acc -> p x && acc)` it then will not need to scan the full list if it finds a `False` value, since this variant turns `[e1, e2, e3, e4, ...]` into `p e1 && (p e2 && (p e3 && (p e4 && ...)))`, which short-circuits if anything returns `False`. – bradrn Mar 28 '19 at 00:10
  • 4
    (1) Side note: `foldr` is also recursive. A common term for talking about solutions like your second one, in contrast to ones using functions like `foldr`, is "explicit recursion". (2) As @bradrn notes, your first solution will short-circuit too if you switch the order of the arguments to `(&&)`. That would be well worth exploring in an answer here. – duplode Mar 28 '19 at 00:11
  • 1
    On another note, tail recursion in Haskell must be thought of accounting for laziness. Relevant Q&As: [*Does Haskell have tail-recursive optimization?*](https://stackoverflow.com/q/13042353/2751851); [*foldl is tail recursive, so how come foldr runs faster than foldl?*](https://stackoverflow.com/q/3429634/2751851). – duplode Mar 28 '19 at 01:54

2 Answers2

3

foldr is defined as follows:

foldr k z = go
          where
            go []     = z
            go (y:ys) = y `k` go ys

If you inline this definition into all1, you'll see that the result is recursive too. Therefore, it just isn't explicitly recursive, since it hides the recursion inside of foldr.

The foldr variant is both more space efficient and more time efficient, as foldr has rules for list fusion (an optimization to remove intermediate lists), which all1 gets for free.

To make short-circuiting work, just change acc && p x to p x && acc. With foldr, this will stop going over the list as soon as it gets a False result. With foldl or foldl', even if your folding function short-circuits, it needs to walk the rest of the list anyway.

Summary: Using foldr is more efficient than either foldl, foldl', or explicitly recursing in your own function. A nice easy test of this is to do +set :s in GHCi, and then compare performance on the list (False:replicate 10000000 True).

  • Aren't stream fusion rules technically a specific instance of rewrite rules, that have applied to the Vector library, but not in fact used in eg. foldr? – moonGoose Mar 28 '19 at 01:23
  • @ATayler They're used in foldr. See the `fold/build` rule in https://hackage.haskell.org/package/base-4.12.0.0/docs/src/GHC.Base.html – Joseph Sible-Reinstate Monica Mar 28 '19 at 01:34
  • That is a rewrite rule, not a stream fusion rule. Contrast to http://hackage.haskell.org/package/stream-fusion-0.1.2.5/ – moonGoose Mar 28 '19 at 01:42
  • 1
    @ATayler and JosephSible: I believe this is a matter of terminology. Joseph's answer refers to [what the User's Guide calls "list fusion"](https://downloads.haskell.org/~ghc/8.6.4/docs/html/users_guide/glasgow_exts.html#list-fusion). AFAIK, "Stream fusion" indeed refers to a different, if related, technique -- [this post](https://www.stackbuilders.com/tutorials/haskell/ghc-optimization-and-fusion/) looks like a good summary of the whole matter. – duplode Mar 28 '19 at 01:45
3

Let's consider, brick by brick, whether a foldr-based solution can short-circuit. To begin with, (&&) is defined like this:

(&&)                    :: Bool -> Bool -> Bool
True  && x              =  x
False && _              =  False

Given the second clause, and thanks to laziness, the second argument to (&&) is ignored if the first one is False -- in other words, it short-circuits.

Next, this is foldr for lists:

foldr            :: (a -> b -> b) -> b -> [a] -> b
foldr k z = go
          where
            go []     = z
            go (y:ys) = y `k` go ys

If y `k` go ys can be evaluated without looking at go ys, there will be no recursive call, and the fold as a whole will shortcut.

In all1, the binary operation is \x acc -> acc && p x. That isn't good enough for our purposes, for passing acc (which corresponds the go ys in the foldr definition) as the first, short-circuiting, argument of (&&) leads to the whole list being consumed regardless of what p x turns out to be. Not all is lost, though: swapping the arguments to (&&)...

all3 :: (a -> Bool) -> [a] -> Bool
all3 p xs = foldr (\x acc -> p x && acc) True xs

... gives us the desired short-circuiting.

duplode
  • 33,731
  • 7
  • 79
  • 150