3

I decided to get serious with Haskell, and am wrapping my head around the use of foldl and foldr. They really feel a lot like Clojure's reduce - but I might be wrong, and ran into a problem soon enough, which I hope someone will easily explain.

Working with this doc: https://wiki.haskell.org/Foldr_Foldl_Foldl'

Before getting too deep into implementing my own versions of foldr/foldl, I decided to first test the existing ones from Prelude:

± |master U:2 ✗| → ghci
GHCi, version 8.6.3: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /Users/akarpov/.ghc/ghci.conf
Prelude> foldr (+) 0 [1..9999999]
49999995000000
Prelude> foldr (+) 0 [1..99999999]
*** Exception: stack overflow

Didn't see that coming (same result when using foldl); I rather expected something along the lines of Clojure:

> (time (reduce +' (range 1 99999999)))
"Elapsed time: 3435.638258 msecs"
4999999850000001

The only obvious (and irrelevant) difference is using +' rather than +, but it's only to accommodate the JVM's type system - the number produced doesn't fit into a [default] Long, and +' will automatically use a BigInteger when needed. Most importantly, no stack overflow. It seems to indicate, therefore, that folding/reduction in Haskell/Clojure is either implemented very differently, or my use of haskell's implementation is wrong.

in case it is relevant, these are global-project settings: - packages: [] - resolver: lts-13.8

Will Ness
  • 70,110
  • 9
  • 98
  • 181
alexakarpov
  • 1,848
  • 2
  • 21
  • 39
  • 2
    Use `Data.List.foldl'`, the strict form of `foldl`. – chepner Feb 25 '19 at 01:29
  • 1
    Haskell equivalent to Clojure's `(reduce +' (range 1 99999999))` is `foldl' (+) 0 [1..99999998]` (sic) (apparently; to get the same result as you show). Haskell ranges are inclusive. – Will Ness Feb 25 '19 at 13:26
  • 1
    see also [Does Haskell have tail-recursive optimization?](https://stackoverflow.com/q/13042353/849891). the question is worded differently, but it is actually near-duplicate for this one -- `foldl` is tail-recursive, etc. – Will Ness Feb 25 '19 at 22:36

1 Answers1

-2

Problem

As the Wiki explains, the function (+) is strict in both its arguments, which means that when you try to do 1 + (2 + 3), you first need to calculate (2 + 3). While this may not seem like a big problem at first, it becomes when you have a long list. Quoting the wiki,

to evaluate: 1 + (2 + (3 + (4 + (...)))), 1 is pushed on the stack.

Then: 2 + (3 + (4 + (...))) is evaluated. So 2 is pushed on the stack.

Then: 3 + (4 + (...)) is evaluated. So 3 is pushed on the stack.

Then: 4 + (...) is evaluated. So 4 is pushed on the stack.

Then: your limited stack will eventually run full when you evaluate a large enough chain of (+)s. This then triggers the stack overflow exception.

I don't much about Clojure, but if (+') works, then it definitely does not need both its arguments to be evaluated before being reduced, and that is also the solution in Haskell.

Solution

Foldl would not solve the problem, as it is known that foldl has to travel the whole list twice before returning a result --which is not efficient--, and even then (+) is still strict, so the reducible expressions are not reduced.

To solve this, you must use a non-strict function. In the standard Prelude, seq :: a -> b -> b can be used exactly for that, and that's how foldl' works.

Again quoting the wiki,

foldr is not only the right fold, it is also most commonly the right fold to use, in particular when transforming lists (or other foldables) into lists with related elements in the same order. Notably, foldr will be effective for transforming even infinite lists into other infinite lists. For such purposes, it should be your first and most natural choice. For example, note that foldr (:) []==id.

The problem with foldl' is that it reverses the list. If you have a commutative function that is not a problem, so if your list is finite (remember foldl has to travel it all), foldl' is usually better. On the other hand, if your function for some reasons does not necessarily need the whole list, or the list might be infinite, go for foldr

Lorenzo
  • 2,160
  • 12
  • 29
  • 6
    While I agree with your eventual solution—namely, use `foldl'`—I think a lot of things in your surrounding explanation aren’t quite right. The problem is not just that `+` is strict in both its arguments, but that `foldr` is *lazy*; the fact that you say `+'` in Clojure must not be strict in both its arguments doesn’t make any sense because Clojure is not a lazy programming language. The reason it is a problem in Haskell is because of the creation of thunks, but this does not happen in Clojure because its `reduce` is essentially Haskell’s `foldl'`; that is, it is strict in its accumulator. – Alexis King Feb 25 '19 at 01:49
  • 6
    Furthermore, you write that “`foldl` has to travel the whole list twice before returning a result”, which is not true. `foldl` does have to travel the whole list before returning a result, but it only has to do it once. This is in contrast with `foldr`, which doesn’t have to travel the whole list even once before returning a result if the accumulation function is lazy in its second argument. However, both of these things are immaterial in a case where the accumulation function is strict, since the whole list will have to be traversed, anyway. – Alexis King Feb 25 '19 at 01:52
  • thank you (both). Looks like my mistake was jumping to conclusions, before finishing the whole page, eh. Specifically, implenebting our own versions of foldl and foldr sounded like a good exercise - but, I thought, surely Prelude's versions will be already perfect and not run into overflows... which, I see, is due to (+), and not fold's implementation. – alexakarpov Feb 25 '19 at 02:11
  • @AlexisKing That was my understanding, which is probably wrong if you say so. I am happy to make it a wiki answer if you wanna contribute or you can just write your own answer if thats easier – Lorenzo Feb 25 '19 at 02:15
  • @alexakarpov It's a decent assumption (that the prelude versions would be good). I am a bit bewlidered by `foldl` –– I've been using Haskell for over ten years and have never encountered a situation where `foldl` was useful. It's either `foldr` or `foldl'`. – luqui Feb 25 '19 at 03:37
  • 2
    @luqui, `foldl` is essentially never useful for lists. OTOH, it's just right for folding a `Set`, `Seq`, etc., "backwards". For example, `foldl (flip (:)) []` will produce a descending list of `Set` elements or convert a sequence to a reversed list. The same applies to `foldr'`: it's worthless for lists but sometimes useful for other structures. – dfeuer Feb 25 '19 at 08:19
  • 2
    @dfeuer, oh the `Foldable` one. I still haven't updated my map of those functions to `Foldable` -- I guess it's been a while now. But yeah, it seems the problem is that the fold associativity is incompatible with list's associativity; `foldl` would be perfect for `data Tsil a = Lin | Snoc [a] a`. Cool, I hadn't realized that! – luqui Feb 25 '19 at 08:41
  • 2
    @AlexisKing `foldl` traverses the list "twice" in the sense that it traverses it once to build the thunk result, and then *that thunk* is "traversed" too, to be evaluated, unfolding the stack, which is then folded back (which could be seen as the *third* "traversal"). that's my theory anyway. – Will Ness Feb 25 '19 at 14:28
  • @WillNess Good point. That does seem like the most reasonable interpretation, though I still wouldn’t personally consider it quite the same as traversing the list twice! – Alexis King Feb 25 '19 at 20:09