2

Since a Monoid is closed (a -> a -> a), How can we get a second type 'b' along the way ? I have the impression foldr is too permissive, in the sense that I could use a function for folding that it's not closed. You'll notice as well that fold and foldMap only have 'a'.

Below a snippet of the Foldable typeclass :

class Foldable t where
    fold :: Monoid m => t m -> m
    foldMap :: Monoid m => (a -> m) -> t a -> m
    foldr :: (a -> b -> b) -> b -> t a -> b

e.g :

foldr (+) 0 [1..5] // ok (+) is a monoid
foldr (++) "" ["ab", " cd"] // ok (++) is a monoid for String
foldr (:) [] [1,2,3] // (:) :: a -> [a] -> [a] is not a monoid...

I was thinking a Foldable should/could only fold with a monoid, is that statement wrong ? (e.g : In my head it was like reduce is using a commutative monoid and fold a simple monoid.. (see Difference between reduce and foldLeft/fold in functional programming (particularly Scala and Scala APIs)?))

Peter Hall
  • 53,120
  • 14
  • 139
  • 204
Nicolas Henin
  • 3,244
  • 2
  • 21
  • 42
  • 3
    "Too permissive" in what sense? Do you think the more general type of `foldr` makes it unsafe somehow? – sepp2k Jun 15 '18 at 14:25
  • yeah like we could use a binary operation that is Open for example – Nicolas Henin Jun 15 '18 at 14:28
  • fold may be a bit easier to understand than foldr or foldl, but that doesn't make it better. E.g. most if if not all of the list functions that return a list can be written in terms of foldr, but are rather annoying to write in terms of fold (without introducing tons of appends) – Cubic Jun 15 '18 at 14:35
  • The `Monoid` constraint is on the elements of the `Foldable`, not the accumulating function. `fold` and `foldMap` can obtain an accumulating function for the items in the foldable from the `Monoid` instance (i.e. `mappend`). `foldr` is more general in that the accumulator type can be different from the element type. `(+)` and `(++)` are not monoids. – Lee Jun 15 '18 at 15:13
  • - (+) is the summation for Num which is equivalent to the Sum data-type. So I disagree it is a monoid. – Nicolas Henin Jun 15 '18 at 15:19
  • - (++) is the concatenation for String, it's the default Monoid for the String data-type. So I disagree it is a monoid. – Nicolas Henin Jun 15 '18 at 15:20
  • Regarding foldr vs FoldMap, you are supposed to get foldr if you define foldMap (and vice-versa) as {-# MINIMAL foldMap | foldr #-} is written in the Foldable typeclass, so there are linked somehow. foldMap f = foldr (mappend . f) mempty – Nicolas Henin Jun 15 '18 at 15:21
  • The answer is probably in that line of code : foldr f z t = appEndo (foldMap (Endo #. f) t) z, but I don't understand these concepts of Endo and appEndo yet... – Nicolas Henin Jun 15 '18 at 15:24
  • @NicolasHenin (1) While e.g. `Integer` forms a monoids with `(+)` and `0`, when it comes to the nuts and bolts of actual code you can't use it directly as a `Monoid`, as there is no instance -- that is what the `Sum` newtype is there for. (2) You might find [this answer of mine](https://stackoverflow.com/a/23319967/2751851), which covers similar ground to chi's answer here, helpful. – duplode Jun 15 '18 at 15:39
  • You can read `(#.)` as `(.)`, but with the special restriction that the first argument doesn't do anything except wrap and unwrap newtype constructors. If it does anything else, the behavior will likely be very surprising, because `(#.)` only looks at the type of the first argument and ignores its value altogether. `(#.) _ = coerce`. – dfeuer Jun 15 '18 at 16:23

2 Answers2

6

I'll only consider lists as a concrete case.

One way to understand why b is unconstrained is to consider the monoid:

newtype Endo b = Endo { appEndo :: b -> b }

instance Monoid (Endo b) where
   mempty = Endo id
   mappend (Endo f) (Endo g) = Endo (f . g)

Note that this is the monoid formed by functions of the form b -> b, where the monoid operation is composition, and the neutral element is the identity function.

Crucially, this is a monoid no matter what b is!

Then, up to isomorphism, we can write

foldr :: (a -> Endo b) -> b -> [a] -> b
foldr e n list = appEndo (mconcat (map e list)) n

so that most of the work is done in the Endo monoid.

Reordering the arguments, we even get

foldr :: (a -> Endo b) -> [a] -> b -> b

or even, up to isomorphism,

foldr :: (a -> Endo b) -> [a] -> Endo b
foldr e = mconcat . map e 

(which is the usual implementation in terms of foldMap.)

So, another justification for b being unrestricted in foldr is that Endo b does not require any condition on b to be a monoid.


More low-level explanation through some examples:

As an example, note that foldr (:) [] list = list for any list :: [a].

To obtain the above, we need to pick b ~ [a]. If we were limited to choose b ~ a we could not type check the above example.

As another example, consider

any :: (a -> Bool) -> [a] -> Bool
any p = foldr (\x acc -> p x || acc) True

Above, x :: a and acc :: b ~ Bool, which are in general different types.

Also, don't forget the definition of foldr on lists:

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr c n [] = n
foldr c n (x:xs) = c x (foldr c n xs)

Here, there is no restriction that is needed on b to make foldr well-typed.

chi
  • 111,837
  • 3
  • 133
  • 218
  • 'you want a monoid at all costs' @chi, I just want to understand the concepts behind, instead of just saying to myself that because type inference is working it's ok... foldr (:) [] [1,2,3] // (:) :: a -> [a] -> [a] is not a monoid... in the same typeclass you have foldMap which is more restrictive than foldr. And they are supposed to be equivalent because {-# MINIMAL foldMap | foldr #-}, so there are some mysteries... I don't understand that Endo stuff btw so maybe the explanation is in it. – Nicolas Henin Jun 15 '18 at 15:17
  • 1
    @NicolasHenin I now realized that line sounded aggressive -- sorry for that :) Anyway, you are correct when you way that `a -> [a] -> [a]` is not a monoid, but we can write that (up to isomorphism) as `a -> Endo [a]`, whose target `Endo [a]` is a monoid. I expanded a bit the definition of `Endo` above with a bit of comment. Is there any specific part about `Endo` where you'd want more detail? – chi Jun 15 '18 at 15:33
3

Looking at the Foldable class definition you quoted, the type of foldr just with its arguments re-ordered a bit,

    foldr' ::   Foldable t =>     (a -> (b -> b)) -> t a -> (b -> b)   , 

actually unifies with (becomes the same as) the type of

    foldMap :: (Foldable t, Monoid m) => (a -> m) -> t a -> m          , 

provided b -> b is a monoid, (b -> b) ~ Monoid m => m i.e. Monoid (b -> b), the type of so-called endofunctions (i.e. functions from a type going to the same type), which are associatively combined with functional composition just as you expected.

Indeed they do form a monoid.

What does that mean, endofunctions forming a monoid under functional composition ((f . g) x = f (g x)) ? Simply, that any two endofunctions can be composed, with functional composition being associative ( (f . g) . h == f . (g . h) -- you can check this out yourself by using its definition). Also it means the existence of the special function id such that id . f == f == f . id; indeed id x = x fits the bill. That's all, believe it or not.

Indeed (:) :: a -> [a] -> [a] (read: (:) has type a -> [a] -> [a]), which is a kind of a -> b -> b; so, with one :: Int ; one = 1 we have (one :) :: [Int] -> [Int] which is a kind of b -> b as well.

Also (one +) :: Int -> Int, which is an even more specialized kind of b -> b, but still.

As a technicality, a newtype must be used to actually give this type its Monoid instance. This basically means, you can treat Endo / appEndo as syntactic noise, when reading the code.

So foldr' is indeed foldMap, up to some newtype tagging (with Endo) and untagging (with appEndo) : appEndo (Endo f) == f, that's all:

Sum 1     <> Sum 2     = Sum (1 + 2)         -- Sum is a tag, saying "we want (+)"
Endo (1:) <> Endo (2:) = Endo ((1:) . (2:))  -- Endo is a tag, saying "we want (.)"

foldMap Sum          [1,2,3] = Sum (1 + 2 + 3)
foldMap (Endo . (:)) [1,2,3] = Endo ((1:) . (2:) . (3:))

foldr' (:) [1,2,3] [4,5] = appEndo (foldMap (Endo . (:)) [1,2,3]) [4,5]
                         = appEndo (Endo ((1:) . (2:) . (3:))) [4,5]
                         = ((1:) . (2:) . (3:)) [4,5]
                         = [1,2,3,4,5] =
foldr (:) [4,5] [1,2,3]

Notice the lack of inner parentheses in (1 + 2 + 3) as well as in ((1:) . (2:) . (3:)). The associativity of <> (here, +, and .) means the actual parenthesization doesn't matter semantically, and can only matter operationally. Which is the whole point, because grouping the calculations as a tree opens them up for the possibility of being calculated in parallel:

(1 + 2 + 3 + 4 + 5 +  + ...)
=
( ( (1 + 2) + (3 + 4) ) + ( ( (5 + 6) + ... ) ... ) ... )
=
 ......

Of course the actual order with foldr / Endo is linear:

foldr (:) [] [1,2,3,4]
=
1 : (2 : (3 : (4 : [] )))

(copying here some content from the comments, as we are supposed to do, on SO.)

  • one trick I find helps me in understanding the tangled newtyped codes: when you see something like Endo f <> Endo g = Endo $ x -> (f . g) x, write it instead as an equational pseudocode, like appEndo (Endo f <> Endo g) x = (f . g) x seen as the definition of <> – it's not a valid Haskell, but expresses the intent more clearly, for me. I.e. I tend to see the lambdas as implementation, and equations as, well, equations (an example) ("tangled" does not apply here, of course, but e.g., like, with Continuation monad, etc.)

You're asking,

"what #. does in (Endo #. f)?

  • well, the explanation is here. it says, in our case, that Endo #. (:) = coerce (:) `asTypeOf` (Endo . (:)). this is the same as writing let { g :: a -> Endo [a] ; g = coerce (:) } in g. Endo b is practically the same as b -> b, save for the tag Endo which lets us define the Monoid instance for it. instance Monoid (b->b) where .... isn't valid Haskell. So we can read Endo #. (:) as just Endo . (:).
  • this has to do with what newtype is. it can be seen as compile-time tag. it disappears at run-time. (like, Sum 2 is actually just 2 at run time, and Product 2 is just 2 as well). But writing Sum . (1+) hides its ephemeral nature, which apparently "can lead to very inefficient code". IOW this stuff is for compiler, it's not for the language itself. language-wise, #. is just .. # is usually used in such situations.
Will Ness
  • 70,110
  • 9
  • 98
  • 181