10

The dlist package contains the DList data type, which has lots of instances, but not Foldable or Traversable. In my mind, these are two of the most "list-like" type classes. Is there a performance reason that DList is not an instance of these classes?

Also, the package does implement foldr and unfoldr, but none of the other folding functions.

Will Ness
  • 70,110
  • 9
  • 98
  • 181
Mike Izbicki
  • 6,286
  • 1
  • 23
  • 53

2 Answers2

23

One alternative you should consider instead of DList is to use Church-encoded lists. The idea is that you represent a list as an opaque value that knows how to execute a foldr over a list. This requires using the RankNTypes extension:

{-# LANGUAGE RankNTypes #-}

import Prelude 
import Control.Applicative
import Data.Foldable (Foldable)
import qualified Data.Foldable as F
import Data.Traversable (Traversable)
import qualified Data.Traversable as T

-- | Laws:
--
-- > runList xs cons nil == xs
-- > runList (fromList xs) f z == foldr f z xs
-- > foldr f z (toList xs) == runList xs f z
newtype ChurchList a = 
    ChurchList { runList :: forall r. (a -> r -> r) -> r -> r }

-- | Make a 'ChurchList' out of a regular list.
fromList :: [a] -> ChurchList a
fromList xs = ChurchList $ \k z -> foldr k z xs

-- | Turn a 'ChurchList' into a regular list.
toList :: ChurchList a -> [a]
toList xs = runList xs (:) []

-- | We can construct an empty 'ChurchList' without using a @[]@.
nil :: ChurchList a 
nil = ChurchList $ \_ z -> z

-- | The 'ChurchList' counterpart to '(:)'.  Unlike 'DList', whose
-- implementation uses the regular list type, 'ChurchList' doesn't
-- rely on it at all.
cons :: a -> ChurchList a -> ChurchList a
cons x xs = ChurchList $ \k z -> k x (runList xs k z)

-- | Append two 'ChurchList's.  This runs in O(1) time.  Note that
-- there is no need to materialize the lists as @[a]@.
append :: ChurchList a -> ChurchList a -> ChurchList a
append xs ys = ChurchList $ \k z -> runList xs k (runList ys k z)

-- | Map over a 'ChurchList'.  No need to materialize the list.
instance Functor ChurchList where
    fmap f xs = ChurchList $ \k z -> runList xs (\x xs' -> k (f x) xs') z

-- | The 'Foldable' instance is trivial, given the 'ChurchList' law.
instance Foldable ChurchList where
    foldr f z xs = runList xs f z

instance Traversable ChurchList where
    traverse f xs = runList xs step (pure nil)
        where step x rest = cons <$> f x <*> rest

The downside to this is that there is no efficient tail operation for a ChurchList—folding a ChurchList is cheap, but taking repeated tails is costly...

Luis Casillas
  • 29,802
  • 7
  • 49
  • 102
  • the `tail` of a `ChurchList` can be computed, lazily, in constant time. – is7s Mar 28 '13 at 07:45
  • 1
    Note that I said "taking repeated tails"; if you're just taking the tail once, the simple `churchTail = fromList . tail . toList` doesn't look too bad. But now consider what happens with `churchTail . churchTail`: you get a `ChurchList` backed by a `[]`-list that's constructed from a `ChurchList` backed by a `[]`-list. The heart of the problem is that a `ChurchList` and its `churchTail` don't share structure like a `[]`-list and its tail do. I don't believe that more sophisticated implementations of `churchTail` that don't use `toList`/`fromList` can avoid this either. – Luis Casillas Mar 28 '13 at 17:51
  • True, repeated `tails` are costly for other implementations as well. BTW I don't think that the `append` operation of a `ChurchList` is any better than that of a normal list, is it? – is7s Mar 28 '13 at 18:08
  • also, `singleton x = cons x nil = ChurchList $ \k z -> k x (runList nil k z) = ChurchList $ \k z -> k x z`. then, `snoc xs x = append xs $ singleton x = ChurchList $ \k z -> runList xs k (runList (singleton x) k z) = ChurchList $ \k z -> runList xs k (k x z)`. Also O(1). @is7s why? I think it is much more like the `append` for `DList` (i.e. function composition), not `[]`. Recently this was discussed here http://stackoverflow.com/a/14942678/849891 and first, here http://stackoverflow.com/a/13879693/849891. I think it applies here as well - everything is O(1) until the first `toList`. – Will Ness Mar 28 '13 at 19:08
  • repeated tails is bad because functions are opaque. but maybe re-write rules can be employed to convert the inefficient `churchTail . churchTail = fromList . tail . toList . fromList . tail . toList` into the efficient `fromList . tail . tail . toList` by collapsing `toList . fromList` to `id`? – Will Ness Mar 28 '13 at 19:34
  • @WillNess No, it's not. A bunch of `append`s on a `ChurchList` will have the same problem when left-associatively composed. – is7s Mar 28 '13 at 20:04
  • @is7s could you give me a sample misbehaving expression please, to chew on? – Will Ness Mar 28 '13 at 20:08
  • @is7s https://gist.github.com/WillNess/5266509#file-churchlist-hs where am I wrong? – Will Ness Mar 28 '13 at 20:30
  • @WillNess for example the expression `(((a \`append\` b) \`append\` c) \`append\` d)` to compute the whole list, the list `a` will have to be traversed 3 times just as with normal lists. Difference lists were implemented mainly to solve this problem. – is7s Mar 29 '13 at 05:32
  • @is7s I disagree. I've updated the gist. the definition ``g = (((a `append` b) `append` c) `append` d)`` gets forced into WHNF on the first `runList g k z` and then rearranges itself into the right-nested `runList a k (runList b k (runList c k (runList d k z)))`, just like the func-compo chains of DList do. So `a` is run only once. Do you agree? (and if we don't name them, the unnamed thunks will get created instead, and forced). – Will Ness Mar 29 '13 at 07:33
  • @is7s thanks for the confirmation. :) This means, that this answer does provide a `Foldable` alternative to `DList`, without loosing any of its benefits. (!) – Will Ness Mar 29 '13 at 10:10
  • @WillNess has it been wrapped up in a package, with the rewrite rules? – Alp Mestanogullari Feb 27 '14 at 01:28
  • @AlpMestanogullari I've no idea. :) – Will Ness Feb 27 '14 at 08:22
12

DList a is a newtype wrapper around [a] -> [a], which has an a in a contravariant position, so it cannot implement Foldable or Traversable, or even Functor directly. The only way to implement them is to convert to and from regular lists (see the foldr implementation), which defeats the performance advantage of difference lists.

Sjoerd Visscher
  • 11,840
  • 2
  • 47
  • 59
  • 2
    Further to Sjoerd's answer, a DList is only efficient for **building** - if you have built your list and want to process it, you should covert it with `toList` then process the regular list. – stephen tetley Mar 23 '13 at 19:18
  • 4
    So why don't we simply define `fold (DL f) = fold (f [])`? We can forget about how `DList`s are implemented and simply view them as some representation of sequence of elements, and then implementing `Foldable` makes sense. Implementing `Functor` and `Traversable` in this way would probably have some pitfalls, but `Foldable` seems quite reasonable. – Petr Mar 23 '13 at 19:28
  • 2
    Yeah, `Foldable` might not be too bad, the package has `foldr` already and that's enough after all. I guess the reason it isn't implemented is because the last update was in 2009, when `Foldable` was not a well known type class yet. – Sjoerd Visscher Mar 23 '13 at 19:48