9

Recently I was introduced to this OCaml code which in Haskell can be written as:

data DL a = DL [a] a [a]

create [] = error "empty list"
create (x:xs) = DL [] x xs

next (DL pr x (h:tl)) = DL (x:pr) h tl
next _ = error "end of dlist"

prev (DL (p:pr) x tl) = DL pr p (x:tl)
prev _ = error "start of dlist"

which I though was not a proper doubly-linked list implementation, as it creates new storage on traversal. OTOH there's this Haskell code:

data DList a = Leaf | Node { prev::(DList a), elt::a, next::(DList a) }

create = go Leaf
  where go _    []     = Leaf
        go prev (x:xs) = current
            where current = Node prev x next
                  next    = go current xs

Can we say that it is only this code that's true dl-list?

Can we rely on this code to introduce true sharing of nodes of the dl-list, so that no new storage is created on traversal?

Is the same-named variable in Haskell always referring to the same "thing" or might separate occurrences of the same-named variable refer to separate copy of the same thing? (edited to add emphasis).

Community
  • 1
  • 1
Will Ness
  • 70,110
  • 9
  • 98
  • 181
  • 5
    The first implementation is what is known as a _Zipper_; arguably it can be for either singly- or doubly-linked lists. However, it is not a list implementation in its own right. – ivanm Feb 16 '12 at 11:06
  • 5
    If you read the Haskell report carefully, you can't find a single paragraph on how data is represented. Keep in mind, that all kind of sharing is implementation dependent, though there are mostly only a few sensible ways to implement a certain feature. – fuz Feb 16 '12 at 11:33
  • 1
    Thank you for this question! I haven't satisfied with my, watching for this. – demi Feb 16 '12 at 12:28
  • @FUZxxl I'll contend myself with anything firmly guaranteed by GHC. I don't think that without some specified behaviour sharing-wise we can reason in any way about efficiency. – Will Ness Feb 16 '12 at 12:40
  • If you say `let x = 5` and then refer to `x` multiple times it is very likely to refer to different copies of `5`. – augustss Feb 16 '12 at 14:51
  • @augustss that's ok, since `5` doesn't refer to anything further, unlike e.g. the `current` node, which the question really is about. – Will Ness Feb 16 '12 at 15:11
  • 1
    @WillNess GHC takes great care not to duplicate computations. Once an expression has been computed to WHNF it may be duplicated without duplicating work. – augustss Feb 16 '12 at 15:52
  • @augustss thanks for that, but what about not duplicating *space*? Does it take great care for that too? (btw there is such thing as [unwanted sharing](http://stackoverflow.com/questions/9114439/instance-declaration-in-haskell#comment11473273_9114439), e.g. in primes production via tree-folded multiples-lists, but that's besides the point here). – Will Ness Feb 16 '12 at 16:02
  • 1
    I'm not 100% sure GHC is safe for space, but under normal circumstances it will not duplicate data. – augustss Feb 16 '12 at 16:22
  • @augustss thanks a lot. So I take from this that it is indeed reasonable to write code pretending that variables are named pointers, like they are in Scheme for instance (for non-atomic data of course). – Will Ness Feb 16 '12 at 16:29
  • @WillNess Yes, it's a reasonable mental model. – augustss Feb 16 '12 at 16:35

2 Answers2

6

You can visualise how the memory layout of your data structure looks using a package called vacuum-cairo. Install from hackage with cabal install vacuum-cairo then you should be able to verify the difference in the two structures by something like this in GHCi:

> import System.Vacuum.Cairo
> view $ create [1..5]

There you can see the nodes are shared using the DList where as DL is two lists with an element in between (As pointed out, this is kind of Zipper).

Note: This is GHC specific, a different implementation might represent the data in memory differently, but this would be typical.

Oliver
  • 2,361
  • 2
  • 17
  • 12
  • thanks, I understand how it *supposed* to be. But it was suggested (in the post to which the first word in my question links) that sharing is not guaranteed. Then, how the structure is supposed to be, and how `cairo` would show it to me, will not necessarily be how it is in reality. I want to know, is there a guarantee at least that same named var is always pointing to the same thing in memory, in whatever implementation? – Will Ness Feb 16 '12 at 12:44
  • Yes, otherwise an example like this won't work: `(x, xs) = let y = replicate 10 x in (length y, y)` – Oliver Feb 16 '12 at 13:23
  • I think it still would work, since there's no cycles in the dependency flow in your example. – Will Ness Feb 16 '12 at 15:13
  • 1
    @WillNess as FUZxxl stated, the Haskell Report does not specify sharing behavior. GHC typically works as you expect, and when it doesn't, it's typically a very small deviation from what you would expect. Given the way Haskell is specified, you as the programmer should not be able to tell the difference between sharing implementations (value-wise), except perhaps the time it takes to perform computations. – Dan Burton Feb 16 '12 at 19:49
  • 1
    @DanBurton thanks! time *and* space ... But it makes me a bit uneasy. I don't like the word "typically" or "usually", I like "guaranteed". It makes me in control, instead of being at the mercy of. :) And if Haskell The Language goes to such lengths *not* to commit to anything, and pretend that it is equational specifications that we write and not programs, let it indeed let us get away with equational high-level specs, and *outdo* our expectations and perform all the transformations leading to the fastest and leanest code, all by itself. :) :) Then I'd gladly accept the lack of spec'ns. :) – Will Ness Feb 16 '12 at 20:52
  • @WillNess most languages can make guarantees about performance, while Haskell is a little more focused towards guarantees of correctness. Though, by not specifying sharing, the Haskell language makes a guarantee to the compiler that pointer equality cannot be inspected, and thus the compiler may make optimizations based on that. Most languages *do* specify that pointer equality matters, and so the compilers are constrained in what they can do when optimizing code dealing with pointers. The same goes for immutable data; Haskell and its compilers makes different tradeoffs than most languages do. – Dan Burton Feb 16 '12 at 21:47
3

I would suggest that the latter is the "correct" implementation, yes.

I do not have facts with which to back that up, but it would seem to me, given my understanding of the GHC implementation, that the latter should work how you'd expect a double-linked list to work.

MathematicalOrchid
  • 61,854
  • 19
  • 123
  • 220
  • thanks! I'd like to have certainty, especially about my last question. :) – Will Ness Feb 16 '12 at 11:01
  • 1
    @WillNess I believe that's an implementation issue, but GHC does use pointer-based representation under the hood to represent the same thing (e.g. in `xs++ys`, the original `ys` value is used as the end of the new combined list). – ivanm Feb 16 '12 at 11:07