
Having read this question Immutable or not immutable? and the answers to my previous questions on immutability, I am still a bit puzzled about an efficient implementation of a simple immutable LinkedList. For an array this seems easy: copy the array and return a new structure based on that copy.

Suppose we have a general Node class:

class Node{
    private Object value;
    private Node next;
}

And a LinkedList class based on the above, allowing the user to add, remove, etc. Now, how would we ensure immutability? Should we recursively copy all the references to the list when we insert an element?

I am also curious about the answers in Immutable or not immutable? that mention a certain optimization leading to log(n) time and space with the help of a binary tree. I also read somewhere that adding an element to the front is O(1). This puzzles me greatly: if we don't copy the references, then in reality we are modifying the same data structure from two different sources, which breaks immutability...

Would any of your answers also work on doubly-linked lists? I look forward to any replies/pointers to other questions/solutions. Thanks in advance for your help.

Bober02

3 Answers


Supposedly we have a general class of Node and class LinkedList based on the above allowing the user to add, remove etc. Now, how would we ensure immutability?

You ensure immutability by making every field of the object readonly, and ensuring that every object referred to by one of those readonly fields is also an immutable object. If the fields are all readonly and only refer to other immutable data, then clearly the object will be immutable!
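For instance, a minimal sketch in Java of the question's Node made immutable (the accessor names are mine, not from the question):

```java
// A minimal immutable version of the question's Node: every field is final,
// assigned exactly once in the constructor, and never reassigned afterwards.
// For full immutability, the stored value must itself be immutable too.
final class Node {
    private final Object value;
    private final Node next; // null marks the end of the list

    Node(Object value, Node next) {
        this.value = value;
        this.next = next;
    }

    Object getValue() { return value; }
    Node getNext()    { return next; }
}
```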

Should we recursively copy all the references to the list when we insert an element?

You could. The distinction you are getting at here is the difference between immutable and persistent. An immutable data structure cannot be changed. A persistent data structure takes advantage of the fact that a data structure is immutable in order to re-use its parts.

A persistent immutable linked list is particularly easy:

abstract class ImmutableList
{
    public static readonly ImmutableList Empty = new EmptyList();
    private ImmutableList() {}
    public abstract int Head { get; }
    public abstract ImmutableList Tail { get; }
    public abstract bool IsEmpty { get; }
    public abstract ImmutableList Add(int head);
    private sealed class EmptyList : ImmutableList
    {
        public override int Head { get {  throw new Exception(); } }
        public override ImmutableList Tail { get { throw new Exception(); } }
        public override bool IsEmpty { get { return true; } }
        public override ImmutableList Add(int head)
        {
            return new List(head, this);
        }
    }

    private sealed class List : ImmutableList
    {
        private readonly int head;
        private readonly ImmutableList tail;
        public List(int head, ImmutableList tail)
        {
            this.head = head;
            this.tail = tail;
        }
        public override int Head { get { return head; } }
        public override ImmutableList Tail { get { return tail; } }
        public override bool IsEmpty { get { return false; } }
        public override ImmutableList Add(int head)
        {
            return new List(head, this);
        }
    }
}
...
ImmutableList list1 = ImmutableList.Empty;
ImmutableList list2 = list1.Add(100);
ImmutableList list3 = list2.Add(400);

And there you go. Of course you would want to add better exception handling and more methods, like IEnumerable<int> methods. But there is a persistent immutable list. Every time you make a new list, you re-use the contents of an existing immutable list; list3 re-uses the contents of list2, which it can do safely because list2 is never going to change.
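The sharing is directly observable. Here is a rough Java transliteration of the same idea (class and method names are mine), in which extending a list never copies anything:

```java
// Java sketch of the persistent list above: add() allocates one new node
// and re-uses the existing list as its tail, so nothing is ever copied.
final class PList {
    static final PList EMPTY = new PList(0, null);
    private final int head;
    private final PList tail;

    private PList(int head, PList tail) {
        this.head = head;
        this.tail = tail;
    }

    boolean isEmpty() { return this == EMPTY; }
    int head()   { if (isEmpty()) throw new IllegalStateException(); return head; }
    PList tail() { if (isEmpty()) throw new IllegalStateException(); return tail; }

    PList add(int h) { return new PList(h, this); } // O(1): shares `this` as the tail
}
```

After `PList list2 = PList.EMPTY.add(100); PList list3 = list2.add(400);`, the expression `list3.tail() == list2` is reference-equality true, which is exactly the "list3 re-uses the contents of list2" point above.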

Would any of your answers also work on doubly-linked lists?

You can of course easily make a doubly-linked list that does a full copy of the entire data structure every time, but that would be dumb; you might as well just use an array and copy the entire array.

Making a persistent doubly-linked list is quite difficult but there are ways to do it. What I would do is approach the problem from the other direction. Rather than saying "can I make a persistent doubly-linked list?" ask yourself "what are the properties of a doubly-linked list that I find attractive?" List those properties and then see if you can come up with a persistent data structure that has those properties.

For example, if the property you like is that doubly-linked lists can be cheaply extended from either end, cheaply broken in half into two lists, and two lists can be cheaply concatenated together, then the persistent structure you want is an immutable catenable deque, not a doubly-linked list. I give an example of an immutable non-catenable deque here:

http://blogs.msdn.com/b/ericlippert/archive/2008/02/12/immutability-in-c-part-eleven-a-working-double-ended-queue.aspx

Extending it to be a catenable deque is left as an exercise; the paper I link to on finger trees is a good one to read.

UPDATE:

according to the above we need to copy the prefix up to the insertion point. By the logic of immutability, if we delete anything from the prefix, we get a new list, just as in the suffix... Why copy only the prefix then, and not the suffix?

Well consider an example. What if we have the list (10, 20, 30, 40), and we want to insert 25 at position 2? So we want (10, 20, 25, 30, 40).

What parts can we reuse? The tails we have in hand are (20, 30, 40), (30, 40) and (40). Clearly we can re-use (30, 40).

Drawing a diagram might help. We have:

10 ----> 20 ----> 30 -----> 40 -----> Empty

and we want

10 ----> 20 ----> 25 -----> 30 -----> 40 -----> Empty

so let's make

| 10 ----> 20 --------------> 30 -----> 40 -----> Empty
|                        /
| 10 ----> 20 ----> 25 -/ 

We can re-use (30, 40) because that part is in common to both lists.
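That re-use can be checked in code. A self-contained Java sketch (names are mine) builds (10, 20, 30, 40), inserts 25 at position 2, and confirms that the (30, 40) suffix is the very same object in both lists:

```java
// Demonstrates suffix sharing under insertion: the nodes after the
// insertion point are reference-shared; only the prefix is rebuilt.
final class SharedList {
    final int head;
    final SharedList tail; // null marks the empty list

    SharedList(int head, SharedList tail) {
        this.head = head;
        this.tail = tail;
    }

    SharedList add(int h) { return new SharedList(h, this); }

    // Recursive insertion, as in the answer: rebuild the prefix,
    // share the suffix. (No bounds check for positions past the end.)
    SharedList insertAt(int value, int position) {
        if (position < 0) throw new IllegalArgumentException();
        if (position == 0) return this.add(value);
        return tail.insertAt(value, position - 1).add(head);
    }
}
```

Building `10 -> 20 -> 30 -> 40` and calling `insertAt(25, 2)` yields `10 -> 20 -> 25 -> 30 -> 40`, and the node holding 30 in the new list is reference-equal to the node holding 30 in the old one, matching the diagram.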

UPDATE:

Would it be possible to provide the code for random insertion and deletion as well?

Here's a recursive solution:

// Declared on the abstract ImmutableList class; it uses the Head and Tail
// properties so it works through the abstract type.
public ImmutableList InsertAt(int value, int position)
{
    if (position < 0)
        throw new Exception();
    else if (position == 0)
        return this.Add(value);
    else
        return this.Tail.InsertAt(value, position - 1).Add(this.Head);
}

Do you see why this works?

Now as an exercise, write a recursive DeleteAt.

Now, as an exercise, write a non-recursive InsertAt and DeleteAt. Remember, you have an immutable linked list at your disposal, so you can use one in your iterative solution!

Eric Lippert
  • Thank you very much for this. BTW, according to the above we need to copy the prefix up to the insertion point. By the logic of immutability, if we delete anything from the prefix, we get a new list, just as in the suffix... Why copy only the prefix then, and not the suffix? – Bober02 Apr 06 '12 at 15:40
  • Would it be possible to provide the code for random insertion and deletion as well? Btw., are there any papers/questions about similar data structures such as queues, maps, trees, etc.? – Bober02 Apr 06 '12 at 15:58
  • Thanks again Eric, really appreciate your time. Hope you will have another minute to spare on these two simple questions: you say a "persistent doubly linked list" is difficult. What do you mean by persistent? I also commented on your other question where you wrote that insertion can be reduced to log(n) using balanced trees. Is it because we don't have to copy the left/right part of the tree if we insert into the right/left subtree? Thanks again – Bober02 Apr 06 '12 at 16:05
  • @Bober02: I've added a recursive solution. My advice is to study it, and then write a non-recursive solution that does the same thing. You should consider reading my blog series on immutable data structures; once you have the basics down, pick up Okasaki's book; it is excellent but very deep. – Eric Lippert Apr 06 '12 at 16:08
  • @Bober02: I said in my answer what I mean by "persistent". A persistent data structure is one where you can re-use most of it. Regarding binary tree insertion: yes, but a better way to characterize why insertion into a balanced tree is log n is because insertion into a tree *rewrites the spine of the tree*. The spine is of length log n in a height balanced tree. – Eric Lippert Apr 06 '12 at 16:09
  • Wow, really great help. Final question, promise: when you say persistent means you can reuse most of it, does that not also describe mutable data structures? I always thought persistent data structures were a subset of immutable data structures, but I might be wrong – Bober02 Apr 06 '12 at 16:49
  • @Bober02: Correct; one typically thinks of persistent data structures as a subset of immutable data structures. You can't *safely* re-use parts of a mutable data structure because of course they can mutate. – Eric Lippert Apr 06 '12 at 17:09
  • Hey @EricLippert, sorry for digging up this old answer. I'm currently looking into immutability in C# and was wondering why `System.Collections.Immutable.ImmutableList` hasn't been implemented as a linked list (like it's done in your example and the functional languages I know). Instead it's a binary tree which has slower performance when adding. Do you know anything about the trade-offs involved? My guess is that it could throw C# developers off that they would have to prepend instead of append for maximum performance. – Good Night Nerd Pride Oct 11 '19 at 17:07
  • @GoodNightNerdPride: That sounds like a good question to post on Stack Overflow, but your guess is good. Think about all the operations you want to perform on lists: prepend, append, random access to the middle, concatenate, and so on; stacks are O(1) for *one* of those operations and O(n) for the rest. But balanced binary trees can be O(lg n) for all those operations if you're clever. – Eric Lippert Oct 11 '19 at 17:43

Should we recursively copy all the references to the list when we insert an element?

You should recursively copy the prefix of the list up until the insertion point, yes.

That means that insertion into an immutable linked list is O(n) (as is inserting, rather than overwriting, an element in an array).

For this reason insertion is usually frowned upon (along with appending and concatenation).

The usual operation on immutable linked lists is "cons", i.e. prepending an element at the front, which is O(1).

You can see the complexity clearly in, e.g., a Haskell implementation. Given a linked list defined as a recursive type:

data List a = Empty | Node a (List a)

we can define "cons" (inserting an element at the front) directly as:

cons a xs = Node a xs

Clearly an O(1) operation. Insertion, by contrast, must be defined recursively: find the insertion point, break the list into a prefix (which is copied) and a suffix, and share the (immutable) suffix with the new node.

The important thing to remember about linked lists is:

  • linear access

For immutable lists this means:

  • copying the prefix of a list
  • sharing the tail.

If you are frequently inserting new elements, a log-based structure, such as a tree, is preferred.

Don Stewart
  • Thanks for your answer. You mention "You should recursively copy the prefix of the list up until the insertion point". So, if I have a list 1->2->3 and I insert zero at the front, according to your statement I don't need to copy anything. Now, if I delete 2, then I have modified two separate copies, as they both have EXACTLY the same tail nodes... I guess I am missing something here... I just cannot see why we need to copy the prefix but not the suffix, because if we delete an element in the suffix, both lists would be affected – Bober02 Apr 06 '12 at 15:31
  • @Bober02: You are still thinking like data structures are mutable. If you delete an element in the suffix then *you have an entirely new list*. You have not mutated some object that other objects depend on. – Eric Lippert Apr 06 '12 at 15:35
  • OK, then why bother copying the prefix? By this logic, if we delete anything from the prefix, we get a new list as well... Why copy it then? – Bober02 Apr 06 '12 at 15:40
  • You have to copy the prefix, otherwise insertion is not pure, but has the side effect of modifying the list for all lists that point to the prefix. – Don Stewart Apr 06 '12 at 15:50
  • Sorry, but I still don't get it. What do you mean by "pure"? And again, why is the prefix different from the suffix? Thanks again for your comments BTW; since insertion is O(n) in space and time, and as I mentioned you can apparently use trees to reduce it to log(n), would you be able to provide some comments on that as well? – Bober02 Apr 06 '12 at 15:56
  • "pure" is a property of code where the output of an operation only depends on its inputs, and not the state of the system. Immutable data structures are pure. Mutable structures are impure: the result of an operation depends on the behavior of both the inputs to the function, and what other functions have done to the data type in the meantime. – Don Stewart Apr 06 '12 at 16:06

There is a way to emulate "mutation": using immutable maps.

For a linked list of Strings (in Scala style pseudocode):

case class ListItem(s:String, id:UUID, nextID: UUID)

then the ListItems can be stored in a map where the key is UUID: type MyList = Map[UUID, ListItem]

If I want to insert a new ListItem into val list: MyList:

def insertAfter(l: MyList, e: ListItem): MyList = {
  val beforeE = l.getElementBefore(e)              // helper assumed, as above
  val eToInsert = e.copy(nextID = beforeE.nextID)  // e takes over beforeE's old successor
  val beforeE_new = beforeE.copy(nextID = e.id)    // beforeE now points at e
  l.updated(beforeE.id, beforeE_new) + (eToInsert.id -> eToInsert)
}

where updated, +, and get take effectively constant time on an immutable Map: http://docs.scala-lang.org/overviews/collections/performance-characteristics

Implementing a doubly-linked list works similarly.
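For comparison, here is a rough Java sketch of the same map-backed idea (all names are mine). Java's standard library has no persistent map, so this version copies a HashMap on each update to stay immutable, costing O(n) per operation rather than the effectively constant time of Scala's immutable Map:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Map-backed "linked list": each item names its successor by UUID.
// Every update copies the backing map, so each MapList value is immutable.
final class MapList {
    // Java 16+ record; nextID == null marks the last element.
    record Item(String s, UUID id, UUID nextID) {}

    private final Map<UUID, Item> items;

    MapList(Map<UUID, Item> items) { this.items = Map.copyOf(items); }

    Item get(UUID id) { return items.get(id); }

    // Insert e after prev: e takes over prev's old successor, and prev is
    // rewritten to point at e. The original MapList is left untouched.
    MapList insertAfter(Item prev, Item e) {
        Map<UUID, Item> copy = new HashMap<>(items);
        copy.put(e.id(), new Item(e.s(), e.id(), prev.nextID()));
        copy.put(prev.id(), new Item(prev.s(), prev.id(), e.id()));
        return new MapList(copy);
    }
}
```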

jhegedus