Why the "++" function is much more expensive than ":"?

Question

The following paragraph comes from Learn You a Haskell for Great Good!

"Watch out when repeatedly using the ++ operator on long strings. When you put together two lists (even if you append a singleton list to a list, for instance: [1,2,3] ++ [4]), internally, Haskell has to walk through the whole list on the left side of ++. That's not a problem when dealing with lists that aren't too big. But putting something at the end of a list that's fifty million entries long is going to take a while. However, putting something at the beginning of a list using the : operator (also called the cons operator) is instantaneous."

I don't know why Haskell has to walk through the whole list on the left side of ++.

Do you know lists are implemented in Haskell? How else could `(++)` work? — Carl, Sep 13 '18 at 13:37
@Carl I know [1,2,3] is actually just syntactic sugar for 1:2:3:[] — rostej, Sep 13 '18 at 13:42
@GovindParmar This question is explicitly about performance (while the other one isn't), and the answers here address reflect that by addressing that aspect in a more effective manner. I would leave it open, unless there was a better match with great answers lying around. — duplode, Sep 13 '18 at 13:56

score 4 · Answer 1 · edited Sep 13 '18 at 15:17

The list on the rhs (right-hand side) must come after the last element on the lhs. Since haskell lists are implemented in terms of sucessors, you need to "reach" the last element to append anything to it, i.e. make the list you're appending its successor.

This is analogous to concatenating singly-linked lists in an imperative language, if you only store a reference to the first element. You can only append to the last one, but to find it, you need to walk all the links.

If you implement your own list, this becomes even more obvious because of the syntax change:

data List a = Empty | Cons a (List a)

Cons 1 (Cons 2 (Cons 3 Empty)))

To append to such a list, you need to change^* the middle Empty. But looking at the value "from the outside" (e.g. by pattern matching), you only see the Cons 1 <tail>. The tail part is some cloudy thing until you evaluate it and see Cons 2 <tail> and so on, which is what you've tried to avoid.

Converserly, prepending would simply be Cons 0 <the list above>, wrapping the entire list without even looking at it. This is why you can write things like 0 : [1..], but can't write things like [1..] ++ [42].

^* Create a new list which differs in that specific point. Haskell lists (values in general) are obviously immutable.

score 2 · Answer 2 · answered Sep 13 '18 at 13:55

I think maybe a manual unrolling would be a helpful tool in addition to the other answers. A list [1,2,3] is represented as a right-nested expression of "conses" (:) like this

1 : (2 : (3 : []))

We understand that the : operator associates to the right so this is just written

1:2:3:[]

Append is defined by these two patterns:

[]     ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)

So now let's just look at how [1,2,3] ++ [4,5] is unfolded. Starting by rewriting into conses:

(1:2:3:[]) ++ (4:5:[])
--                    (second pattern, x = 1, xs = 2:3:[], ys = 4:5:[])
1 : ((2:3:[]) ++ (4:5:[]))
--                    (second pattern, x = 2, xs = 3:[], ys = 4:5:[])
1 : 2 : ((3:[]) ++ (4:5:[]))
--                    (second pattern, x = 3, xs = [], ys = 4:5:[])
1 : 2 : 3 : ([] ++ (4:5:[]))
--                    (first pattern, ys = 4:5:[])
1 : 2 : 3 : 4 : 5 : []

So that's how it works. See how we traversed the left list, but we didn't need to traverse the right one? In some sense, we traverse the left list in order to find the [] at the end of it, and then replace the [] at the end with a reference to the whole right list.

score 1 · Answer 3 · edited Sep 13 '18 at 16:26

1

A list like [3,1,5,6] in Haskell looks like this:

3 : (1 : (5 : [6])), where : is the cons function. Clearly you can still write lists like [3,1,5,6] in Haskell, but that is just fancy syntax.

Adding an element after the 6 is difficult as you can see: the 6 is nested all the way down the list. In order to add the item, Haskell needs to deconstruct the list entirely before it can add the item like so:

(++) (x : xs) otherList = x : (xs ++ otherList)
(++) [x] otherList = x : otherList

This is very likely not the actual definition of (++), but it shows the problem. The (++) operator has to traverse through the entire leftmost list in order to cons the last item in the list to the new list, after which recursively all other items are again concatenated.

edited Sep 13 '18 at 16:26

Will Ness

70,110
9
98
181

answered Sep 13 '18 at 13:46

Glubus

2,819
1
12
26

`:` is closer to a list literal than a function that returns a list. Whereas the return value of `++` leaves no trace of what its original arguments were, the value *created* by `:` stores its arguments. `1 : ()` *is* the list, rather than an expression that evaluates to a list. – chepner Sep 13 '18 at 14:00
Yes, that is my understanding as well, why do you feel the need to comment this? Or are you expanding on what I said? I cannot find something in my answer that contradicts this. `:` is a constructor of `List`, which makes it a function. – Glubus Sep 13 '18 at 14:05

Why the "++" function is much more expensive than ":"?

3 Answers3