Efficiently merge two lists

Question

I'm currently learning Haskell, and working with List.

According to the HaskellWiki, if I wanted to merge two lists together, I would write:

list1 ++ list2

However, according to this answer, using ++ on a large list would be inefficient.

From researching, I came across this SO page, but that question requires a specific requirement for the output of the List.

What I tried:

Suppose if I had two lists of numbers (for this example, assume that both lists are large enough that using ++ would be inefficient as described in the SO answer):

 oldNumbers = [1,5,14,22,37]
 newNumbers = [3,10,17,27,34,69]
 allNumbers = oldNumbers:newNumbers

As you can see I attempted to add oldNumbers to the head of newNumbers with the intention of reversing it afterward (disregard that allNumbers would be unorder for now, that's an exercise for another day).

As you probably guessed, it produced an error

error:
    * Non type-variable argument in the constraint: Num [a]
      (Use FlexibleContexts to permit this)
    * When checking the inferred type
        allNumbers :: forall a. (Num a, Num [a]) => [[a]]

So, as stated in the title, how would I efficiently merge two lists?

`(++)` is not inefficient in the sense that it works in linear time. The idea is that `(++)` however is sometimes "misused" to append a single element to a list turning the algorithm in *O(n^2)* instead of *O(n)*. — Willem Van Onsem, Nov 17 '18 at 15:42
correctness first, efficiency later. really. -- so are you merging two sorted lists, and the merged list must be sorted as well? if not, there's absolutely nothing wrong with one-time `++`. the answer you cite talks about *repeated* appending of *one-element* lists with `++ [x]` being "bad". — Will Ness, Nov 17 '18 at 15:47
@WillNess - The merged list doesn't have to be sorted. I'm taking small steps, so for now the mission is to merge the two lists. I'll `sort` them later when I understand `List` in Haskell better. — , Nov 17 '18 at 15:55
then `++` does the job. do you want to re-implement it, as an exercise? is your question about that ,or about the error you got? — Will Ness, Nov 17 '18 at 15:55
@WillNess - No, not right now. I was just wondering about the answer I cited, it said using `++` was inefficient, so it made me wonder what I sould do instead, the answer also stated to add it to the head of the list and reverse it, which I attempted, but received an error. But as you pointed out, `++` is fine here. — , Nov 17 '18 at 15:58
@melpomene - I probably misinterpreted what you stated. I way I read it was using `++` to merge one and only one element was fine, but when the lists are huge using `++` would be inefficient because of the copying. — , Nov 17 '18 at 16:06

Willem Van Onsem · Accepted Answer · 2018-11-17T23:31:42.910

According to the HaskellWiki, if I wanted to merge two lists together, I would write:
list1 ++ list2 
However, according to this answer, using ++ on a large list would be inefficient.

I think you need to take the context into account in this answer. If we append two lists a and b with (++) then it appends the two lists in O(m) (with m the length of the list). So this is, in terms of time complexity efficient. One can not construct such singly linked list more efficient than that.

@melpomene actually points to a popular mistake people new to Haskell make: they append a single element at the end of the list. Again if you only want to append a single element to the list, that is not a problem. This is a problem if you want to append n elements that way to a list, so if you each time append a single element to the list, and you do that n times, then the algorithm will be O(n²).

So in short, from a time complexity point of view, (++) is not slow when you append two lists together, nor is it slow when you append a singleton list to it. There are however, in terms of asymptotic behavior, usually better ways than repeatedly appending a list with one element to a list, since then the first operand will typically become larger, and each iteration it takes O(n) time only to append a single element to a list. Typically one can use recursion on the tail element of a list in that case, or for example build the list in reverse.

(++) is thus not "intrinsically" inefficient, by by using it in ways for which it was not design, you can obtain inefficient behavior.

An example where ++ will be quite slow is the following function that performs a "mapping":

badmap :: (a -> b) -> [a] -> [b]
badmap f = go []
    where go temp [] = temp
          go temp (x:xs) = go (temp ++ [f x]) xs

There are two problems here:

it is not lazy, it requires to evaluate the entire list before this can emit the first element; and
it will run in quadratic time, since for each element in the input list, it will take more and more time to append this element.

A more effective way to implement a map is:

goodmap :: (a -> b) -> [a] -> [b]
goodmap f = go
    where go (x:xs) = f x : go xs
          go [] = []

where does it say in the Report that lists are "singly linked"? I couldn't find anything like that there. The complexity of `++` seems not to be specified either. — Will Ness, Nov 17 '18 at 16:25
@WillNess: conceptually it is a sinly linked list, since there is no direct link in the data definition to the "parent". Furthermore that would also be strange, since one can reuse the tail, so there is no "previous". The time complexity follows from the implementation in the Haskell report on the standard prelude: https://www.haskell.org/onlinereport/standard-prelude.html — Willem Van Onsem, Nov 17 '18 at 16:30
conceptually it could just as well be implemented on arrays with O(1) append. the implementation gives value semantics for the types / functions, not operational semantics / performance. — Will Ness, Nov 17 '18 at 16:33
@WillNess: indeed, and then it is still *O(n)*, *O(n!)*,... Somehow this discussions feels a bit like the one I had with some friends about the first chapter of "*Computational Complexity*": "Why the computational model does not matter". — Willem Van Onsem, Nov 17 '18 at 16:34
I really like this answer, but could you provide example code of *append n elements that way to a list*? It will help understand better of what **not** to do. — , Nov 17 '18 at 23:21

score 4 · Answer 2 · answered Nov 17 '18 at 16:01

4

There's absolutely nothing wrong with appending two lists with one-time ++, however long they are.

The answer you cite, (rightfully) talks about repeated appending of one-element lists with ++ [x] being "bad".

answered Nov 17 '18 at 16:01

Will Ness

70,110
9
98
181

score 0 · Answer 3 · answered Nov 23 '18 at 14:49

0

If you would like to efficiently append arbitrary lists, use a different data structure! Data.Seq is optimized for efficient appends, for example.

https://www.stackage.org/haddock/lts-12.19/containers-0.5.11.0/Data-Sequence.html#v:-62--60-

answered Nov 23 '18 at 14:49

Dan Burton

53,238
27
117
198

Efficiently merge two lists

3 Answers3

Linked