2

Consider list concatenation implemented as a fold left, that is, foldl (++) []. What is the complexity of this implementation in a lazy-evaluated language, such as Haskell? I understand that in a strict language the performance is quadratic in the total number of elements, but what happens when laziness is involved?

I've tried to evaluate by hand the expression ([1,2,3] ++ [4,5,6]) ++ [7,8,9] (which corresponds to foldl (++) [] [[1,2,3], [4,5,6], [7,8,9]]) and it seems that we traverse only once each element, but I'm not sure whether my reasoning is correct:

([1,2,3] ++ [4,5,6]) ++ [7,8,9]
= { rewrite expression in prefix notation }
(++) ((++) [1,2,3] [4,5,6]) [7,8,9]
= { the first (++) operation needs to pattern match on its first argument; so it evaluates the first argument, which pattern matches on [1,2,3] }
(++) (case [1,2,3] of {[] -> [4,5,6]; x:xs' -> x:(++) xs' [4,5,6]}) [7,8,9]
= { x = 1, xs' = [2,3] }
(++) (1:(++) [2,3] [4,5,6]) [7,8,9]
= { the first (++) operation can now pattern match on its first argument }
1:([2,3] ++ [4,5,6]) ++ [7,8,9]

I have assumed the following implementation of (++):

(++) :: [a] -> [a] -> [a]
xs ++ ys = case xs of [] -> ys
                      (x:xs') -> x : (xs' ++ ys)
Dan Oneață
  • 968
  • 7
  • 14
  • Perhaps your question is answered by [How is foldl lazy?](https://stackoverflow.com/questions/7487455/how-is-foldl-lazy) Specifically, `foldl` has an impact on performance because of laziness. – sshine May 14 '18 at 09:37
  • Compare also to https://softwareengineering.stackexchange.com/q/294658/11732: if you don't insist on forming the list first and scanning it after it, you only pay the concatenation price once. – 9000 May 14 '18 at 21:18

1 Answers1

8

Let's say we have ([1,2,3]++[4,5,6])++[7,8,9]

([1,2,3]++[4,5,6])++[7,8,9]
(1:([2,3]++[4,5,6))++[7,8,9]
1:(([2,3]++[4,5,6])++[7,8,9])
1:((2:([3]++[4,5,6])++[7,8,9])
1:2:(([3]++[4,5,6])++[7,8,9])
1:2:(3:([]++[4,5,6])++[7,8,9])
1:2:3:(([]++[4,5,6])++[7,8,9])
1:2:3:([4,5,6]++[7,8,9])
1:2:3:4:([5,6] ++ [7,8,9])
1:2:3:4:5:([6] ++ [7,8,9])
1:2:3:4:5:6:([] ++ [7,8,9])
1:2:3:4:5:6:[7,8,9]
[1,2,3,4,5,6,7,8,9]

Notice how every element in the first list had to be moved twice? That's because it was two from the end. In general if we have (((a1++a2)++a3)++..an) each element in the list ai will have to be moved n-i times.

So, if you want the whole list, it's quadratic. If you want the first element, and you have n lists, it's n-1* operations (we need to do the step-case of ++ n times). If you want the ith element, it's the number of operations for all the elements before it, plus k-1, where it was in the kth list, counting from the end.

*Plus the n operations from foldl itself, if we want to be pedantic

Probie
  • 1,371
  • 7
  • 15
  • Thanks! That's spot on. Meanwhile, I've found [this answer](https://stackoverflow.com/a/13879693/474311), whose first half also responds to my question. – Dan Oneață May 14 '18 at 16:11