
Am I right that the only algorithm that can be implemented in a functional programming language like Haskell to concatenate multiple strings (i.e. to implement a join that transforms the list of lines ["one", "two", "three"] into the single line "onetwothree") has a time complexity of O(n^2), as described in this well-known post?

For example, if I work with immutable strings, as in Python, and try to implement join, I'll get something like

def myjoin(list_of_strings):
    if not list_of_strings:  # base case: joining nothing gives the empty string
        return ""
    return list_of_strings[0] + myjoin(list_of_strings[1:])

Is it true that it is not possible to make it faster, for example, in Haskell?

Ilya V. Schurov
  • https://stackoverflow.com/questions/34008010/is-this-time-complexity-actually-on2 – user6144226 Sep 07 '17 at 10:03
  • Concatenating two strings takes `O(0)` time in Haskell. (Until you actually iterate the list). – Bergi Sep 07 '17 at 10:15
  • Where in that linked blog post is anything said about functional programming languages like Haskell, and that they used this silly algorithm? No, `concat` is implemented with linear complexity, and using another type than lists it gets even better. – Bergi Sep 07 '17 at 10:19
  • I'm asking about joining a list of strings (i.e. `["one", "two", "three"] → "onetwothree"`). What's the time complexity of this operation? – Ilya V. Schurov Sep 07 '17 at 10:21
  • @IlyaV.Schurov in Haskell, a string *is* a list; ie `"hello" :: [Char]` – Mulan Sep 08 '17 at 01:35

1 Answer


First of all, Haskell is lazy: this means that if you write:

concat ["foo", "bar", "qux"]

it will not perform this operation until you request, for instance, the first character of the outcome. In that case it will usually not concatenate all the strings together but, depending on how cleverly the function is implemented, aim to do the minimal amount of work necessary to obtain that first character. Moreover, if you obtain the first character but never inspect it, it may remain an unevaluated thunk such as succ 'f' rather than the character 'g', since, again, Haskell is lazy.
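A tiny sketch (my own example, not from the answer) makes this laziness visible: we can take a prefix of concat's result even when a later string in the list is infinite.

```haskell
-- Laziness demo: concat only evaluates as much of its input as the
-- consumer demands, so an infinite string later in the list is no
-- problem as long as we never reach it.
lazyPrefix :: String
lazyPrefix = take 6 (concat ["one", "two", cycle "three"])
```

In GHCi, lazyPrefix evaluates to "onetwo" without ever forcing the infinite third string.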

But let's assume that we are interested in the resulting string, and want to know every character. We can implement concat as:

concat :: [[a]] -> [a]
concat [] = []
concat (x:xs) = x ++ concat xs

and (++) as:

(++) :: [a] -> [a] -> [a]
(++) []     ys = ys
(++) (x:xs) ys = x : (xs ++ ys)
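To see where the cost comes from, here is a hand expansion of a small call (my own worked example, following the definition above):

```haskell
-- Expanding "ab" ++ "cdef" step by step with the definition above:
--   ('a':"b") ++ "cdef"
-- = 'a' : ("b" ++ "cdef")
-- = 'a' : 'b' : ([] ++ "cdef")
-- = 'a' : 'b' : "cdef"
-- Two (:) steps for a left operand of length 2; the right operand
-- "cdef" is shared as-is, never copied.
appendDemo :: String
appendDemo = "ab" ++ "cdef"
```

So the cost of (++) is proportional to the length of the left operand only.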

Now that means that - given (:) works in O(1) - (++) runs in O(a), where a is the length of the first list; the length b of the second list does not appear in the big-O expression at all.

So now if we inspect concat, we see that for k input strings we perform k (++) operations. Each (++) costs time proportional to the length of its left operand, and every input string appears as a left operand exactly once. So if the sum of the lengths of the strings is n, concat is an O(n) algorithm.
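As one of the comments under the question notes, other representations do even better. A standard technique is the difference list: represent a string as a function String -> String, so that appending is function composition and costs O(1) per append regardless of nesting. The names below are my own sketch, not from the answer:

```haskell
-- Difference-list sketch: a string is represented as a
-- "prepend this string" function, so (.) plays the role of (++).
type DList a = [a] -> [a]

fromList :: [a] -> DList a
fromList xs = (xs ++)   -- O(1): just a partial application

toList :: DList a -> [a]
toList d = d []         -- pay the linear cost once, at the end

-- Join a list of strings in O(n) total, n = sum of the lengths.
joinD :: [String] -> String
joinD = toList . foldr (\s acc -> fromList s . acc) id
```

Here joinD ["one", "two", "three"] yields "onetwothree", and even a left-nested chain of appends stays linear overall, which plain (++) cannot guarantee.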

Willem Van Onsem
  • Thanks! Why `(:)` is *O(1)*? If we have a long string (of length *n*) and want to prepend it with a character, isn't it true that we have to create a new string of length *n+1* from scratch (provided that strings are immutable as anything else in functional programming), so it's *O(n)*? – Ilya V. Schurov Sep 07 '17 at 10:47
  • @IlyaV.Schurov: No, in Haskell a list is a *linked* list. `(:)` is a constructor: it builds a cell with two references, one to the head and one to the tail. – Willem Van Onsem Sep 07 '17 at 11:18
  • Ah, okay, that was my main mistake: I was thinking mostly of Python strings and lists. So if lists are linked lists, concatenation is cheap indeed. But element retrieval can be expensive... – Ilya V. Schurov Sep 07 '17 at 11:22
  • @IlyaV.Schurov: random element retrieval is indeed expensive. That's why most Haskell algorithms prefer sequential enumeration over random access. There are, furthermore, data structures like arrays that do allow random access. – Willem Van Onsem Sep 07 '17 at 11:23