This is a surprisingly complex question, because of two features of Haskell and GHC:
- Lazy evaluation
- List fusion
List fusion means that in some situations, GHC can rewrite list processing code into a loop that doesn't allocate list cells. So depending on the context where it is used, the same code could incur no additional cost.
Lazy evaluation means that if the result of an operation is not consumed, then you don't pay the cost of computing it. So for example, this is cheap, because you only have to construct the first ten elements of the list:
example = take 10 ([1..1000000] ++ [1000001])
In fact, in that code the take 10
can fuse with the list append, so it's the same as just [1..10]
.
But let's just assume that we're consuming all of the elements of all of the lists that we make, and that the compiler isn't fusing our list operations. Now to your questions:
If I add an element to a List in Haskell, Haskell returns a (completly?) new list, and doesn't manipulate the original one. Now let's say I have a List of a million elements and I append one element at the end. Does Haskell "copy" the whole list (1 million elements) and adds the element to that copy? Or is there a neat "trick" going on behind the scenes to avoid copying the whole list?
There exist tricks to avoid copying the whole list, but by appending to its end you defeat them. The thing to understand is that functional data structures are normally designed so that operations that "modify" them will exploit structure-sharing to reuse as much of the old structure as possible. So for example, appending two lists can be defined like this:
(++) :: [a] -> [a] -> [a]
[] ++ ys = ys
(x:xs) ++ ys = x : xs ++ ys
Looking at this definition, you can tell that the list ys
will be reused in the result. So if we have xs = [1..3]
, ys = [4..5]
and xs ++ ys
, all fully evaluated and retained in memory at once, it will look something like this memory-wise:
+---+---+ +---+---+ +---+---+
xs = | 1 | -----> | 2 | -----> | 3 | -----> []
+---+---+ +---+---+ +---+---+
+---+---+ +---+---+
ys = | 4 | -----> | 5 | -----> []
+---+---+ +---+---+
^
|
+------------------------------------+
|
+---+---+ +---+---+ +---+---+ |
xs ++ ys = | 1 | -----> | 2 | -----> | 3 | -------+
+---+---+ +---+---+ +---+---+
That is the long way of saying this: if you do xs ++ ys
, and it doesn't fuse, and you consume the whole list, then that will create a copy of xs
but reuse the memory for ys
.
But now let's look again at this bit of your question:
Now let's say I have a List of a million elements and I append one element at the end. Does Haskell "copy" the whole list (1 million elements) and adds the element to that copy?
That would be something like [1..1000000] ++ [1000001]
, and yes, it would copy the whole million elements. But on the other hand, [0] ++ [1..1000000]
would only copy the [0]
. The rule of thumb is this:
- Adding elements at the beginning of a list is most efficient.
- Adding elements at the end of a list is often inefficient, particularly if you do it over and over.
The general solutions to this sort of problem are:
- Modify your algorithm so that you use lists in an access pattern they support efficiently.
- Don't use lists; use some other sequence data structure that efficiently supports the access pattern you need for the problem at hand. Another answer mentioned difference lists, but others worth mentioning are: