Nested ++
, as the ones you have in your in-order visit, can be inefficient. This is because list1 ++ list2
will copy (the spine of) list1
, as follows:
(1:2:3:[]) ++ list2
=
1: ((2:3:[]) ++ list2)
=
1:2: ((3:[]) ++ list2)
=
1:2:3: ([] ++ list2)
=
1:2:3:list2
Copying the first list might not be so bad if done once, but when we nest ++
as in
((list1 ++ list2) ++ list3) ++ list4
we copy the copy of the copy, which slows everything down, often making O(N^2) something that should be O(N).
When computing list1 ++ list2
, a key idea is this one: if we only could keep a "pointer" to the end []
inside list1
, we could avoid the copy, and simply rewrite it with a pointer to list2
, we would obtain constant-time (destructive) append.
Now, do we have imperative-style mutability in Haskell? Not for regular lists. We could, however, transform out lists into "functions of the end", i.e. instead of writing
1:2:3:[]
for a list, we could write
\end -> 1:2:3:end
to represent the same data. The latter representation of lists is called a "difference list". Converting from regular to difference lists is simple:
type DList a = [a] -> [a]
toDList :: [a] -> DList a
toDlist = (++)
fromDlist :: DList a -> [a]
fromDlist dl = dl []
So far so good, but how to concatenate two DList a
? We need to take something like
list1 = \end -> 1:2:3:end
list2 = \end -> 4:5:6:7:end
and return
concatenate list1 list2 = \end -> 1:2:3:4:5:6:7:end
It turns out that concatenate
is simply function composition, (.)
. I.e. list1 . list2
is exactly the concatenation we need. When we evaluate
fromDList (list1 . list2)
-- i.e.
(list1 . list2) []
no copy is done, since the end of list1
is immediately linked to list2
.
So, with this in mind, we can rewrite your code with minimal changes:
inorder :: IntegerTree -> [Integer]
inorder tree = fromDList (inorderDL tree)
inorderDL :: IntegerTree -> DList Integer
inorderDL (Leaf n) = (n :) -- that is, \end -> n: end
inorderDL (Node l n r) = inorderDL l . (n :) . inorderDL r
This will make no list copies, since when each sublist is generated at each recursive step, its tail won't be []
, but the "pointer" to the next list to concatenate.
Finally, note that this approach, under a different light, is exactly what Willem used in his answer.