The exercise 2.2 in Purely Functional Data Structure says the original member
function performs about 2d
comparisons in the worst case, where d
is the depth of the tree. According to this paper, the author suggests to rewrite member
function to take no more than d + 1
comparisons by keeping track of a candidate element that might be equal to the query element (say, the last element for which < returns false or <= returns true) and checking for equality only when the bottom of the tree is hit.
The following are the codes:
data Tree a = Leaf
| Fork (Tree a) a (Tree a)
deriving (Show, Eq)
-- The original member function.
member :: Ord a => a -> Tree a -> Tree a
member _ Leaf = False
member x (Fork l y r)
| x < y = member x l
| x > y = member x r
| otherwise = True
-- The improved member function that I write.
member' :: Ord a => a -> Tree a -> Tree a
member' v t = loop Nothing v t
where
loop candidate _ Leaf = Just x == candidate
loop candidate x (Fork l y r)
| x < y = loop candidate x l
| otherwise = loop (Just y) x r
Now my questions are:
According to the paper, is my improved
member
function right?In the worst case, the improved
member
is more efficient than the original one, since it compares no more thand + 1
times. But according to the algorithm, the improvedmember
will taked + 1
comparisons for every query. For the originalmember
, if the queried value equals the root value, it only compaires two times, whereas the improvedmember
seems to still taked + 1
comparisons. So it seems the improved algorithm is not as efficient as the original one in the best case. Do I misunderstand anything here? Actually the paper about the algorithm reads:
When the searched-for node is reached the algorithm does not terminate but continues in the right subtree. Since all elements in that subtree are larger than the one searched for, the algorithm will make no more right turns after passing that node. Thus if the element searched for is to be found in the tree it has to be in the node where the last right turn was made, which is the node referred to by candidate.
So it seems that's how I understand the algorithm.
- If point 2 is true, then how to improve the algorithm so that it doesn't need to compare
d + 1
times in the best case besides using the three-way comparison functoncompare
in Haskell?
Thanks for any information!