
Exercise 2.2 in Purely Functional Data Structures says that the original member function performs about 2d comparisons in the worst case, where d is the depth of the tree. According to the paper it refers to, member should be rewritten to take no more than d + 1 comparisons, by keeping track of a candidate element that might be equal to the query element (say, the last element for which < returned false or <= returned true) and checking for equality only when the bottom of the tree is hit.

Here is the code:

data Tree a = Leaf 
            | Fork (Tree a) a (Tree a)
            deriving (Show, Eq)

-- The original member function.
member :: Ord a => a -> Tree a -> Bool
member _ Leaf = False
member x (Fork l y r)
  | x < y = member x l
  | x > y = member x r
  | otherwise = True

-- The improved member function that I wrote.
member' :: Ord a => a -> Tree a -> Bool
member' v t = loop Nothing v t
  where
    loop candidate x Leaf = Just x == candidate
    loop candidate x (Fork l y r)
      | x < y = loop candidate x l
      | otherwise = loop (Just y) x r
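
For example, on a small test tree of my own (testTree is just an illustration, not from the book), both versions seem to agree:

testTree :: Tree Int
testTree = Fork (Fork Leaf 1 Leaf) 2 (Fork Leaf 3 Leaf)

-- member  2 testTree  => True
-- member' 2 testTree  => True
-- member' 4 testTree  => False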

Now my questions are:

  1. According to the paper, is my improved member function right?

  2. In the worst case, the improved member is more efficient than the original one, since it makes no more than d + 1 comparisons. But according to the algorithm, the improved member takes d + 1 comparisons on every query. With the original member, if the queried value equals the root value, only two comparisons are made, whereas the improved member seems to still take d + 1 comparisons. So in the best case the improved algorithm appears to be less efficient than the original one. Do I misunderstand anything here? The paper describing the algorithm reads:

When the searched-for node is reached the algorithm does not terminate but continues in the right subtree. Since all elements in that subtree are larger than the one searched for, the algorithm will make no more right turns after passing that node. Thus if the element searched for is to be found in the tree it has to be in the node where the last right turn was made, which is the node referred to by candidate.

So that seems to match how I understand the algorithm.

  3. If point 2 is true, how can the algorithm be improved so that it doesn't need d + 1 comparisons in the best case, other than by using the three-way comparison function compare in Haskell?

Thanks for any information!

Z-Y.L

1 Answer


This is the BST equivalent of how I do binary search: How can I simplify this working Binary Search code in C?

It's most simply described as "find the position where it belongs, then check to see if it's there".

You are right that it's not as efficient as the 3-way compare in the best case, but the best case is very unlikely. In the average hit case, the 3-way procedure requires 2(d-1) compares, and a miss can require up to 2d.
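
To see the counts concretely, here's a rough sketch of my own (it reuses the Tree type from the question and charges one comparison for each guard that actually gets evaluated):

-- 3-way version: a left turn costs 1 comparison, a right turn or a hit costs 2.
memberCount :: Ord a => a -> Tree a -> (Bool, Int)
memberCount _ Leaf = (False, 0)
memberCount x (Fork l y r)
  | x < y     = let (b, n) = memberCount x l in (b, n + 1)
  | x > y     = let (b, n) = memberCount x r in (b, n + 2)
  | otherwise = (True, 2)

-- candidate version: 1 comparison per node on the path, plus 1 equality
-- check at the Leaf, so never more than d + 1 in total.
memberCount' :: Ord a => a -> Tree a -> (Bool, Int)
memberCount' v t = loop Nothing v t
  where
    loop candidate x Leaf = (Just x == candidate, 1)
    loop candidate x (Fork l y r)
      | x < y     = let (b, n) = loop candidate x l in (b, n + 1)
      | otherwise = let (b, n) = loop (Just y) x r in (b, n + 1)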

None of the comparisons in either algorithm are redundant, however, so no case in either one can be improved without making some other case worse.

Given a full tree of depth d with unique keys, consider that both algorithms will perform a sequence of binary comparisons that will either identify the target node (2^d - 1 possibilities) or the position where it would be inserted (2^d possibilities). That's 2^(d+1) - 1 possible answers.

Think about generating a Huffman encoding for the possible answers: https://en.wikipedia.org/wiki/Huffman_coding

The comparison results are a sequence of bits that identify the correct answer from the set of possibilities. If all the answers are equally likely, then the optimal encoding takes (d+1) bits regardless of which answer is found.
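
For example, with d = 3 there are 2^3 - 1 = 7 node answers plus 2^3 = 8 insertion-position answers, 15 outcomes in total, and distinguishing 15 equally likely outcomes takes ceil(log2 15) = 4 = d + 1 comparisons.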

All other encodings are worse on average. (Exactly how much worse can be determined by the Kullback-Leibler divergence, if you're interested: https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence)

Matt Timmermans