Strategy to find duplicate entries in a binary search tree

Question

I have a BST which has duplicate entries. I am trying to find duplicate entries. Now obviously I can write a dumb algorithm which traverses the whole tree, which is easy.

However, I want to write a more efficient one. Here's what I've done/thought so far:

Assume the following tree.

      10
     /   \
    5    15
   /\    / \
  2  8   10 16
      \    \
       8   12

If I want to find all 8's, I will first find the 8 on the left subtree of the 10. To find a duplicate, if it has no right child, is it going to be the left-most node on the right-subtree of the the first parent that is larger than that node (8)? And if it did have a right child, then it can be either in the left most node of its right subtree or the right most node on its left subtree?

Are those all the cases, which can be achieved with a bunch of loops and if-statements?

If not, what's a better approach? Can anyone help?

Thanks

EDIT: Actually I just realized it can't be the "left most node" or "right most node". That would find the node that is the next highest value or the previous lowest value. Would it be one node before?

EDIT 2:

Fixed my BST example. It follows the following insertion method:

if (node == null) 
    return new NodeBST<Value>(name, value);

if (node.key().compareTo(name) > 0)
    node.setLeft(insert(node.left(), name, value));     
else
    node.setRight(insert(node.right(), name, value));

This means duplicates will be added to the right of their duplicates.. right?

That's not a proper BST. You can't have 8 on the right branch of the root 10. — John Kugelman, Oct 09 '11 at 23:45
What do you want: 1) given a key, find it and all duplicates 2) find all nodes that have a duplicate key? A related, more generic question: http://stackoverflow.com/questions/300935/are-duplicate-keys-allowed-in-binary-search-trees-and-if-yes-how-to-treat-them — Ciro Santilli OurBigBook.com, Apr 11 '15 at 09:45

score 2 · Answer 1 · edited Apr 11 '15 at 09:46

2

The tree you show assumes (well, at least I assume... ;-)) that less-than are on the left, and greater-than are on the right, am I right?

So there are two things you should consider:

Your tree is wrong! The second "8" on the right of the "10" head cannot be there, since it's less-than 10. A correct insertion, and a correct balancing, would both put in very close, if not right at the "next" iteration from the "left 8".
By defining the tree as "less-than-or-equal" on the left, and "greater-than" on the right, you would get the wanted result: all "8"s will be chained to the left of each other on a simple insertion tree.

edited Apr 11 '15 at 09:46

Ciro Santilli OurBigBook.com

347,512
102
1,199
985

answered Oct 09 '11 at 23:47

Etamar Laron

1,172
10
23

Yes yes I apologize for my mistake. Please see edit 2. My insertion chains them to the right of each other. So when I find a duplicate, simply go through its right child/children to see all duplicates? – darksky Oct 09 '11 at 23:50
absolutely. Also, for brevity and performance, if you expect large number of duplicates, extending a linked-list of all similar-value objects from the first one, may be a good idea. If really large sets of duplicates, you can start a Hash-table for a leaf that contains duplicates... all depending on the particular problem you are trying to solve! – Etamar Laron Oct 15 '11 at 21:15

user207421 · Accepted Answer · 2015-07-14T00:24:31.230

2

Find an element that matches your key using the usual binary tree search algorithm. If not found, stop.
Examine the LH sub-branch. If its key matches, make that the current node and repeat this step.
You are now at the first element in the tree with that key. Now do a tree walk from this node while the keys are equal, i.e. visit this node, the right sub-tree, the parent, the parent's right sub-tree, etc, left as an exercise for the reader.

edited Jul 14 '15 at 00:24

answered Oct 10 '11 at 00:28

user207421

305,947
44
307
483

I believe in step 2 it should be RH sub-branch as according to my insertion (as shown in Edit 2 above), duplicates are inserted to the right child of the node. Correct? – darksky Oct 10 '11 at 01:09
@Nayefc The first time you get to (2) may not know that all the duplicates are only to the right. It depends on how you write your search code. – user207421 Oct 10 '11 at 01:16
Oh yeah I know that. I'm just saying that my insertion method puts duplicates on the right side of a node. (please see above code + diagram). So this wouldn't make step 2 as: `Examine the RH sub-brance` instead of the LH sub-branch? – darksky Oct 10 '11 at 01:22
@Nayefc That doesn't change anything. *When you search,* you either have to keep going left while the keys are equal, or stop at the first match. If the latter, the duplicates could be on both sides. – user207421 Oct 10 '11 at 01:29
Oh you are giving the whole search algorithm. Got it... Thanks – darksky Oct 10 '11 at 01:47
I think the search in fact depends on your implementation i.e. whether your insertion part is like if(element>=tree[i]) insert RH or if(element<=tree[i]) insert LH. – Soumajyoti Sarkar Jan 08 '13 at 19:13
@Soumajyoti No it doesn't. When you encounter a non-leaf match you have to go left to first the first of the equal keys. Otherwise you aren't in a binary tree. – user207421 Oct 03 '13 at 01:06

score 0 · Answer 3 · answered Sep 05 '17 at 06:54

This implementation uses recursive method and return an array of duplicate entries

public class TreeNode<E> {
    public int data;
    public TreeNode left;
    public TreeNode right;
}

public Integer[] findDuplicate(TreeNode tree) {
    Map<Integer, Integer> entries = new HashMap<>();
    List<Integer> duplicates = new LinkedList<>();

    return (Integer[]) findDuplicate(tree, entries, duplicates);
}

private Integer[] findDuplicate(TreeNode tree, Map entries, List duplicates) {
    if (tree == null) 
        return (Integer[]) duplicates.toArray(new Integer[] {});

    if (entries.containsKey(tree.data))
        duplicates.add(tree.data);
    else 
        entries.put((int) tree.data, 1);

    findDuplicate(tree.left, entries, duplicates);
    findDuplicate(tree.right, entries, duplicates);

    return (Integer[]) duplicates.toArray(new Integer[] {});
}

score 0 · Answer 4 · answered Oct 09 '11 at 23:50

0

A recursive algorithm can solve this quickly. You do not have to recurse over the whole tree, as you can use the structure of the BST to hunt down the values you need.

What you have drawn isn't strictly a BST, I might be wrong, but I believe it's pretty broken - all numbers in the left tree should be less than 10, and vice versa.

answered Oct 09 '11 at 23:50

lynks

5,599
6
23
42

I'm sorry, I know that. I did a mistake please see the revised Edit 2. Thanks – darksky Oct 09 '11 at 23:51

Strategy to find duplicate entries in a binary search tree

4 Answers4

Linked