0

I am studying Data Structures and Algorithms in C#/Java. After encountering a solution to the problem of Linked List duplicate removal, I have been struggling to understand it.

The solution is the one proposed by the renowned book Cracking the coding Interview (5th edition, page 208).

void RemoveDuplicates_HashSet(Node n)
{
    HashSet<object> set = new HashSet<object>();

    Node previous = null;
    while (n != null)
    {
        if (set.Contains(n.Data))       // Condition 1
            previous.Next = n.Next;
        else                            // Condition 2
        {
            set.Add(n.Data);
            previous = n;
        }
            
        n = n.Next;
    }
}

Running the code with the following linked list A->B->A->B:

// Creating test Singly LinkedList
Node n = new Node("A");
n.Next = new Node("B");
n.Next.Next = new Node("A");
n.Next.Next.Next = new Node("B");

RemoveDuplicates_HashSet(n);

Works perfectly fine: the value of n after the method is A->B.

By following the code with a debugger, I can see that what happens in the method loop is the following:

| Pass | HashSet | n          | previous   | Comment                  |
| ---- | ------- | ---------- | ---------- | ------------------------ |
| –    | –       | A->B->A->B | null       |                          |
| 1    | A       | B->A->B    | A->B->A->B | Condition 2 is triggered |
| 2    | A,B     | A->B       | B->A->B    | Condition 2 is triggered |
| 3    | A,B     | B          | B->B       | Condition 1 is triggered |
| 4    | A,B     | null       | B          | Condition 1 is triggered |

I fail to understand how this actually results in several ways:

  1. Where/how exactly are duplicates dropped from n? I understand that HashSet contains only unique elements, and it will therefore detect if an element was encountered already, however I still cannot see how the algorithm works in its entirety.

  2. How is it that the values pointed to by n are updated to be A->B? Where is it that, given that essentially the loop is simply iterating over the Linked List doing n = n.Next, n is actually updated with the final value A->B? I understand that the list is passed by reference, but I cannot see how it is actually modified.

alelom
  • 2,130
  • 3
  • 26
  • 38
  • 1
    This code: `if (set.Contains(n.Data)) previous.Next = n.Next` checks if the element has already been encountered and, if it has, removes `n` from the linked list. It removes the node by assigning `n.Next` to `previous.Next` (which means `previous.Next` no longer points to `n`). – Slaw Jan 12 '21 at 09:54
  • @Slaw thanks, this is exactly the answer I was looking for. You made me realize that, actually, `previous` and `n` point to the same list (I was mistakenly thinking that they were a "copy" of the same list). Therefore, modifying `previous` means to also modify the original list `n`. – alelom Jan 12 '21 at 10:29

2 Answers2

2

Where/how exactly are duplicates dropped from n?

A property of a Set if, that it may not contain duplicate elements.

if (set.Contains(n.Data))
            previous.Next = n.Next;
else
{
            set.Add(n.Data);
            previous = n;
}
n = n.Next;

Here you first check, if the data of the current node n is contained in the set. If yes, the successor of the previous node is the next node of the current node n. So:

[previous]->[n]->[next]
[previous]->[next]

Otherwise, the data is added to the set, and you proceed with the next node.

How is it that the values pointed to by n are updated to be A->B? Where is it that, given that essentially the loop is simply iterating over the Linked List doing n = n.Next, n is actually updated with the final value A->B?

I really don't understand this question. Do you mean "Why is the list modified?" If yes, then because it is passed (a bit simplified ) by reference. So you just dont do a copy, like with primitve types (int,long, ...), but you modify the object itself.

JCWasmx86
  • 3,473
  • 2
  • 11
  • 29
  • > `Do you mean "Why is the list modified?"` yes, and I do understand pass-by-reference to an extent, but apparently I'm still missing something – where is it here that `n` is modified in the first place? – alelom Jan 12 '21 at 10:00
  • It is the same as with the list itself. You change the value of some field of the Node and this will show in the list. – JCWasmx86 Jan 12 '21 at 10:03
0

@Slaw's comment pointed me in what I believe to be the right direction.

This code: if (set.Contains(n.Data)) previous.Next = n.Next checks if the element has already been encountered and, if it has, removes n from the linked list. It removes the node by assigning n.Next to previous.Next (which means previous.Next no longer points to n).

I've therefore tried to exhaustively diagram what happens in the algorithm.

Diagram of the algorithm.

alelom
  • 2,130
  • 3
  • 26
  • 38