
I am writing a function that will take in the head of a linked list, remove all duplicates, and return the new head. I've tested it but I want to see if you can catch any bugs or improve on it.

Node removeDuplicates(Node head) {
    if(head == null) throw new RuntimeException("Invalid linked list");

    Node cur = head.next;
    while(cur != null) {
        if(head.data == cur.data) {
            head = head.next;
        } else {
            Node runner = head;
            while(runner.next != cur) {
                if(runner.next.data == cur.data) {
                    runner.next = runner.next.next;
                    break;
                }
                runner = runner.next;
            }
        }
        cur = cur.next;
    }
    return head;
}
Brent

4 Answers


If you are willing to spend a little more RAM on the process, you can make it go much faster without changing the structure.

For desktop apps, I normally favor spending more RAM to gain some speed, so I would do something like this.

Node removeDuplicates(Node head) {
    if (head == null) {
        throw new RuntimeException("Invalid List");
    }

    Node current = head;
    Node prev = null;
    Set<T> data = new HashSet<T>(); // T is the type of your data; it needs correct equals() and hashCode() to work in a HashSet.
    while (current != null) {
        if (!data.contains(current.data)) {
            data.add(current.data);   // first occurrence: remember it and keep the node
            prev = current;
        } else {
            prev.next = current.next; // duplicate: unlink it (prev cannot be null here, because the first node is always kept)
        }
        current = current.next;
    }
    return head;
}

This should run in O(n) time (assuming O(1) hash operations), at the cost of O(n) extra space for the set.
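For context, here is a minimal, self-contained sketch of the kind of Node class and call site the snippets in this question assume. The class definition and all names below are hypothetical, not from the original post:

import java.util.HashSet;
import java.util.Set;

// Hypothetical minimal singly linked list node with an int payload.
class Node {
    int data;
    Node next;

    Node(int data) { this.data = data; }
}

public class RemoveDuplicatesDemo {
    public static void main(String[] args) {
        // Build the list 1 -> 2 -> 1 -> 3 -> 2
        Node head = new Node(1);
        head.next = new Node(2);
        head.next.next = new Node(1);
        head.next.next.next = new Node(3);
        head.next.next.next.next = new Node(2);

        // Same HashSet-based pass as the answer above, inlined for int data.
        Set<Integer> seen = new HashSet<Integer>();
        Node prev = null;
        for (Node current = head; current != null; current = current.next) {
            if (seen.add(current.data)) {
                prev = current;           // first occurrence: keep it
            } else {
                prev.next = current.next; // duplicate: unlink it
            }
        }

        for (Node n = head; n != null; n = n.next) {
            System.out.print(n.data + " "); // prints: 1 2 3
        }
    }
}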

EDIT

I hope I was correct in assuming that this is some kind of project / homework where you are being forced to use a linked list, otherwise, as noted, you would be better off using a different data structure.

Nico Huysamen
  • Thanks. I am actually just preparing technical interview questions. This is a good alternative, assuming you are allowed to use an additional data structure. – Brent Mar 07 '11 at 07:59
  • Ah ok. Yeah, this is quite a nice interview question. Not too difficult, and you need some kind of theoretical background. Nice one! – Nico Huysamen Mar 07 '11 at 08:05
  • For a small number of elements, the time it takes to allocate memory (depending on the allocation scheme) may exceed the time an in-place modified merge sort would take. – void-pointer Mar 07 '11 at 08:06
  • @void-pointer - Sorting would change the structure of the linked list, which I assumed should not happen. – Nico Huysamen Mar 07 '11 at 08:40

I didn't check your code for bugs, but I do have a suggestion for improving it. Allocate a Hashtable or HashMap that maps the node data to Boolean. As you process each element, if its data is not a key in the map, add it (with Boolean.TRUE as the value). If it does exist as a key, the value has already appeared in the list and you can simply unlink the node.

This is faster than your method because hash lookups run in roughly constant time, while your inner loop has to rescan the already-processed part of the list for every element, which makes the whole pass O(n²).

Also, you might consider whether using an equals() test instead of == makes better sense for your application.
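Not part of the original answer, but a quick hypothetical illustration of why the equals()-versus-== distinction matters once the data field holds a reference type such as Integer or String:

import java.util.HashMap;
import java.util.Map;

public class EqualsVsIdentityDemo {
    public static void main(String[] args) {
        // With reference types, == compares object identity, not value.
        Integer a = 1000;
        Integer b = 1000;
        System.out.println(a == b);      // typically false: two distinct boxed objects outside the small Integer cache
        System.out.println(a.equals(b)); // true: same value

        // A HashMap keyed on the data relies on equals()/hashCode(), so duplicates by value are detected.
        Map<Integer, Boolean> seen = new HashMap<Integer, Boolean>();
        for (Integer value : new Integer[] { a, b }) {
            if (seen.containsKey(value)) {
                System.out.println(value + " already appeared -> this node would be unlinked");
            } else {
                seen.put(value, Boolean.TRUE);
            }
        }
    }
}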

Ted Hopp

To efficiently remove duplicates you could move away from the linked list altogether and use java.util.PriorityQueue instead; it orders its elements by their natural ordering or by a Comparator you supply. If you always insert into such a sorted collection, removing duplicates can either be done directly upon insertion or on demand with a single O(n) pass over the sorted output.
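Not from the original answer, but a rough sketch of that idea using the hypothetical Node class from the earlier demo: copy the values into a PriorityQueue, then poll them in sorted order and skip repeats. Note that the result comes back sorted, not in the original order.

import java.util.PriorityQueue;

// Sketch: rebuild the list without duplicates by draining a PriorityQueue.
// Assumes the hypothetical Node(int) class defined in the demo above.
Node removeDuplicatesSorted(Node head) {
    PriorityQueue<Integer> pq = new PriorityQueue<Integer>();
    for (Node n = head; n != null; n = n.next) {
        pq.add(n.data);                 // O(log n) per insertion
    }

    Node dummy = new Node(0);           // placeholder head for the rebuilt list
    Node tail = dummy;
    Integer previous = null;
    while (!pq.isEmpty()) {             // single pass over the values in sorted order
        Integer value = pq.poll();
        if (previous == null || !value.equals(previous)) {
            tail.next = new Node(value);
            tail = tail.next;
        }
        previous = value;
    }
    return dummy.next;
}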

Bernd Elkemann
  • Isn't the problem here to efficiently modify a linked list that already exists, not to define the right data structure to eliminate duplicates? – Ted Hopp Mar 07 '11 at 07:54
  • It would be faster to just insert directly into the PriorityQueue, but if he does not want to do that, the original poster can still use it to clean the duplicates out of his linked list with a single O(n) pass. – Bernd Elkemann Mar 07 '11 at 08:04

Aside from building a hash map from the list elements and testing each element against it as a key (which only pays off for a large number of elements, where "large" depends on the cost of building the hash map), sequentially scanning the list is a practical option, but there are faster ones. See user138170's answer here: an in-place merge sort is an O(n log n) operation that uses no extra space, whereas a solution using a separately allocated array would work in O(n) time. In practice, you may want to profile the code and settle on a threshold for n (the number of elements in the list) above which the memory-allocating solution is used instead of the in-place one.
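Not from the original answer, but a rough sketch of the sort-then-scan idea on the same hypothetical Node type; as the comments above note, sorting does change the order of the nodes.

// Sketch: sort the list in place with merge sort, then remove adjacent duplicates.
// Assumes the hypothetical Node class (int data, Node next) from the demo above.
Node removeDuplicatesByMergeSort(Node head) {
    head = mergeSort(head);
    for (Node n = head; n != null && n.next != null; ) {
        if (n.data == n.next.data) {
            n.next = n.next.next;   // adjacent duplicate: unlink it
        } else {
            n = n.next;
        }
    }
    return head;
}

Node mergeSort(Node head) {
    if (head == null || head.next == null) {
        return head;
    }
    // Split the list in half using slow/fast pointers.
    Node slow = head, fast = head.next;
    while (fast != null && fast.next != null) {
        slow = slow.next;
        fast = fast.next.next;
    }
    Node second = slow.next;
    slow.next = null;
    return merge(mergeSort(head), mergeSort(second));
}

Node merge(Node a, Node b) {
    Node dummy = new Node(0);       // hypothetical Node(int) constructor
    Node tail = dummy;
    while (a != null && b != null) {
        if (a.data <= b.data) {
            tail.next = a;
            a = a.next;
        } else {
            tail.next = b;
            b = b.next;
        }
        tail = tail.next;
    }
    tail.next = (a != null) ? a : b;
    return dummy.next;
}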

Edit: If efficiency is really important, I would suggest not using a linked list at all (largely to preserve cache locality) and perhaps using JNI to implement the function natively; the data would also have to be supplied in a natively allocated buffer.

void-pointer