Finding the duplicate elements from this list in minimum number of time

Question

I was in an interview and there was a question asked that there is a List which has integers numbers in it. List has the size of 1 million records and all of them are integers.

My task is to find the duplicates with best runtime. I was not able to answer the correct answer as I have told him that i can achieve the same of finding the duplicate numbers in such a big list by making use of two loops but can this be done as fast as possible with best runtime?

Welcome to SO. Your question is hard to read. Consider rewriting it in a more structured fashion. — Neuron, Apr 12 '18 at 15:10
Welcome to Stack Overflow. Your question would benefit from showing the code that you wrote - with two nested loops. Using proper punctuation would be of great help as well. Also note that not too many people outside India would tell you right away how many items are in "10 lakhs," while "one million" enjoys instant familiarity. — Sergey Kalinichenko, Apr 12 '18 at 15:13
If you always want one item of every object in the list you can use Set instead of list.otherwise If your list is populated with you then you can create a class which saves count of each item in a `HashMap` when adding new items to list — ygngy, Apr 12 '18 at 15:17

score 1 · Answer 1 · answered Apr 12 '18 at 15:52

You need to iterate over the integers and keep track of integers you've already seen. For this you need an efficient data structure which have good runtime complexity for add and contains operations.

For instance you can use a has set to track seen integers:

    Set<Integer> duplicateIntegers = new LinkedHashSet<>();
    Set<Integer> seenIntegers = new HashSet<>();

    for (Integer integer : integers) {
        if (!seenIntegers.add(integer)){
            duplicateIntegers.add(integer);
        }
    }

Here we iterate over N integers, add it to seenIntegers and check if current integer is already there, which is amortized O(1). So at the end it is O(N) on time and O(N) on additional space.

However O(1) for HashSet.add is amortised (see here what it actually means). Since we deal with integers and they are not so many, we can achieve a honest-to-goodness O(1) using more additional space. We only need 2^32 bits which is just 512Mb. For this we can use BitSet. Actually, two BitSets because we need 2^32 bits but BitSet can only be initialized with max value of int which is 2^31-1.

    BitSet seenNonNegativeIntegers = new BitSet(Integer.MAX_VALUE);
    BitSet seenNegativeIntegers = new BitSet(Integer.MAX_VALUE);

    Set<Integer> duplicateIntegers = new LinkedHashSet<>();

    for (Integer integer : integers) {
        int i = integer.intValue();
        if (i >= 0) {
            if (seenNonNegativeIntegers.get(i)) {
                duplicateIntegers.add(integer);
            }
            seenNonNegativeIntegers.set(i);
        } else if (i < 0) {
            int index = -(i + 1);
            if (seenNegativeIntegers.get(index)) {
                duplicateIntegers.add(integer);
            }
            seenNegativeIntegers.set(index);
        }
    }

This is also O(N), but based on honest-not-amortised O(1). Theoretically this must be the most optimal solution from the runtime complexity point of view. Practically, however it might still be slower that HashSet because we have to do get and set instead of just one add.

On an interview I'd probably present the first solution, mentioning the second one and discussing runtime complexity vs. additional space requirements.

score 0 · Answer 2 · answered Apr 12 '18 at 15:19

0

The easiest solution is to use HasMap, unordered_set.

def find_duplicates(a): used = set() yielded = set() for x in a: if x in used and x not in yielded: yield x yielded.add(x) used.add(x)

answered Apr 12 '18 at 15:19

Anton Hulikau

336
1
3
7

ygngy · Answer 3 · 2018-04-12T16:00:28.483

0

public class CountArrayList extends ArrayList<YourType>{

       private HashMap<YourType, Integer> count = new HashMap<>();

       @Override
       public boolean add(YourType element){
             Integer i = count.get(element);
             count.put(element, i == null ? 1 : ++i);
             return super.add(element);
       }

       public int getItemCount(YourType element){
             return count.get(element) == null ? 0 : count.get(element);
       }
}

This class is not complete and you should override remove and other methods like add method to update count

edited Apr 12 '18 at 16:00

answered Apr 12 '18 at 15:53

ygngy

3,630
2
18
29

What is `YourType`? Why do you need to extend `ArrayList`? – lexicore Apr 12 '18 at 15:55
@lexicore YourType is the type of items which you add to list, you extend ArrayList to use its other methods – ygngy Apr 12 '18 at 15:57
Here's a suggestion (I absolutely do not insist). If you want, I can explain why I think this is not a good solution on an interview. Let me know if you're interested. – lexicore Apr 12 '18 at 16:01
@lexicore yes elements with one value and diffierent reference – ygngy Apr 12 '18 at 16:02
OK. First, `YourType` is completely unnecessary as it is clearly stated that items are integers. – lexicore Apr 12 '18 at 16:03
Next, you actually do not produce the requested result (duplicate numbers). – lexicore Apr 12 '18 at 16:04
@lexicore duplicate numbers are numbers with more than one count – ygngy Apr 12 '18 at 16:05
@lexicore YourType can be changed its a place holder in my code – ygngy Apr 12 '18 at 16:06
Next, you seem to have solved a different task: with you code one can find out number of occurences of each item. Which was actually not requested. – lexicore Apr 12 '18 at 16:06
Finally, you have subclassed `ArrayList` which is also totally unnecessary. – lexicore Apr 12 '18 at 16:07
So overall my impression is that you surely know how to solve it, but did not take the requirements too precisely. – lexicore Apr 12 '18 at 16:09
@lexicore as you can see this question has #data-structures TAG which is about data structures – ygngy Apr 12 '18 at 16:09
@lexicore duplicate numbers are numbers with more than one count, YourType can be changed its a place holder in my code, as you can see this question has #data-structures TAG which is about data structures – ygngy Apr 12 '18 at 16:11
Let's move to chat? – lexicore Apr 12 '18 at 16:12
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/168856/discussion-between-wizard-and-lexicore). – ygngy Apr 12 '18 at 16:12

Finding the duplicate elements from this list in minimum number of time

3 Answers3