efficient way to do 'contains' between two lists

Question

I have 2 lists of integers,

l1 = new ArrayList();
l2 = new ArrayList();

I want to find out duplicate items in both of them, I have my usual approach:-

for (Integer i : l1)
{
 if(l2.contains(i)){
    System.out.println("Found!");
  } 
}

I've heard contains() is O(n), making my implementation O(n^2).

Is there a better way to do this, (less than O(n^2)) ?

Jon Skeet · Accepted Answer · 2015-10-29T14:38:20.367

7

Sure - create a HashSet<Integer> from one of the lists first.

Set<Integer> set = new HashSet<Integer>(l2);
for (Integer i : l1) {
    if (set.contains(i)) {
        System.out.println("Found!");
    }
}

If you want to find all duplicate entries, you don't even need to write your own loop, as Set<E> provides everything you need...

Set<Integer> set = new HashSet<Integer>(l2);
set.retainAll(new HashSet<Integer>(l1));

Afterwards, set will be the intersection of the two lists.

Note that you can be even more efficient than this if both your lists are already sorted to start with. You just iterate over both at the same time, moving the "cursor" forward for whichever iterator currently has the smaller value.

edited Oct 29 '15 at 14:38

answered Aug 08 '11 at 18:29

Jon Skeet

1,421,763
867
9,128
9,194

Shouldn't it be `set.contains(i)`? – dlev Aug 08 '11 at 18:31
@dlev: Yup, fixed. But there's a better approach to find the whole intersection - see my edited version... – Jon Skeet Aug 08 '11 at 18:32
Very nice. Hadn't seen `retainAll()` before. Assuming he just wants the duplicates (rather than acting on each one) that's way cleaner. – dlev Aug 08 '11 at 18:33
retainAll seems nice, but the previous loops is `o(n^2)` like mine. Can you explain the cursor solution? It seems like it'll be better ( `O(nLogn)` for sorting i think) – heyNow Aug 08 '11 at 18:46
1

@sitsOnRedChair: No, it's not O(n^2) - because a lookup in a `HashSet` is O(1), not O(n). – Jon Skeet Aug 08 '11 at 18:52
If there are n elements in the set then it will do O(n) tests on l1.contains(i), and List.contains is O(n). This is discussed here: http://stackoverflow.com/questions/24754881/what-is-the-time-and-space-complexity-of-method-retainall-when-used-on-hashsets. Am I missing something? – sfjac Oct 29 '15 at 00:53
@sfjac: My code doesn't call l1.contains(i). It calls set.contains(i). – Jon Skeet Oct 29 '15 at 07:24
Was referring to the second snippet- I should have been more precise. `retainAll` will call `set.contains(l1)`. – sfjac Oct 29 '15 at 14:29
@sfjac: Well where do you see the second snippet calling `List.contains`? – Jon Skeet Oct 29 '15 at 14:30
Sorry, should have peeked at my reference. The implementation of `retainAll` loops oner the elements in the set and calls `container.contains(element)` for each element in the set. If `container` is a list then that's O(n). – sfjac Oct 29 '15 at 14:35
1

@sfjac: Ah, I see what you mean. That's easily fixed then, by just creating a new set... will edit. – Jon Skeet Oct 29 '15 at 14:38

score 5 · Answer 2 · answered Aug 08 '11 at 18:29

The usual way is to add each item from the first list to a HashSet, and then test each item in the second list for existence in that set:

Set<Integer> firstSet = new HashSet<Integer>(l1);
for (Integer i : l2) {
    if (firstSet.contains(i)) {
        // Do stuff
    }
}

efficient way to do 'contains' between two lists

2 Answers2

Linked