0

Possible Duplicate:
Java Compare Two Lists

If I have two ordered sequences of numbers (I have a lot of flexibility here, and could put my data in a list, set, array, etc.), what's the most efficient way to go about extracting the matches? For example, if I have:

[1, 2, 4, 6, 9] and [2, 3, 4], I would like to return [2, 4].

There are obviously lots of ways to go about this; I'm curious what the most efficient way is.

Community
  • 1
  • 1
Adam_G
  • 7,337
  • 20
  • 86
  • 148
  • 3
    Here are the same problems http://stackoverflow.com/questions/7574311/efficiently-compute-intersection-of-two-sets-in-java http://stackoverflow.com/questions/2400838/efficient-intersection-of-two-liststring-in-java – KrHubert Sep 13 '12 at 13:24
  • 1
    Can sequences contain duplicate numbers? – Alexei Kaigorodov Sep 13 '12 at 13:25
  • 1
    Check this question http://stackoverflow.com/questions/2762093/java-compare-two-lists?lq=1 – Stephan Sep 13 '12 at 13:26

6 Answers6

10

Two pointers (indexes), one for each list. Start from the beginning of each list

Whenever you find a match, store it.

If you don't have a match, check which list has the lowest value in the current index and increase the index for that list. Check for a match. Do that until you arrive to the end of one of the lists.

Runs linear to the size of the sum of the lists (worse case). No way to do it better.

vainolo
  • 6,907
  • 4
  • 24
  • 47
4

So you want the intersection of the to sets? You can use CollectionUtils.intersection from apache commons. Or better, the one from Guava Sets.intersection (which is generic).

zeller
  • 4,904
  • 2
  • 22
  • 40
2

This is intersection of two collections I guess Set would be best choice.

Please see Collection.retainAll()

Retains only the elements in this collection that are contained in the specified collection (optional operation). In other words, removes from this collection all of its elements that are not contained in the specified collection.

A more efficient way would be

public static void main(String[] args) {
    int[] firstArray = { 1, 3, 5, 8, 10 };
    int[] secondArary = { 3, 8 };
    //Arrays.sort(firstArray); If arrays needs to be sorted
    //Arrays.sort(secondArary);
    System.out.println(Arrays
            .toString(intersection(firstArray, secondArary)));

}

private static Integer[] intersection(int firstArray[], int secondArary[]) {
    int i = 0;
    int j = 0;
    ArrayList<Integer> list = new ArrayList<Integer>();
    while (i < firstArray.length && j < secondArary.length) {
        if (firstArray[i] < secondArary[j])
            i++;// Increase I move to next element
        else if (secondArary[j] < firstArray[i])
            j++;// Increase J move to next element
        else {
            list.add(secondArary[j++]);
            i++;// If same increase I & J both
        }
    }
    return list.toArray(new Integer[0]);
}
Amit Deshpande
  • 19,001
  • 4
  • 46
  • 72
2

If you store the two sequences into Sets you can use Set.retainAll() which:

Retains only the elements in this set that are contained in the specified collection (optional operation). In other words, removes from this set all of its elements that are not contained in the specified collection. If the specified collection is also a set, this operation effectively modifies this set so that its value is the intersection of the two sets.

For example:

Set<Integer> s1 = new TreeSet<Integer>();
s1.add(1);
s1.add(2);
s1.add(6);

Set<Integer> s2 = new TreeSet<Integer>();
s2.add(2);
s2.add(6);

s2.retainAll(s1);

for (Integer i: s2)
{
    System.out.println(i);
}

Output:

2
6
hmjd
  • 120,187
  • 20
  • 207
  • 252
  • 1
    That will not run in linear time. It will iterate over the target set, search in the original set (order log(n) for a treeset), and remove one by one. – GeertPt Sep 13 '12 at 13:30
  • @greyfairer The first set does not need to be a treeset, it can be a hashset, which gives you an O(1) on contains. – assylias Sep 13 '12 at 13:33
  • @assylias But my observation is that if it is for short set of array more time is required to construct `Hashset` it self. – Amit Deshpande Sep 13 '12 at 13:45
1

I would create the two lists as java.util.Set, copy one set to a temporary variable and call retainAll(<otherSet>) on this temporary set. Then temporary set will be left the intersection of two sets.

Vikdor
  • 23,934
  • 10
  • 61
  • 84
  • That will not run in linear time. It will iterate over the target set, search in the original set (by hashcode for a HashSet), and remove one by one. – GeertPt Sep 13 '12 at 13:30
0

I would use TreeLists and do the following:

TreeList<Integer> list1 = ...
TreeList<Integer> list2 = ...

Iterable<Integer> result = Iterables.filter(list1, Predicates.in(list2));

This uses Guava's Iterables and Predicates.

John B
  • 32,493
  • 6
  • 77
  • 98