11

In Java, what's the most efficient way to return the common elements from two String Arrays? I can do it with a pair of for loops, but that doesn't seem to be very efficient. The best I could come up with was converting to a List and then applying retainAll, based on my review of a similar SO question:

List<String> compareList = Arrays.asList(strArr1);
List<String> baseList = Arrays.asList(strArr2);
baseList.retainAll(compareList);
Community
  • 1
  • 1
JW8
  • 1,496
  • 5
  • 21
  • 36

6 Answers6

6

EDITED:

This is a one-liner:

compareList.retainAll(new HashSet<String>(baseList));

The retainAll impl (in AbstractCollection) iterates over this, and uses contains() on the argument. Turning the argument into a HashSet will result in fast lookups, so the loop within the retainAll will execute as quickly as possible.

Also, the name baseList hints at it being a constant, so you will get a significant performance improvement if you cache this:

static final Set<String> BASE = Collections.unmodifiableSet(new HashSet<String>(Arrays.asList("one", "two", "three", "etc")));

static void retainCommonWithBase(Collection<String> strings) {
    strings.retainAll(BASE);
}

If you want to preserve the original List, do this:

static List<String> retainCommonWithBase(List<String> strings) {
   List<String> result = new ArrayList<String>(strings);
   result.retainAll(BASE);
   return result;
}
Community
  • 1
  • 1
Bohemian
  • 412,405
  • 93
  • 575
  • 722
  • 1
    retainAll seems to iterate on the set, and does not lookop on the set (which is kinda weird, it could have been overriden for all hash based collections) – Sebastien Lorber Dec 19 '11 at 00:47
  • @SebastienLorber Thanks for pointing that out. I ave incorporated your comment into my edit – Bohemian Dec 19 '11 at 01:58
3

I would use HashSets (and retainAll) then, which would make the whole check O(n) (for each element in the first set lookup if it exists (contains()), which is O(1) for HashSet). Lists are faster to create though (HashSet might have to deal with collisions...).

Keep in mind that Set and List have different semantics (lists allow duplicate elements, nulls...).

milan
  • 11,872
  • 3
  • 42
  • 49
3

Sort both arrays.

Once sorted, you can iterate both sorted arrays exactly once, using two indexes.

This will be O(NlogN).

Burleigh Bear
  • 3,274
  • 22
  • 32
  • @Burleigh How do you arrive at n log n? – non sequitor Dec 19 '11 at 02:24
  • More or less by abusing the notation :). To sort an array is NlogN in the length of the array. Asymptotically, we only need to consider the longer array - let's call that N (and let's assume that comparisons between strings are fixed cost, which is also not true). So the sort stage is O(NLogN). To find the common elements, we can traverse the arrays in order, only so that's O(N), again assuming that the comparisons are fixed cost. I guess it's more accurate to say that the order is O(MNlogN) where M is the length of the longest string in either array. – Burleigh Bear Dec 19 '11 at 02:40
1

retain all is not supported by list. use set instead:

import java.util.*;
public class Main {
    public static void main(String[] args) {
        String[] strings1={"a","b","b","c"},strings2={"b","c","c","d"};
        List<String> list=Arrays.asList(strings1);
        //list.retainAll(Arrays.asList(strings2)); // throws UnsupportedOperationException
        //System.out.println(list);
        Set<String> set=new LinkedHashSet<String>(Arrays.asList(strings1));
        set.retainAll(Arrays.asList(strings2));
        System.out.println(set);
    }
}
Ray Tayek
  • 9,841
  • 8
  • 50
  • 90
1

What you want is called intersection. See that: Intersection and union of ArrayLists in Java

The use of an Hash based collection provides a really faster contains() method, particularly on strings which have an optimized hashcode.


If you can import libraries you can consider using the Sets.intersection of Guava.


Edit:

Didn't know about the retainAll method.

Note that the AbstractCollection implementation, which seems not overriden for HashSets and LinkedHashSets is:

public boolean retainAll(Collection c) { boolean modified = false; Iterator it = iterator(); while (it.hasNext()) { if (!c.contains(it.next())) { it.remove(); modified = true; } } return modified; }

Which means you call contains() on the collection parameter! Which means if you pass a List parameter you will have an equals call on many item of the list, for every iteration!

This is why i don't think the above implementations using retainAll are good.

public <T> List<T> intersection(List<T> list1, List<T> list2) {
    boolean firstIsBigger = list1.size() > list2.size();
    List<T> big =  firstIsBigger ? list1:list2;
    Set<T> small =  firstIsBigger ? new HashSet<T>(list2) : new HashSet<T>(list1);
    return big.retainsAll(small)
}

Choosing to use the Set for the smallest list because it's faster to contruct the set, and a big list iterates pretty well...

Notice that one of the original list param may be modified, it's up to you to make a copy...

Community
  • 1
  • 1
Sebastien Lorber
  • 89,644
  • 67
  • 288
  • 419
0

I had an interview and this question was the thing they asked me during technical interview. My answer was following lines of code:

public static void main(String[] args) {

        String[] temp1 = {"a", "b", "c"};
        String[] temp2 = {"c", "d", "a", "e", "f"};
        String[] temp3 = {"b", "c", "a", "a", "f"};

        ArrayList<String> list1 = new ArrayList<String>(Arrays.asList(temp1));
        System.out.println("list1: " + list1);
        ArrayList<String> list2 = new ArrayList<String>(Arrays.asList(temp2));
        System.out.println("list2: " + list2);
        ArrayList<String> list3 = new ArrayList<String>(Arrays.asList(temp3));
        System.out.println("list3: " + list3);

        list1.retainAll(list2);
        list1.retainAll(list3);
        for (String str : list1)
            System.out.println("Commons: " + str);
}

Output:

list1: [a, b, c]
list2: [c, d, a, e, f]
list3: [b, c, a, a, f]
Commons: a
Commons: c
Hesam
  • 52,260
  • 74
  • 224
  • 365