1

I have two Arraylists, A and B.

ArrayList A consists of classes that consist of a set of data, including an identifier called categoryID. Multiple items in A can have the same categoryID. CategoryID's can look like this for each item in A: [1, 1, 2, 2, 3, 4, 7].

ArrayList B consists of different classes that contain a different set of data, including categoryID. categoryID is unique for each item in this list. Example: [1, 2, 3, 4, 5, 6, 7].

Both lists are sorted by categoryID, which hopefully makes this easier.

What I am trying to do is come up with a new list, C, which consists of items from listB that have at least one intersection with listA. So list C should contain the items [1, 2, 3, 4, 7] from the given input above.

So far, my strategy is to iterate over both lists. I don't believe this is the most efficient way to do this, so I'm asking what other alternatives I can look at are.

My method:

ArrayList<classB> results = new ArrayList<classB>();
for (classA itemA : listA){
  int categoryID = item.categoryID;
  for (classB itemB : listB){
    if (itemB.categoryID == categoryID){
      if (!results.contains(itemB)){
        results.add(itemB);
      }
      break;
    }
  }
}

I'm first traversing list A, grabbing the categoryID, then traversing listB to find the matching categoryID. When I find it, I check if the result list contains this item from listB. If it does not, then I add it to results and break out of the inner for loop and keep going through listA. If the result list already contains itemB, then I will simply break out of the inner for loop and keep going through listA. This method is O(n^2), which is not very nice for large data sets. Are there any ideas to improve?

h_k
  • 1,674
  • 1
  • 24
  • 46

6 Answers6

3

Add all the categoryIDs from ListA to a Set, let's call it setACategories. Afterwards, loop through ListB, if setACategories contains the categoryID of an element from ListB then add that element of ListB to results.

results should also be a Set, because it looks like you only want one match from listB to go into results and not multiple matches (allows you to avoid the call to (!results.contains(itemB)).

NESPowerGlove
  • 5,496
  • 17
  • 28
  • No need to put the second array into a set. – CPerkins Jan 28 '15 at 15:48
  • How efficient would this be? I'd need to iterate over listA to add the elements to a set. Then I'd need to iterate over listB to check for matching categoryIDs. So O(n)? – h_k Jan 28 '15 at 15:50
  • @h_k Well, I suppose now that depends do categoryIDs map 1 to 1 to classB or classA, and if they are of the same parent class (I assumed they were, if not, my answer my not be useful). – NESPowerGlove Jan 28 '15 at 15:54
  • @NESPowerGlove classB's categoryID's have a 1-to-many relationship with classA, if that clarifies things. – h_k Jan 28 '15 at 15:57
  • @h_k Okay let me edit my answer to fit what I think you need. – NESPowerGlove Jan 28 '15 at 15:57
  • @h_k Okay I made the edit, I think this is what you want. O(n) to make the set of categories, another O(n) to produce the results set. – NESPowerGlove Jan 28 '15 at 16:02
1

Add the categoryID values from listA into a Set, and then iterate over listB, picking those elements whose categoryId is in your set.

CPerkins
  • 8,968
  • 3
  • 34
  • 47
1

The best way right now would be to use a java stream:

List<foo> list1 = new ArrayList<>(Arrays.asList(new foo(), new foo()));
List<foo> list2 = new ArrayList<>(Arrays.asList(new foo(), new foo()));
list1.stream().filter(f -> list2.contains(f)).collect(Collectors.toList());

However, I myself use the apache commons library for this sort of stuff:

https://commons.apache.org/proper/commons-collections/javadocs/api-3.2.1/org/apache/commons/collections/CollectionUtils.html

Games Brainiac
  • 80,178
  • 33
  • 141
  • 199
1

Have you tried:

public void test() {
    Collection c1 = new ArrayList();
    Collection c2 = new ArrayList();

    c1.add("Text 1");
    c1.add("Text 2");
    c1.add("Text 3");
    c1.add("Text 4");
    c1.add("Text 5");

    c2.add("Text 3");
    c2.add("Text 4");
    c2.add("Text 5");
    c2.add("Text 6");
    c2.add("Text 7");

    c1.retainAll(c2);

    for (Iterator iterator = c1.iterator(); iterator.hasNext();) {
        Object next = iterator.next();
        System.out.println(next);  //Output: Text 3, Text 4, Text 5
    }
}
Andie2302
  • 4,825
  • 4
  • 24
  • 43
  • But not with the null initializations . You should remove those, they only mislead. Otherwise +1 for the best answer yet. – user207421 Jan 28 '15 at 18:17
0

Try using Sets.intersection(Set<E> set1,Set<?> set2) from Google Guava.

Off course you can transform arrays to sets with Sets.newHashSet(Iterable<? extends E> elements)

MAGx2
  • 3,149
  • 7
  • 33
  • 63
0

See the following code. I have implemented a intersection that uses the fact that they are sorted to improve upon the top answer's method.

It kind of works like the merge step in merge sort, except it ensures intersections. It can probably be improved further, I wrote it up in 30 minutes.

With the current data, it runs about 17x faster than the top answer. It also saves O(n) memory, as it only requires one set

Also see: The intersection of two sorted arrays

import java.util.*;

public class test {
    public static void main (String[] args) {
        List<Integer> a1 = new ArrayList<Integer>();
        List<Integer> a2 = new ArrayList<Integer>();
        Random r = new Random();

        for(int i = 0; i < 1000000; i++) {
            a1.add(r.nextInt(1000000));
            a2.add(r.nextInt(1000000));
        }

        Collections.sort(a1);
        Collections.sort(a2);

        System.out.println("Starting");

        long t1 = System.currentTimeMillis();
        Set<Integer> set1 = func1(a1, a2);
        long t2 = System.currentTimeMillis();

        System.out.println("Func1 done in: " + (t2-t1) + " milliseconds.");

        long t3 = System.currentTimeMillis();
        Set<Integer> set2 = func2(a1, a2);
        long t4 = System.currentTimeMillis();

        System.out.println("Func2 done in: " + (t4-t3) + " milliseconds.");

        if(set1.size() != set2.size()) {
            System.out.println("ERROR - sizes not equal");
            System.exit(1);
        }

        for(Integer t : set1) {
            if (!set2.contains(t)) {
                System.out.println("ERROR");
                System.exit(1);
            }
        }
    }

    public static Set<Integer> func1(List<Integer> a1, List<Integer> a2) {
        Set<Integer> intersection = new HashSet<Integer>();

        int index = 0;
        for(Integer a : a1) {

            while( index < a2.size() && a2.get(index) < a) {
                index++;
            } 

            if(index == a2.size()) { 
                break;
            }
            if (a2.get(index).equals(a)) {
                intersection.add(a);
            } else {
                continue;
            }

        }

        return intersection;
    }

    public static Set<Integer> func2(List<Integer> a1, List<Integer> a2) {
        Set<Integer> intersection = new HashSet<Integer>();
        Set<Integer> tempSet = new HashSet<Integer>();
        for(Integer a : a1) {
            tempSet.add(a);
        }

        for(Integer b : a2) {
            if(tempSet.contains(b)) {
                intersection.add(b);
            }
        }

        return intersection;
    }
}
Community
  • 1
  • 1
Andrew
  • 1,355
  • 2
  • 13
  • 28