164

Are there any methods to do so? I was looking but couldn't find any.

Another question: I need these methods so I can filter files. Some are AND filters and some are OR filters (like in set theory), so I need to filter according to all files and the unite/intersects ArrayLists that holds those files.

Should I use a different data structure to hold the files? Is there anything else that would offer a better runtime?

informatik01
  • 16,038
  • 10
  • 74
  • 104
yotamoo
  • 5,334
  • 13
  • 49
  • 61
  • 1
    If you didn't want to creayte a new list, Vector.retainAll(Vector) trims your orignal vector to only the intersection with second vector. – user2808054 Jan 16 '15 at 17:50
  • @user2808054 why `Vector`? That class has been discouraged since Java 1.2. – dimo414 Oct 26 '16 at 06:25
  • @dimo414 an interface which I'm using (I have no option) returns things as vectors. I didn't know it had been discouraged ! Thanks for the info .. Discouraged by who ? I haven't seen any note about it being deprecated so this is a surprise – user2808054 Nov 01 '16 at 18:02
  • 1
    From the Javadocs: "[*As of the Java 2 platform v1.2 ... it is recommended to use ArrayList in place of Vector.*](https://docs.oracle.com/javase/8/docs/api/java/util/Vector.html)". The only time you *might* need `Vector` is for cross-thread interactions, but there are safer data structures for those use cases too. See also [this question](http://stackoverflow.com/q/1386275/113632). Any library still using `Vector` in 2016 is very suspect in my opinion. – dimo414 Nov 01 '16 at 20:27
  • @dimo414 it's an IBM library, haha! (Lotus Domino data api). Thanks for the info, very helpful – user2808054 Nov 08 '16 at 16:09

24 Answers24

149

Here's a plain implementation without using any third-party library. Main advantage over retainAll, removeAll and addAll is that these methods don't modify the original lists input to the methods.

public class Test {

    public static void main(String... args) throws Exception {

        List<String> list1 = new ArrayList<String>(Arrays.asList("A", "B", "C"));
        List<String> list2 = new ArrayList<String>(Arrays.asList("B", "C", "D", "E", "F"));

        System.out.println(new Test().intersection(list1, list2));
        System.out.println(new Test().union(list1, list2));
    }

    public <T> List<T> union(List<T> list1, List<T> list2) {
        Set<T> set = new HashSet<T>();

        set.addAll(list1);
        set.addAll(list2);

        return new ArrayList<T>(set);
    }

    public <T> List<T> intersection(List<T> list1, List<T> list2) {
        List<T> list = new ArrayList<T>();

        for (T t : list1) {
            if(list2.contains(t)) {
                list.add(t);
            }
        }

        return list;
    }
}
adarshr
  • 61,315
  • 23
  • 138
  • 167
  • 18
    you can create new list with list1 elements and then call retainAll, addAll methods – lukastymo Mar 12 '11 at 14:44
  • why you using strictfp in this solution? – lukastymo Mar 12 '11 at 14:47
  • 14
    Should use a `HashSet` for `intersection` so that the average case performance is O(n) instead of O(n^2). – Zong Mar 16 '15 at 23:30
  • 2
    This post could use an update to demonstrate the benefits of the Java 8 Stream API. – SME_Dev Sep 02 '15 at 15:57
  • I get error When I try assign this value -> Example : ArrayList total total = (ArrayList) intersection(list2, list1) --->cannot cast java.util.arraylist to java.util.arraylist –  Feb 11 '16 at 09:34
  • Better use the Constructor [`HashSet(int initialCapacity)`](https://docs.oracle.com/javase/8/docs/api/java/util/HashSet.html#HashSet-int-). – steffen May 09 '17 at 11:23
  • As @Zong mentioned Set will perform much better especially in larger collections. I added my answer as not only using Set but also with more generic Collection signature. http://stackoverflow.com/a/59476700/2290369 – Ismail Yavuz Dec 25 '19 at 11:02
  • for collection of Objects, it's nice to override the equals() and hashCode() methods for custom objects – Ikhiloya Imokhai May 03 '20 at 00:52
135

Collection (so ArrayList also) have:

col.retainAll(otherCol) // for intersection
col.addAll(otherCol) // for union

Use a List implementation if you accept repetitions, a Set implementation if you don't:

Collection<String> col1 = new ArrayList<String>(); // {a, b, c}
// Collection<String> col1 = new TreeSet<String>();
col1.add("a");
col1.add("b");
col1.add("c");

Collection<String> col2 = new ArrayList<String>(); // {b, c, d, e}
// Collection<String> col2 = new TreeSet<String>();
col2.add("b");
col2.add("c");
col2.add("d");
col2.add("e");

col1.addAll(col2);
System.out.println(col1); 
//output for ArrayList: [a, b, c, b, c, d, e]
//output for TreeSet: [a, b, c, d, e]
lukastymo
  • 26,145
  • 14
  • 53
  • 66
  • 3
    There's been a suggested edit that this union *"is incorrect since it will contain common elements twice"*. The edit recommended to use a `HashSet` instead. – Kos Nov 30 '12 at 17:26
  • 5
    Actually it was edited, see: "Use a List implementation if you accept repetitions, a Set implementation if you don't:" – lukastymo Sep 12 '13 at 10:53
  • 7
    No, retainAll is not intersection for list. In above, all elements in col that are not in otherCol are removed. Let's say otherCol is {a,b,b,c} and col is {b,b,b,c,d}. Then col ends up with {b,b,b,c} which is not the strictly the intersection of the two. I would expect that to be {b,b,c}. A different operation is being performed. – demongolem Mar 18 '16 at 18:02
  • 2
    I also don't see how `addAll()` is union for lists; it's just concatenating the second list onto the end of the first. A union operation would avoid adding an element if the first list already contains it. – dimo414 Oct 26 '16 at 06:23
89

This post is fairly old, but nevertheless it was the first one popping up on google when looking for that topic.

I want to give an update using Java 8 streams doing (basically) the same thing in a single line:

List<T> intersect = list1.stream()
    .filter(list2::contains)
    .collect(Collectors.toList());

List<T> union = Stream.concat(list1.stream(), list2.stream())
    .distinct()
    .collect(Collectors.toList());

If anyone has a better/faster solution let me know, but this solution is a nice one liner that can be easily included in a method without adding a unnecessary helper class/method and still keep the readability.

Madbreaks
  • 19,094
  • 7
  • 58
  • 72
steilerDev
  • 961
  • 7
  • 11
  • 26
    Ooof, it might be a nice one-liner but it takes O(n^2) time. Convert one of the lists to a `Set` then use the set's `contains` method. Not everything in life has to be done with streams. – dimo414 Oct 26 '16 at 06:22
36
list1.retainAll(list2) - is intersection

union will be removeAll and then addAll.

Find more in the documentation of collection(ArrayList is a collection) http://download.oracle.com/javase/1.5.0/docs/api/java/util/Collection.html

gstackoverflow
  • 36,709
  • 117
  • 359
  • 710
The GiG
  • 2,571
  • 2
  • 28
  • 29
  • 2
    Both `retainAll()` and `removeAll()` are O(n^2) operations on lists. We can do better. – dimo414 Oct 26 '16 at 06:24
  • 2
    I up voted but now I have a question. `retainAll` of {1, 2, 2, 3, 4, 5} over {1, 2, 3} results in {1, 2, 2, 3}. Shouldn't it be {1, 2, 3} to be the intersection? – ghchoi Nov 30 '17 at 06:40
  • @ghchoi the semantic behind list and set is now the issue. Using list [1, 2, 2, 3, 4, 5] we accept duplicate but for the set {1, 2, 3} duplicate are not allowed. Also two notation are different in general but not fixed, for list [...duplicate is a feature...] and for set {...not duplicate is allowed...} – Jaja Jun 09 '21 at 13:49
22

Unions and intersections defined only for sets, not lists. As you mentioned.

Check guava library for filters. Also guava provides real intersections and unions

 static <E> Sets.SetView<E >union(Set<? extends E> set1, Set<? extends E> set2)
 static <E> Sets.SetView<E> intersection(Set<E> set1, Set<?> set2)
David
  • 1,920
  • 25
  • 31
Stan Kurilin
  • 15,614
  • 21
  • 81
  • 132
15

You can use CollectionUtils from apache commons.

tomrozb
  • 25,773
  • 31
  • 101
  • 122
bluefoot
  • 10,220
  • 11
  • 43
  • 56
8

The solution marked is not efficient. It has a O(n^2) time complexity. What we can do is to sort both lists, and the execute an intersection algorithm as the one below.

private  static ArrayList<Integer> interesect(ArrayList<Integer> f, ArrayList<Integer> s) { 
    ArrayList<Integer> res = new ArrayList<Integer>();

    int i = 0, j = 0; 
    while (i != f.size() && j != s.size()) { 

        if (f.get(i) < s.get(j)) {
            i ++;
        } else if (f.get(i) > s.get(j)) { 
            j ++;
        } else { 
            res.add(f.get(i)); 
            i ++;  j ++;
        }
    }


    return res; 
}

This one has a complexity of O(n log n + n) which is in O(n log n). The union is done in a similar manner. Just make sure you make the suitable modifications on the if-elseif-else statements.

You can also use iterators if you want (I know they are more efficient in C++, I dont know if this is true in Java as well).

AJed
  • 578
  • 6
  • 11
  • 1
    Not generic enough, T may not be Comparable and in some cases comparing is expensive ... – Boris Churzin Jun 19 '16 at 16:28
  • Not generic, I totally agree. Comparison is expensive? how would you solve that? – AJed Jun 19 '16 at 21:12
  • Sadly - it would be cheaper to do it in O(n^2) :) For Numbers this solution is good... – Boris Churzin Jun 21 '16 at 12:37
  • Sadly - you did not answer my question. Let me rephrase it, how is O(n^2) better given a comparison function of cost c(n)? – AJed Jun 22 '16 at 05:10
  • Why c(n)? Take ArrayList>, it will run at O(n*m)*c(n). – Boris Churzin Jun 22 '16 at 13:45
  • Anyway, the question itself is not correct, you need to define what intersection of lists is. If it's the same as intersection of sets then the better approach would be new List(new Set(list1).retain(new Set(list2))), which is O(n) + O(m) + O(n+m) ~= O(n+m), or even better using SetView in guava... – Boris Churzin Jun 22 '16 at 13:48
  • Guava? retain? - Sadly, you don't make sense. – AJed Jun 23 '16 at 08:30
  • 1
    Converting one input to a set and calling `contains()` in a loop (as Devenv is suggestion) would take O(n + m) time. Sorting is needlessly complicated and takes O(n log n + m log n + n) time. Granted that reduces to O(n log n) time, but that's still worse than linear time, and much more complex. – dimo414 Oct 26 '16 at 06:36
5

One-liners since JAVA 8

Union

if there are no duplicates:

  return concat(a.stream(), b.stream()).collect(toList());

union and distinct:

  return concat(a.stream(), b.stream()).distinct().collect(toList());

union and distinct if Collection/Set return type:

  return concat(a.stream(), b.stream()).collect(toSet());

Intersect

if no duplicates:

  return a.stream().filter(b::contains).collect(toList());

PERFORMANCE: If collection b is huge and not O(1), then pre-optimize the filter performance by adding 1 line before return: Copy to HasSet (import java.util.Set;):

... b = Set.copyOf(b);

intersect and distinct:

  return a.stream().distinct().filter(b::contains).collect(toList());

- imports

import static java.util.stream.Stream.concat;
import static java.util.stream.Collectors.toList;
import static java.util.stream.Collectors.toSet;

epox
  • 9,236
  • 1
  • 55
  • 38
4

Here is a way how you can do an intersection with streams (remember that you have to use java 8 for streams):

List<foo> fooList1 = new ArrayList<>(Arrays.asList(new foo(), new foo()));
List<foo> fooList2 = new ArrayList<>(Arrays.asList(new foo(), new foo()));
fooList1.stream().filter(f -> fooList2.contains(f)).collect(Collectors.toList());

An example for lists with different types. If you have a realtion between foo and bar and you can get a bar-object from foo than you can modify your stream:

List<foo> fooList = new ArrayList<>(Arrays.asList(new foo(), new foo()));
List<bar> barList = new ArrayList<>(Arrays.asList(new bar(), new bar()));

fooList.stream().filter(f -> barList.contains(f.getBar()).collect(Collectors.toList());
Deutro
  • 3,113
  • 4
  • 18
  • 26
4

You can use commons-collections4 CollectionUtils

Collection<Integer> collection1 = Arrays.asList(1, 2, 4, 5, 7, 8);
Collection<Integer> collection2 = Arrays.asList(2, 3, 4, 6, 8);

Collection<Integer> intersection = CollectionUtils.intersection(collection1, collection2);
System.out.println(intersection); // [2, 4, 8]

Collection<Integer> union = CollectionUtils.union(collection1, collection2);
System.out.println(union); // [1, 2, 3, 4, 5, 6, 7, 8]

Collection<Integer> subtract = CollectionUtils.subtract(collection1, collection2);
System.out.println(subtract); // [1, 5, 7]
xxg
  • 2,048
  • 1
  • 11
  • 14
4

I think you should use a Set to hold the files if you want to do intersection and union on them. Then you can use Guava's Sets class to do union, intersection and filtering by a Predicate as well. The difference between these methods and the other suggestions is that all of these methods create lazy views of the union, intersection, etc. of the two sets. Apache Commons creates a new collection and copies data to it. retainAll changes one of your collections by removing elements from it.

ColinD
  • 108,630
  • 30
  • 201
  • 202
3
  • retainAll will modify your list
  • Guava doesn't have APIs for List (only for set)

I found ListUtils very useful for this use case.

Use ListUtils from org.apache.commons.collections if you do not want to modify existing list.

ListUtils.intersection(list1, list2)

Bala
  • 4,427
  • 6
  • 26
  • 29
2

In Java 8, I use simple helper methods like this:

public static <T> Collection<T> getIntersection(Collection<T> coll1, Collection<T> coll2){
    return Stream.concat(coll1.stream(), coll2.stream())
            .filter(coll1::contains)
            .filter(coll2::contains)
            .collect(Collectors.toSet());
}

public static <T> Collection<T> getMinus(Collection<T> coll1, Collection<T> coll2){
    return coll1.stream().filter(not(coll2::contains)).collect(Collectors.toSet());
}

public static <T> Predicate<T> not(Predicate<T> t) {
    return t.negate();
}
Pascalius
  • 14,024
  • 4
  • 40
  • 38
1

If the objects in the list are hashable (i.e. have a decent hashCode and equals function), the fastest approach between tables approx. size > 20 is to construct a HashSet for the larger of the two lists.

public static <T> ArrayList<T> intersection(Collection<T> a, Collection<T> b) {
    if (b.size() > a.size()) {
        return intersection(b, a);
    } else {
        if (b.size() > 20 && !(a instanceof HashSet)) {
            a = new HashSet(a);
        }
        ArrayList<T> result = new ArrayList();
        for (T objb : b) {
            if (a.contains(objb)) {
                result.add(objb);
            }
        }
        return result;
    }
}
Jeroen Vuurens
  • 1,171
  • 1
  • 9
  • 10
1

I was also working on the similar situation and reached here searching for help. Ended up finding my own solution for Arrays. ArrayList AbsentDates = new ArrayList(); // Will Store Array1-Array2

Note : Posting this if it can help someone reaching this page for help.

ArrayList<String> AbsentDates = new ArrayList<String>();//This Array will store difference
      public void AbsentDays() {
            findDates("April", "2017");//Array one with dates in Month April 2017
            findPresentDays();//Array two carrying some dates which are subset of Dates in Month April 2017

            for (int i = 0; i < Dates.size(); i++) {

                for (int j = 0; j < PresentDates.size(); j++) {

                    if (Dates.get(i).equals(PresentDates.get(j))) {

                        Dates.remove(i);
                    }               

                }              
                AbsentDates = Dates;   
            }
            System.out.println(AbsentDates );
        }
1

Intersection of two list of different object based on common key - Java 8

 private List<User> intersection(List<User> users, List<OtherUser> list) {

        return list.stream()
                .flatMap(OtherUser -> users.stream()
                        .filter(user -> user.getId()
                                .equalsIgnoreCase(OtherUser.getId())))
                .collect(Collectors.toList());
    }
Niraj Sonawane
  • 10,225
  • 10
  • 75
  • 104
1
public static <T> Set<T> intersectCollections(Collection<T> col1, Collection<T> col2) {
    Set<T> set1, set2;
    if (col1 instanceof Set) {
        set1 = (Set) col1;
    } else {
        set1 = new HashSet<>(col1);
    }

    if (col2 instanceof Set) {
        set2 = (Set) col2;
    } else {
        set2 = new HashSet<>(col2);
    }

    Set<T> intersection = new HashSet<>(Math.min(set1.size(), set2.size()));

    for (T t : set1) {
        if (set2.contains(t)) {
            intersection.add(t);
        }
    }

    return intersection;
}

JDK8+ (Probably Best Performance)

public static <T> Set<T> intersectCollections(Collection<T> col1, Collection<T> col2) {
    boolean isCol1Larger = col1.size() > col2.size();
    Set<T> largerSet;
    Collection<T> smallerCol;

    if (isCol1Larger) {
        if (col1 instanceof Set) {
            largerSet = (Set<T>) col1;
        } else {
            largerSet = new HashSet<>(col1);
        }
        smallerCol = col2;
    } else {
        if (col2 instanceof Set) {
            largerSet = (Set<T>) col2;
        } else {
            largerSet = new HashSet<>(col2);
        }
        smallerCol = col1;
    }

    return smallerCol.stream()
            .filter(largerSet::contains)
            .collect(Collectors.toSet());
}

If you don't care about performance and prefer smaller code just use:

col1.stream().filter(col2::contains).collect(Collectors.toList());
Ismail Yavuz
  • 6,727
  • 6
  • 29
  • 50
0

First, I am copying all values of arrays into a single array then I am removing duplicates values into the array. Line 12, explaining if same number occur more than time then put some extra garbage value into "j" position. At the end, traverse from start-end and check if same garbage value occur then discard.

public class Union {
public static void main(String[] args){

    int arr1[]={1,3,3,2,4,2,3,3,5,2,1,99};
    int arr2[]={1,3,2,1,3,2,4,6,3,4};
    int arr3[]=new int[arr1.length+arr2.length];

    for(int i=0;i<arr1.length;i++)
        arr3[i]=arr1[i];

    for(int i=0;i<arr2.length;i++)
        arr3[arr1.length+i]=arr2[i];
    System.out.println(Arrays.toString(arr3));

    for(int i=0;i<arr3.length;i++)
    {
        for(int j=i+1;j<arr3.length;j++)
        {
            if(arr3[i]==arr3[j])
                arr3[j]=99999999;          //line  12
        }
    }
    for(int i=0;i<arr3.length;i++)
    {
        if(arr3[i]!=99999999)
            System.out.print(arr3[i]+" ");
    }
}   
}
Ashutosh
  • 1
  • 3
  • 1
    Welcome to Stack Overflow! Please note that the question is about ArrayList. Also, I'm afraid this particular implementation leaves things to be desired. The value 99999999, which is used as a sentinel, might occur in the input. It would be better to use a dynamic structure, like `ArrayList`, to store the result of the union. – S.L. Barth is on codidact.com Jun 07 '17 at 14:32
  • 1
    Please explain the code you've presented instead of just a code answer. – tmarois Jun 07 '17 at 14:36
  • I am just giving a clue you have to put any garbage value – Ashutosh Jun 14 '17 at 15:50
  • I'm glad to see you added an explanation. Unfortunately, the answer itself is still bad. There is no reason to use arrays. You should use a dynamic structure like ArrayList. If (for some reason) you must use arrays, you should consider using an array of `Integer` rather than `int`. Then you can use `null` instead of your "garbage value". "Garbage values" or "sentinel values" are usually a bad idea, because these values may still occur in the input. – S.L. Barth is on codidact.com Jun 15 '17 at 10:31
0

After testing, here is my best intersection approach.

Faster speed compared to pure HashSet Approach. HashSet and HashMap below has similar performance for arrays with more than 1 million records.

As for Java 8 Stream approach, speed is quite slow for array size larger then 10k.

Hope this can help.

public static List<String> hashMapIntersection(List<String> target, List<String> support) {
    List<String> r = new ArrayList<String>();
    Map<String, Integer> map = new HashMap<String, Integer>();
    for (String s : support) {
        map.put(s, 0);
    }
    for (String s : target) {
        if (map.containsKey(s)) {
            r.add(s);
        }
    }
    return r;
}
public static List<String> hashSetIntersection(List<String> a, List<String> b) {
    Long start = System.currentTimeMillis();

    List<String> r = new ArrayList<String>();
    Set<String> set = new HashSet<String>(b);

    for (String s : a) {
        if (set.contains(s)) {
            r.add(s);
        }
    }
    print("intersection:" + r.size() + "-" + String.valueOf(System.currentTimeMillis() - start));
    return r;
}

public static void union(List<String> a, List<String> b) {
    Long start = System.currentTimeMillis();
    Set<String> r= new HashSet<String>(a);
    r.addAll(b);
    print("union:" + r.size() + "-" + String.valueOf(System.currentTimeMillis() - start));
}
Dilabeing
  • 31
  • 6
0

retainAll() method use for finding common element..i.e;intersection list1.retainAll(list2)

0

You can use the methods:

CollectionUtils.containsAny and CollectionUtils.containsAll

from Apache Commons.

Janac Meena
  • 3,203
  • 35
  • 32
-1

Final solution:

//all sorted items from both
public <T> List<T> getListReunion(List<T> list1, List<T> list2) {
    Set<T> set = new HashSet<T>();
    set.addAll(list1);
    set.addAll(list2);
    return new ArrayList<T>(set);
}

//common items from both
public <T> List<T> getListIntersection(List<T> list1, List<T> list2) {
    list1.retainAll(list2);
    return list1;
}

//common items from list1 not present in list2
public <T> List<T> getListDifference(List<T> list1, List<T> list2) {
    list1.removeAll(list2);
    return list1;
}
Choletski
  • 7,074
  • 6
  • 43
  • 64
  • First method doesn't mutate, the next two do, feels inconsistent. What is the point in wrapping `list1.retainAll(list2);` in `getListIntersection`? it was already one line and it may hide that it is mutating the first list because it returns a list – Aldo Canepa Mar 20 '21 at 00:18
-1

If the number matches than I am checking it's occur first time or not with help of "indexOf()" if the number matches first time then print and save into in a string so, that when the next time same number matches then it's won't print because due to "indexOf()" condition will be false.

class Intersection
{
public static void main(String[] args)
 {
  String s="";
    int[] array1 = {1, 2, 5, 5, 8, 9, 7,2,3512451,4,4,5 ,10};
    int[] array2 = {1, 0, 6, 15, 6, 5,4, 1,7, 0,5,4,5,2,3,8,5,3512451};


       for (int i = 0; i < array1.length; i++)
       {
           for (int j = 0; j < array2.length; j++)
           {
               char c=(char)(array1[i]);
               if(array1[i] == (array2[j])&&s.indexOf(c)==-1)
               {    
                System.out.println("Common element is : "+(array1[i]));
                s+=c;
                }
           }
       }    
}

}

Ashutosh
  • 1
  • 3
  • 2
    Don't just post code as an answer, give some little explanation of what are you doing – Brandon Zamudio Jun 07 '17 at 14:12
  • it's my first program which i uploaded – Ashutosh Jun 07 '17 at 14:25
  • 2
    Although this code may help to solve the problem, it doesn't explain _why_ and/or _how_ it answers the question. Providing this additional context would significantly improve its long-term value. Please [edit] your answer to add explanation, including what limitations and assumptions apply. – Toby Speight Jun 07 '17 at 19:57
-1

If you had your data in Sets you could use Guava's Sets class.

dimo414
  • 47,227
  • 18
  • 148
  • 244
Neil
  • 1,754
  • 2
  • 17
  • 30