How do I get the intersection between two arrays as a new array?

Question

I faced this problem many times during various situations. It is generic to all programming languages although I am comfortable with C or Java.

Let us consider two arrays (or collections):

char[] A = {'a', 'b', 'c', 'd'};
char[] B = {'c', 'd', 'e', 'f'};

How do I get the common elements between the two arrays as a new array? In this case, the intersection of array A and B is char[] c = {'c', 'd'}.

I want to avoid the repeated iteration of one array inside the other array which will increase the execution time by (length of A times length of B) which is too much in the case of huge arrays.

Is there any way we could do a single pass in each array to get the common elements?

As stated above, sort the two arrays, and from there it is really easy. — SinisterMJ, Nov 07 '12 at 13:14
Let Us Consider Sorting Cannot Be Implemented In This Case (I am saying so because in most day to day case it will take longer time to sort non-primitive data types or classes which would suppress the purpose of single pass). — Ranjan Sarma, Nov 07 '12 at 13:14
Use HashSet to store all data from first array. Then for every element in the second array, check if the HashSet contains() the element. The complexity of sorting is O(n lg n) while the complexity of this method is O(n) — user1700184, Nov 07 '12 at 13:15
is sorting allowed at all, or do u have to work the array is it is? because this feels like a string searching algorithm question — Moataz Elmasry, Nov 07 '12 at 13:22
Well, I guess sorting is doing several passes over the array, so to me this option should be excluded. — dounyy, Nov 07 '12 at 13:36
@RanjanSarma Sorting, then merging is `O(n lg n)`. Using unsorted arrays is `O(n^2)`. Sorting will be a lot more efficient. — James Kanze, Nov 07 '12 at 14:38
Is the type of the arrays actually `char`? Some of the arguments in comments below could be resolved by putting some restrictions on the types. For example, all the stuff about expected `O(N)` with hash tables goes out the window if you don't have a reasonable hash function for the type (that's why Java encourages you to write one). Conversely with `char` and sufficiently large input, the fastest could be to create an array with one element for each character value (generally 256 or 65536), and use it to record which characters appear in each input. — Steve Jessop, Nov 07 '12 at 15:06
possible duplicate of [Efficient set intersection algorithm](http://stackoverflow.com/questions/497338/efficient-set-intersection-algorithm) — interjay, Nov 07 '12 at 16:00
Do you care about duplicates? IE if A: [1,1,2] and B: [1,2], should the intersection be [1]? — Nicholas, Nov 07 '12 at 16:18
Without a clear definition of what "best" is, I propose this solution for sorting arrays: Print each array item on a slip of paper, and place each slip of paper in a clearly marked envelope. Mail the envelopes to your grandmother with instructions to return only the items that were included in both envelopes, along with some cookies. The solution isn't very fast as it's `O(USPS^grandmother)`, but it's best because cookies are awesome. — zzzzBov, Nov 07 '12 at 17:21
I guess don't understand why you hash AND sort? Your hash isn't guaranteed to be O(n) and the sorting may not be O(n log(n)) where n is the number of elements. If you are not allowed to create any further data types and only compare, you will need to find the most efficient string comparing method and possibly employ tricks as a few have suggested. If you aren't restricting on the creation of more data your best bet is a hash. A binary tree of some kind might another one to do it. — user1701047, Nov 07 '12 at 19:39
@user1700184 what are you talking about? How is querying a HashSet O(1)? — rr-, Sep 08 '15 at 12:52

score 109 · Answer 1 · edited Nov 13 '12 at 20:49

109

foreach element e in array A
    insert e into hash table H

foreach element e in array B
    if H contains e 
        print e

This algorithm is O(N) in time and O(N) in space.

To avoid the extra space, you can use the sorting based approach.

edited Nov 13 '12 at 20:49

Inverse

4,408
2
26
35

answered Nov 07 '12 at 13:15

codaddict

445,704
82
492
529

5

@Yola There isn’t a faster-than-O(n) solution; there can’t be. – Konrad Rudolph Nov 07 '12 at 13:21
This solution not always O(N) – Yola Nov 07 '12 at 13:22
5

Note that there is still an issue with duplicates (how to handle it was not specified in the question), but often you want to print each element `min{#occurances(A),#occurances(B)}` or just once, while this solution prints them `#occurances(B)` times – amit Nov 07 '12 at 13:23
2

@Yola No. You can ensure that it is O(N) for limited domains. For non-limited domains you can still prove that it *almost certainly* is O(N), and you can even quantify that certainty. – Konrad Rudolph Nov 07 '12 at 13:23
@Yola: Assuming descent load balance for the hash table, the probability for getting the worst case is slim to none. – amit Nov 07 '12 at 13:23
@Konrad Rudolph, it depends on hash table - load factors, collisions resolving etc. Iwould not be so sure with this structure – Yola Nov 07 '12 at 13:25
6

@Yola We can wax poetry all day long. But you said “not the fastest way” and this is wrong in practice (in the expected case it *is* the fastest solution) *and* in theory (if push comes to shove you can implement a schema that either makes non-linear performance *exceedingly* unlikely or even guarantees linear performance for a limited domain). Neither of these theoretically-perfect solutions is necessarily easy to ensure in typical use-cases, but then hashing is already the fastest solution for the typical use-case. – Konrad Rudolph Nov 07 '12 at 13:29
@Konrad Rudolph, ok, i think its fastest, +1 – Yola Nov 07 '12 at 13:31
@Konrad Rudolph, but calculate hash may take much time isnt it? And search in hash table takes O(logN). – Yola Nov 07 '12 at 13:58
3

@Yola No, search in a hash table takes O(1). How long calculating the hash takes depends on the key type but it’s usually not much slower than comparing two keys (which is required in a tree and in sorting). In particular, for simple `char` keys it’s very fast. – Konrad Rudolph Nov 07 '12 at 14:00
2

@Konrad Searches take O(n) time. Imagine a case where all elements have the same hash value, so you need to store them in a list. – Jakub Zaverka Nov 07 '12 at 14:03
1

to fix the duplicate prints just remove `ele` from `H` when you print it (and break out when it gets empty) – ratchet freak Nov 07 '12 at 14:07
3

Everything takes O(1) time if you limit the size of the dataset. Hash lookups are O(1) on dataset sizes up to some limit, but so is factoring into primes. Hashes are fast due to a small constant not big O notation. – Yakk - Adam Nevraumont Nov 07 '12 at 14:12
@Jakub Read the discussion we had up to this point. You can prevent such collisions. In the *expected* case (with very high probability), the *amortised* runtime is O(1) for a proper load factor. That is what matters. The worst-case time of O(N) for a single access is highly misleading for a general-purpose analysis. – Konrad Rudolph Nov 07 '12 at 14:14
This does not handle duplicates. If you look at my solution, it's near the same as your, just using a Map instead of Set. – Nicholas Nov 07 '12 at 17:01
@KonradRudolph: Yakk is exactly right, O(1) hash lookups are a lie. To get a O(1) expected lookup time, calculating the hash function must also be O(1) amortised, meaning it must examine at most a constant number c of bits in each key on average. When more than 2^c distinct keys are added to a hashtable, no hash function can distinguish them all. If there are m2^c keys, a best-possible O(1) hash function will map m keys into each of the 2^c buckets, implying linear lookup times with a tiny constant, around 1/2^c. – j_random_hacker Nov 07 '12 at 17:22
1

@j_random_hacker This is very, **very** rarely relevant (but, I agree, it sometimes is). But if you insist on being pedantic here then I will acquiesce and must insist that lookup *is* O(1). Asymptotic performance analysis is performed on a theoretical machine which has elementary operations whose runtime is defined as O(1), including elementary integer operations. By implication, hash table lookup is O(1), regardless of whether you count number of hashing operations, number of comparisons or number of other operations. (continued …) – Konrad Rudolph Nov 07 '12 at 21:00
1

@j_random_hacker (continued) You *can* analyse the runtime as O(k) where k is the maximum length of the keys. This isn’t necessarily O(N) though (k is independent from N in the general case). Furthermore, what we were comparing here was (implicitly!) the performance of hash tables relative to other methods of implementing OP’s algorithm. And if you insist that hash table lookup is O(N) then you run into the same behaviour in comparison-based methods, so your complication of the analysis gains you nothing. O(1) lookup performance is not a lie, it’s a model. Can we now *please* bury this? – Konrad Rudolph Nov 07 '12 at 21:01
1

We are using a Hash which hides an iteration that equals an iteration to array. Hash has to iterate through itself before saying that "I already have this object" or "I donot have this object brother". – Ranjan Sarma Nov 08 '12 at 12:45

Jakub Zaverka · Answer 2 · 2012-11-07T22:51:17.027

33

The lower bound on efficiency is O(n) - you need to at least read all the elements. Then there are several apporaches:

Dumb simplest approach

Search for every element from array one in array two. Time complexity O(n^2).

Sorting approach

You need to sort only array one, then search for elements from array two using binary search. Time complexity: sorting O(nlogn), searching O(n * logn) = O(nlogn), total O(nlogn).

Hash approach

Create a hash table from array one elements. Search for elements form second table in the hash table. The time complexity depends on the hash function. You can achieve O(1) for searches in the optimal case (all elements will have different hash value), but O(n) in the worst case (all elements will have the same hash value). Total time complexity: O(n^x), where x is a factor of hash function efficiency (between 1 and 2).

Some hash functions are guaranteed to build a table with no collisions. But the building no longer takes strictly O(1) time for every element. It will be O(1) in most cases, but if the table is full or a collision is encountered, then the table needs to be rehashed - taking O(n) time. This happens not so often, much less frequently than clean adds. So the AMORTISED time complexity is O(1). We don't care about some of the adds taking O(n) time, as long as the majority of adds takes O(1) time.

But even so, in an extreme case, the table must be rehashed every single insertion, so the strict time complexity would be O(n^2)

edited Nov 07 '12 at 22:51

answered Nov 07 '12 at 14:20

Jakub Zaverka

8,816
3
32
48

1

There is no need in rehashing since the length of the array can be precalculated, and you can create the hash table of size `n * LF^-1`, where `LF` is your pre-determined load factor. The complexity is `O(1)` per op for any `LF < 1`, and not `O(n^LF)`, because the expected number of reads you will need is `E= 1 + 1*LF + 1*LF^2 + ... + 1*LF^n < CONSTANT` (sum of geometric series), so each op is `O(1)`. That said, the worst case of hash tables is still `O(n)` per op, but the average case will be `O(1)` – amit Nov 07 '12 at 14:40
Also, sorting approach can be done in `O(NlogM)`, where N is the length of the longer array and M is the length of the shorter one. – amit Nov 07 '12 at 14:46
1

@amit I agree, but rehash still be needed if there is a collision in the table. – Jakub Zaverka Nov 07 '12 at 14:51
If you assume rehashing solution for collisions (and not chaining or linear open addressing) - then yes. But it will be `O(n)` average case and not `O(n^LF)`. – amit Nov 07 '12 at 14:53
@amit That's what I am saying. The O(n^x) is for the case where there is no rehashing and collisions are solved by chaining or by another method. – Jakub Zaverka Nov 07 '12 at 14:58
Providing the worst case of lookup without an analysis of the *likelihood* is misleading and useless (unless you are explicitly searching for vulnerabilities to malicious input). In fact, the likelihood of this happening by chance (with a well-designed hash table!) is so exceedingly small that it can be safely disregarded. Your answer makes it sound as if expected value of `x` is close to 1.5 (and uniformly distributed between 1 and 2) when in fact it’s very close to 1. The same is true for double hashing. I’ve forgotten how to analyse this but there is a formalism to prove this. – Konrad Rudolph Nov 07 '12 at 15:45
2

@Konrad I don't deny that. I just operate on strict time complexity, the worst case scenarios. – Jakub Zaverka Nov 07 '12 at 15:49
Rehashing concept was ignored in solution mentioned above by codadict. – Shridutt Kothari Sep 12 '14 at 12:25

score 20 · Answer 3 · edited May 23 '18 at 09:22

20

There are a few methods in some languages that I'm aware of that do exactly what you want, have you considered looking at some of these implementations?

PHP - array_intersect()

$array1 = array("a" => "green", "red", "blue");
$array2 = array("b" => "green", "yellow", "red");
$result = array_intersect($array1, $array2);
print_r($result);

>> green
   red

Java - List.retainAll

Collection listOne = new ArrayList(Arrays.asList("milan","dingo", "elpha", "hafil", "meat", "iga", "neeta.peeta"));
Collection listTwo = new ArrayList(Arrays.asList("hafil", "iga", "binga", "mike", "dingo"));

listOne.retainAll( listTwo );
System.out.println( listOne );

>> dingo, hafil, iga

edited May 23 '18 at 09:22

ccpizza

28,968
18
162
169

answered Nov 07 '12 at 13:32

Mike

47,263
29
113
177

1

Python can do this as well with [sets](http://docs.python.org/2/library/stdtypes.html#set.intersection). I imagine many languages also have a set type that can handle this. – thegrinner Nov 07 '12 at 16:18
I can bet `retainAll` of an Arraylist (infact all of the List implementation in std java) does a O(n^2). – st0le Nov 08 '12 at 08:25

score 12 · Accepted Answer · answered Nov 07 '12 at 13:32

12

Since this looks to me like a string algorithm, I'll assume for a moment that its not possible to sort this sequence (hence string) then you can use Longest Common Sequence algorithm (LCS)

Assuming the input size is constant, then the problem has a complexity of O(nxm), (length of the two inputs)

answered Nov 07 '12 at 13:32

Moataz Elmasry

2,509
4
27
37

1

But why do a `O(n*m)` complicated solution when there are `O(n+m)` and `O(nlog(m))` ones? :| – amit Nov 07 '12 at 14:54
@amit the dynamic programming solution takes O(n*m), which one takes O(n+m)? for O(nlog(m)) solution I assume you are talking about sorting, right? which is something I chose to ignore – Moataz Elmasry Nov 07 '12 at 17:06
1

How LCS will give intersection of 2 strings? We might miss out on many characters. for e.g. s1=[A B D C E F] and s2=[C D E F G H] here, LCS will be [D E F] while intersection of 2strings is [C D E F]! Am I missing something here? – srbhkmr Nov 13 '12 at 20:49
srbh.kmr, no, you're not, that's the LCS. This is even a simpler example where the LCS doesn't work: char[] A = {'a', 'b'}; char[] B = {'b', 'a'}; LCS gives either {'a'} or {'b'}, depending on the implementation, because it looks for a sequence of characters that appears in the same order in both arrays. – Gabriel Nov 14 '12 at 00:14
1

srbh.kmr, @Gabriel, I disagree with your example. this is not a shortcoming of LCS, but of your example(s), the two inputs simply have to share the exact common sequence in order to find a match. usually its run from left ti right, but you can do it in the other direction. the algorithm searches for the common sequence and just the common characters, so "one" has at most one letter match with "eno". This is what the algorithm does and not a disadvantage of it – Moataz Elmasry Nov 14 '12 at 00:36
2

LCS doesn't give the right answer for this problem, the algorithm works on string where the order matters, In this case the intersection is always on sets, where order doesn't matter. The simple example of @Gabriel is right on the spot, we would need more assumptions on the input data for this to work. The hash answer is probably the best given the ordering constraint. – jace Nov 14 '12 at 08:10
well as I explained in my answer,the assumption is that the OP is trying to find a pattern/substring in a certain string, and so the order matters, else the answer of @codaddict is the correct one – Moataz Elmasry Nov 14 '12 at 11:13
4

Well, the OP asked for the intersection between the two arrays. Intersection is a concept from set theory, and sets (by definition) not dependent on the order of its elements. I'm not saying that LCS is bad for some reason. I just claim it does not give you the intersection. It's an algorithm that solves a problem, but not the (set) intersection problem. If the arrays aren't to be considered as sets, then ok, but then you are probably not asking for a true intersection. But I understand the point of your assumption: if it's an array, it's more to a string than to a set. – Gabriel Nov 14 '12 at 21:38

Mik378 · Answer 5 · 2012-11-07T14:18:45.390

5

    public static void main(String[] args) {
        char[] a = {'a', 'b', 'c', 'd'};
        char[] b = {'c', 'd', 'e', 'f'};
        System.out.println(intersect(a, b));
    }

    private static Set<Character> intersect(char[] a, char[] b) {
        Set<Character> aSet = new HashSet<Character>();
        Set<Character> intersection = new HashSet<Character>();
        for (char c : a) {
            aSet.add(c);
        }
        for (char c : b) {
            if (aSet.contains(c)) {
                intersection.add(c);
            }
        }
        return intersection;
    }

edited Nov 07 '12 at 14:18

answered Nov 07 '12 at 13:42

Mik378

21,881
15
82
180

Performance will be slightly less than optimal, since you don't actualy need to create the second set just to check if it contains elements from the second one. – vgru Nov 07 '12 at 13:48
I would also check the size, if b is huge and a is small, your code will run slower checking over the larger set – exussum Nov 07 '12 at 16:32
@user1281385 But in this case, dealing with characters, `contains()` is always of O(1) complexity whatever the size of the set. – Mik378 Nov 07 '12 at 17:14

score 4 · Answer 6 · answered Nov 07 '12 at 23:22

int s[256] // for considering all ascii values, serves as a hash function

for(int i=0;i<256;i++)
s[i]=0;

char a[]={'a','b','c','d'};
char b[]={'c','d','e','f'};

for(int i=0;i<sizeof(a);i++)
{
   s[a[i]]++;
 }

 for(int i=0;i<sizeof(b);i++)//checker function
 {
     if(s[b[i]]>0)
       cout<<b[i]; 
  }


  complexity O(m+n);
  m- length of array a
  n- length of array b

score 3 · Answer 7 · edited Sep 28 '16 at 06:14

Google Guava

There are already many good answers to this, but if you want the one-liner approach using a library for lazy-coding, I'd go with Google Guava (for Java) and its Sets.intersection method.

^{(no compiler at hand, bear with me)}

char[] A = {'a', 'b', 'c', 'd'};
char[] B = {'c', 'd', 'e', 'f'};

Set<Character> intersection = Sets.intersection(
    Sets.newHashSet<Character>(Chars.asList(a)),
    Sets.newHashSet<Character>(Chars.asList(b))
);

Obviously, this is assuming both arrays wouldn't have duplicates, in which case using a set data structure would make more sense and allow for this sort of operation more efficiently, especially if you don't start from an array of primitives from the start.

May or may not fit your use case, but sort of the no-brainer approach for the general case.

score 2 · Answer 8 · answered Nov 07 '12 at 13:15

2

Sort both the arrays.
Then do loop until they have have elements common Or one of the arrays reaches its end.

Asymptotically, this takes the complexity of sorting. i.e. O(NlogN) where N is the length of longer input array.

answered Nov 07 '12 at 13:15

P.P

117,907
20
175
238

5

It can be improved to `O(NlogM)` (N is the length of the larger arra, M is the shorter) by sorting the shorter array alone, and iterating+binary searching each element in the longer array. – amit Nov 07 '12 at 14:45

score 2 · Answer 9 · answered Nov 07 '12 at 16:39

If you care about duplicates, use a hash map to index list A, with the key being the element, and the value being a number of how many times that element has been seen.

You iterate through the first and for every element in A and if it does not exist in the map, put it in there with a value of 1, if it already exists in the map, add one to that value.

Next, iterate through B, and if the value exists, subtract 1. If not, put -1 in the value on the table for that element.

Finally, iterate through the map and for any element that has a value != 0, print out as a difference.

private static <T> List<T> intersectArrays(List<T> a, List<T> b) {
    Map<T, Long> intersectionCountMap = new HashMap<T, Long>((((Math.max(a.size(), b.size()))*4)/3)+1);
    List<T> returnList = new LinkedList<T>();
    for(T element : a) {
        Long count = intersectionCountMap.get(element);
        if (count != null) {
            intersectionCountMap.put(element, count+1);
        } else {
            intersectionCountMap.put(element, 1L);
        }
    }
    for (T element : b) {
        Long count = intersectionCountMap.get(element);
        if (count != null) {
            intersectionCountMap.put(element, count-1);
        } else {
            intersectionCountMap.put(element, -1L);
        }            
    }
    for(T key : intersectionCountMap.keySet()) {
        Long count = intersectionCountMap.get(key);
        if (count != null && count != 0) {
            for(long i = 0; i < count; i++) {
                returnList.add(key);
            }
        }
    }
    return returnList;
}

This should run in O(n), as we're only iterating the Lists each once, and the Map once. The Data structures here used in Java should be efficient, as the HashMap is constructed with a capacity that can handle the largest size of the lists.

I'm using a LinkedList for the return as it provides us a way of adding and iterating through a list for our unknown sized intersection.

score 1 · Answer 10 · answered Nov 07 '12 at 13:18

1

The best way is not to start with arrays at all. Arrays are optimal for random access to elements, but not optimal for searching (which is what finding the intersection is all about). As you are talking about intersection, you must be regarding the arrays as sets. So use a more appropriate data structure (in Java, a Set). Then the task is much more efficient.

answered Nov 07 '12 at 13:18

Raedwald

46,613
43
151
237

1

Depends. For a small number of data points (~ less than 15) arrays *are* the fastest data structure for searching. – Konrad Rudolph Nov 07 '12 at 13:23
… incidentally, `Set` in Java is an interface and can be implemented by a linear array. – Konrad Rudolph Nov 07 '12 at 13:31

score 1 · Answer 11 · answered Nov 07 '12 at 13:27

1

You can use tree, but time will be O(n(log n)) and elements must be comparable

answered Nov 07 '12 at 13:27

Yola

18,496
11
65
106

3

trees are very inefficient for constant data relative to sorting solutions. The main reason for it is cache - arrays are MUCH more cache efficient then trees. (Though it is `O(nlogn)`, same as sort and iterate, the hidden constants will be much higher) – amit Nov 07 '12 at 13:28

score 1 · Answer 12 · edited Nov 14 '12 at 12:51

1

First, sort the two arrays using best sorting algorithm.
Then, with linear search, you can get the common elements.

If an extra space is provided then we can use hash table to do that.

edited Nov 14 '12 at 12:51

Mxyk

10,678
16
57
76

answered Nov 14 '12 at 12:28

kishore

1,658
7
24
33

score 1 · Answer 13 · answered Mar 12 '13 at 01:25

1

in ruby you can just say

a = ['a', 'b', 'c', 'd']
b = ['c', 'd', 'e', 'f']
c = a & b

c contains ['c','d']

answered Mar 12 '13 at 01:25

Colin MacKenzie - III

1,373
12
13

score 1 · Answer 14 · answered Oct 01 '14 at 14:29

Sort two arrays first, then iterate them, if they are the same element, add to to be returned array.

Code is here:

public static void printArr(int[] arr){
    for (int a:arr){
        System.out.print(a + ", ");
    }
    System.out.println();
}

public static int[] intersectionOf(int[] arr1, int[] arr2){
    Arrays.sort(arr1);
    Arrays.sort(arr2);

    printArr(arr1);
    printArr(arr2);

    int i=0, j=0, k=0;
    int[] arr = new int[Math.min(arr1.length, arr2.length)];

    while( i < arr1.length && j < arr2.length){
        if(arr1[i] < arr2[j]){
            i++;
        } else if(arr1[i] > arr2[j]){
            j++;
        } else {
            arr[k++] = arr1[i++];
            j++;
        }
    }
    return Arrays.copyOf(arr, k);
}

public static void main(String[] args) {
    int[] arr1 = {1, 2, 6};
    int[] arr2 = {10, 2, 5, 1};
    printArr(intersectionOf(arr1,arr2));
}

outputs:

arr1: 1, 2, 6, 
arr2: 1, 2, 5, 10, 
arr: 1, 2,

score 0 · Answer 15 · answered Nov 07 '12 at 18:33

0

Assuming you are dealing with ANSI characters. The approach should be similar for Unicode, just change the range.

char[] A = {'a', 'b', 'c', 'd'};
char[] B = {'c', 'd', 'e', 'f'};
int[] charset = new int[256]

for(int i=0; i<A.length; i++) {
  charset[A[i]]++;
}

Now iterate over the B and you can check if the corresponding charset value for the character being iterated is greater than 0. You can store them in a list or any other collection.

This approach takes O(n) time complexity and a constant space for your checks not taking into account your new array/list being used to hold the common elements.

This is better than the HashSet/Hashtable approach in terms of space complexity.

answered Nov 07 '12 at 18:33

Vamshidhar Behara

521
4
12

2

The range of Unicode is (theoretically) 32 bits – that would require 16 GiB of RAM for your `charset` table. Do you see how why people used hash tables instead? – Konrad Rudolph Nov 07 '12 at 21:07
@KonradRudolph I think it really depends on the problem and what trade-offs one is willing to make. If we are dealing with comparing terabytes of file then a fixed size charset table would be a more feasible approach if you want to stick to one machine. – Vamshidhar Behara Nov 21 '12 at 16:19
@KonradRudolph The above case also beats the hash table approach any time if we know we are dealing with ASCII strings. – Vamshidhar Behara Nov 21 '12 at 16:24
Keep dreaming. The Unicode array would be 4 GiB large. Even if we have sufficient RAM, the random access pattern would kill cache performance. A hash table will be faster. – Konrad Rudolph Nov 21 '12 at 19:37
For large data sets wouldn't insertions into hash table be O(n)? How would it be faster? I think a map-reduce approach may be a better approach for solving this problem when dealing with large data sets. – Vamshidhar Behara Nov 21 '12 at 20:20
The problem with using an array is that it also no longer has O(1) access time – loading the relevant memory pages and caching on the processor will take a significant time. Hash table, because they take much less space, will have significantly better individual access times. – Konrad Rudolph Nov 21 '12 at 22:06
What you are suggesting is that the run time is no longer a function of the size of the input(n). However, in case of the hash table you will also need to store the hash for each character/string in the array. In the above case you would not need that. Arrays are stored sequentially and hence their access time would be constant. In case of hash table you would run into collisions very easily and depending on implementation would take O(n) for access in the worst case. – Vamshidhar Behara Nov 22 '12 at 15:10
You are simply wrong, and I’ve already explained the reason. As the array becomes very large, it no longer fits into quickly accessible memory regions. As a consequence, parts of it need to be loaded from slower storage, which takes time (up to 1000 times as much time as a regular access!). The hash table does *not* run into this problem because it’s much (much, much!) smaller. Yes, you may get collisions but the expected number of collisions is still very low. because even very long text will only use few different characters. So finding an entry is, on average, still very fast. – Konrad Rudolph Nov 23 '12 at 09:54

score 0 · Answer 16 · edited Aug 08 '13 at 08:24

0

You could use HashSet in .NET 3.5 or later. Example c# code:

HashSet<int> set1 = new HashSet<int>(new int[]{8, 12, 13, 15});

HashSet<int> set2 = new HashSet<int>(new int[] { 15, 16, 7, 8, 9 });

set1.IntersectWith(set2);

foreach (int i in set1)

   Console.Write(i+ " ");

//output: 8 15

edited Aug 08 '13 at 08:24

Eduard Luca

6,514
16
85
137

answered Aug 08 '13 at 08:00

Sanj

261
2
6

score 0 · Answer 17 · answered Oct 15 '13 at 13:55

0

Sort one of the arrays (m Log(m) ) now Pick each element from other array and do a binary search in first array(the sorted one) ->n Log(m)

Total Time Complexity :- (n+m)Log(m).

answered Oct 15 '13 at 13:55

navgupta

1

score 0 · Answer 18 · answered Dec 14 '14 at 05:30

I hope the following would be useful. These are two different approached:

Simple Intersection where you compare all the elements from one array to another array.
Sorting and searching based approach which sorts one array and search second array element in first array using binary search.

//

public class IntersectionOfUnsortedArrays {
    public static void main(String[] args) {
        int[] arr1 = { 12, 4, 17 };
        int[] arr2 = { 1, 12, 7, 17 };
        System.out.println("Intersection Using Simple Comparision");
        printArray(simpleIntersection(arr1, arr2));
        System.out.println("Intersection Using Sort and Binary Search");
        printArray(sortingBasedIntersection(arr1, arr2));
    }

    /*
     * Simple intersection based on the comparison without any sorting.
     * Complexity O(n^2)
     */
    public static int[] simpleIntersection(int[] a, int[] b) {
        int minlen = a.length > b.length ? b.length : a.length;
        int c[] = new int[minlen];
        int k=0;
        for(int i=0;i<a.length;i++){
            for(int j=0;j<b.length;j++){
                if(a[i]==b[j]){
                    c[k++]=a[i];
                }
            }
        }
        int arr[] = new int[k];
        // copy the final array to remove unwanted 0's from the array c
        System.arraycopy(c, 0, arr, 0, k);
        return arr;
    }

    /*
     * Sorting and Searching based intersection.
     * Complexity Sorting O(n^2) + Searching O(log n)
     */

    public static int[] sortingBasedIntersection(int[] a, int[] b){
        insertionSort(a);
        int minlen = a.length > b.length ? b.length : a.length;
        int c[] = new int[minlen];
        int k=0;
        for(int i=0;i<b.length;i++){
            int result = binarySearch(a,0,a.length,b[i]);
            if(result > -1){
                c[k++] = a[result];
            }
        }
        int arr[] = new int[k];
        // copy the final array to remove unwanted 0's from the array c
        System.arraycopy(c, 0, arr, 0, k);
        return arr;
    }

    public static void insertionSort(int array[]) {
        for (int i = 1; i < array.length; i++) {
            int j = i;
            int b = array[i];
            while ((j > 0) && (array[j - 1] > b)) {
                array[j] = array[j - 1];
                j--;
            }
            array[j] = b;
        }
    }

    static int binarySearch(int arr[], int low, int high, int num) {
        if (high < low)
            return -1;
        int mid = (low + high) / 2;
        if (num == arr[mid])
            return mid;
        if (num > arr[mid])
            return binarySearch(arr, (mid + 1), high, num);
        else
            return binarySearch(arr, low, (mid - 1), num);
    }

    public static void printArray(int[] array) {
        for (int value : array) {
            System.out.print(" "+value);
        }
        System.out.println("\n");
    }
}

score 0 · Answer 19 · answered Nov 03 '15 at 02:25

If the collections are already sorted, as shown in the question, then the best solution (not mentioned yet) is a merge-sort-like algorithm that runs in O(n+m).

Compare the first elements of each collection. If they're the same, add the element to the intersection set and pop both elements from their collections. If the elements are different, pop the element that's greater, in comparison, to the other element. Repeat until one collection is empty.

score 0 · Answer 20 · answered Dec 16 '15 at 03:07

Using Java 8 features, here is an algorithm which honours duplicates within a list instead of turning a list into a set. No sorting, so no n log n.

Convert one of the lists into a map, with value being number of occurrences (cost: O(n)).
For each item in the other list, if the item exists in the map, decrease occurrence by one (cost: O(n)).

Therefore, overall cost is O(n). Code:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class Dup {
  public static void main(String[] args) {
    List<Integer> listA = Arrays.asList(3, 1, 4, 1, 9, 5, 9);
    List<Integer> listB = Arrays.asList(2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3);
    findCommons(listA, listB);
  }

  static void findCommons(List<Integer> listA, List<Integer> listB) {
    Map<Integer, Long> mapA = 
        listA.stream().collect(
            Collectors.groupingBy(Integer::intValue, Collectors.counting()));

    List<Integer> commons = new ArrayList<>();
    listB.stream()
        .filter(e -> mapA.get(e) != null)
        .filter(e -> mapA.get(e) > 0)
        .forEach(e -> {
            mapA.put(e, mapA.get(e) - 1);
            commons.add(e);
        });

    System.out.println(commons);
  }
}

Code above will give this output: [5, 3, 9, 9].

score 0 · Answer 21 · answered Sep 28 '16 at 03:36

import java.util.Scanner;

public class arraycommon {

public static void main(String[] args) {
    Scanner sc=new Scanner(System.in);
    // display common element in two diffrent array
    int sizea,sizeb,i=0,j=0,k=0;
    int count=0;
    System.out.println("enter the size array A:"+'\n');
    sizea=sc.nextInt();
    System.out.println("enter the size array B"+'\n');
    sizeb=sc.nextInt();
    int a[]=new int[sizea];
    int b[]=new int[sizeb];
    int c[]=new int[sizea];


    System.out.println("enter the element in array A:"+'\n');
    for (i = 0; i < sizea; i++) {

        a[i]=sc.nextInt();
    }
    System.out.println("enter the element in array B:"+'\n');
    for (i = 0; i < sizeb; i++) {

        b[i]=sc.nextInt();
    }
    System.out.println("the element in array A:"+'\n');
    for (i = 0; i < sizea; i++) {

        System.out.print(a[i]+" ");

    }
    System.out.println('\n');
    System.out.println("the element in array B:"+'\n');
    for (i = 0; i < sizeb; i++) 
    {

        System.out.print(b[i]+" ");
    }

    for (i = 0; i <sizea; i++) 
    {
        for (j = 0; j < sizeb; j++) 
        {
           if(a[i]==b[j])
           {
               count++;
               c[k]=a[i];
               k=k+1;
           }
        }
    }
    System.out.println('\n');
    System.out.println("element common in array is");

    if(count==0)
    {
        System.out.println("sorry no common elements");
    }
    else
    {
        for (i = 0; i <count; i++) 
        {

        System.out.print(c[i]+" ");
        }
    }

}

}

Some explanation of the code would make this answer more helpful. — Tot Zam, Sep 28 '16 at 04:02

score 0 · Answer 22 · answered Mar 17 '17 at 12:22

    simply search each element of first array with each element of second array and stored matched result in third array
class Union
{
  public static void main(String[] args) {
  char a[] ={'f','g','d','v','a'};
  char b[] ={'a','b','c','d','e'};
  char temp[] = new char[5];
  int p=0;
  for(int i=0;i<a.length;i++)
  {
    for(int j=0;j<b.length;j++)
    {
      if(a[i]==b[j])     //searches if both array has common element
      {

        temp[p] = a[i];   //if match found store it in a new array
        p++;
      }

    }

  }
  for(int k=0;k<temp.length;k++)
  {
      System.out.println(temp[k]);
  }

  }
}

This is brute force method, not very efficient. – Harsh Wardhan Mar 17 '17 at 12:45 — Harsh Wardhan, Mar 17 '17 at 12:45

Partha.B · Answer 23 · 2023-05-10T22:06:08.953

Below is my solution with test data

public class IntersectionOf2Arrays {
    public static void main(String[] args) {
        System.out.print("2 Given Arrays are \n");
        int[] x= {2,5,3,7};
        //int[] x= {3, 10, 4, 2, 8};
        int[] y={5,2,9,0,1,3};
        //int[] y={10, 4, 12, 3, 23, 1, 8};
        Arrays.stream(x).forEach(a -> System.out.print("  "+a));
        System.out.print("\n");
        Arrays.stream(y).forEach(b -> System.out.print("  "+b));
        System.out.print("\nIntersection of above two  array is ");
        int[] result = intersection(x,y);
        Arrays.stream(result).forEach(c -> System.out.print("  "+c));
    }
    static int[] intersectionWithFilter(int x[],int y[]){

        int[] z =Arrays.stream(x)
                .distinct()
                .filter(a -> Arrays.stream(y).anyMatch(b -> b==a))
                .toArray();
        return z;
    }

    static int[] intersection(int x[],int y[]) {
        int len = 0;
        if(x.length>y.length)
            len = x.length;
        else
            len=y.length;

        int []z=new int[len];
        int c = 0;
        for(int i=0;i <(x.length);i++) {
            for(int j=0;j < y.length;j++) {
                if(x[i]==y[j]){
                    z[c]=x[i];
                    c++;
                }
                else {
                    continue;
                }
            }
        }
        // As it is int array so it is by default 0 filled , so we need to remove those zeros
        return resize(z,c);
    }

    private static int[] resize(int[] oldArray, int newSize) {
        int[] newArray = new int[newSize];
        System.arraycopy( oldArray, 0, newArray, 0, newSize );
        return newArray;
    }
}

Test result is below:- 2 Given Arrays are 2 5 3 7 Second array 5 2 9 0 1 3 Intersection of above two array is 2 5 3

How do I get the intersection between two arrays as a new array?

23 Answers23

Dumb simplest approach

Sorting approach

Hash approach

Google Guava

Linked

Related