Efficient method to find what elements of Array A are in Array B

Question

In my particle case I need a JavaScript code that find what strings of Array A are in Array B.

But it's more interesting to ask the general question:

Given Array A with length of N and array B, return a Boolean[] array of length N such that Boolean[i] will be true iff A[i] item is in B.

What is the most efficient solution to that problem? Both of the array are unsorted and array B can be empty and it can be larger or smaller then array A.

Have you read this? https://stackoverflow.com/questions/497338/efficient-list-intersection-algorithm?answertab=votes#tab-top — Dolev, Jul 25 '17 at 15:43

score 1 · Answer 1 · answered Jul 25 '17 at 15:30

1

Convert array B to hash set (or other collection fast for searching) and then do loop on each element of array A and check if this element exist in set.

answered Jul 25 '17 at 15:30

The Tosters

417
3
15

score 1 · Answer 2 · answered Jul 25 '17 at 16:06

1

The Tosters's answer is perfect for random inputs. However since you asked generic algorithm, I thought of following approach.

You need to search one element in other, so sorting one of them is required for optimal solution. Lets consider both cases. For simplicity lets denote their sizes with same letters. I mentioned time complexity of step in brackets after description of step.

Sorting B:

1. Sort B O(B log B).
2. Iterate over each element of A and check if element exists in B, O(A log B)
Time complexity: O(A log B) + O(B log B) = O((A+B) log B)

Sorting A:

1. Sort array A also maintaining original position in sorted array, O(A log A). 
2. Iterate on each element of B and find all A's containing B. Now iterate over each searched element to get original index and update Boolean array. Also keep a marker in sorted A that the element has been searched to avoid searching if same number appears again. O(B log A)
Time complexity: O(A log A) + O(B log A) = O((A+B) log A)

As a result, the solutions differ by factor of log of size of arrays. So for best solution, sort the array with lesser size and use corresponding algorithm.

answered Jul 25 '17 at 16:06

Shashwat Kumar

5,159
2
30
66

Tosters's answer is still better as it will be: O(A+B), searching hash is O(1) in compare to binary search of log(N) – Ilya Gazman Jul 25 '17 at 16:13
1

@Ilya_Gazman See my other comment; depending on the implementation of hash set it might be equivalent in the worst-case but it isn't better than `O(n log n)`. I like this answer because it makes explicit that you can do as good as possible by sorting the shorter one - implied but not stated in my almost identical answer. – Patrick87 Jul 25 '17 at 16:18
@Ilya_Gazman as Patrick said and I also, that in worst case Tosters answer will be O(n^2). Its optimal only for random inputs. – Shashwat Kumar Jul 25 '17 at 16:34
Hash set is only a proposition, you can use tree which will produce sorted collection, still fast for searching. – The Tosters Jul 27 '17 at 07:29
And then a tree is not O(1). – Shashwat Kumar Jul 27 '17 at 18:45

score 1 · Accepted Answer · answered Jul 25 '17 at 16:14

There are a few different ways to do this with varying complexities.

First, you could do a nested loop over both lists and print out elements from A when you find a corresponding solution in B. This is O(ab) time and O(1) space.

Second, you could sort both lists and then walk through the lists in lockstep looking for matches (imagine the merge operation in mergesort). This is O(aloga + blogb) time and either O(1) or O(a+b) space, depending on whether you want to sort in-place or not.

Third, you could sort just the second list and then binary search for elements in A one at a time. This is O(blogb + alogb) time and O(1) or O(b) space, again depending on whether you sort in-place or not.

The Tosters proposed a solution whereby you create a hash set and do repeated queries on it. Depending on how the hash set resolves collisions, this is asymptotically equivalent in the worst case to either the first or third solutions (the underlying buckets can be lists or binary search trees).

the underlying buckets using hash map implementation making the search to O(1), so it's the optimal solution — Ilya Gazman, Jul 25 '17 at 16:20
@Ilya_Gazman What if the hashes in the interior hash tables collide? — Patrick87, Jul 25 '17 at 16:21

score 0 · Answer 4 · answered Jul 25 '17 at 15:38

0

It's extremely easy to do so in JS, I wouldn't even call it an algorithm :)

A.map(x => B.includes(x));

answered Jul 25 '17 at 15:38

Romain Artru

65
9

[The Tosters](https://stackoverflow.com/a/45307701/1129332) solved it in O(n) your solution is O(N^2) – Ilya Gazman Jul 25 '17 at 15:39
1

@Ilya_Gazman Actually, the worst-case asymptotic bound of The Toster's solution is either O(n^2) or O(n log n), depending on how the underlying hash set manages collisions. – Patrick87 Jul 25 '17 at 16:04

Efficient method to find what elements of Array A are in Array B

4 Answers4