Testing for array membership efficiently

Question

In an Java developer interview this was asked from me:

Given two arrays of integers, print out the values from the first array which are not present in the second array. The total time complexity should be O(n).

The approach with which I have come up is:

Use a hash table to store the values of the second array.
Now check if values of first array exists in the hash table. If not, print the values.

Is there any other more efficient approach to implement this?

Are the arrays sorted? Without the arrays being sorted I'm not sure this is possible in O(n). Sorting has a minimum theoretical complexity of n log n which already fails the requirements. — Dan Simmons, May 10 '15 at 04:02

Nayuki · Answer 1 · 2015-05-10T04:20:25.577

We will assume that both arrays are roughly the same length, so that n is used to describe the length of both of them. (Otherwise we need to introduce a new variable.)

Overview of alternatives:

Linear search: If we store the second array as an unsorted array/list, then each search takes O(n) time and the total time is O(n²).

Binary search: If we sort the second array this takes O(n log n) time. Each query takes O(log n) time, for a total of O(n log n) time.

Balanced tree: If we store the second array in a TreeSet, each search takes O(log n) time for a total of O(n log n) time.

Hash table: If we store the second array in a HashSet like you mentioned, each insertion takes O(1) amortized time and each search takes O(1) time, for a total of O(n) time as wanted.

Bit vector: If we create a 2³²-element bit vector to cover all the possible Java int or Integer values, we can add each element of the second array and test for existence of a value in O(1) time, for a total of O(n) time as wanted. However there is a very expensive constant term for the initialization of 2³² elements.

Lower bound argument: The second array has length O(n). It is impossible to accurately tell if a value is in the second array or not if we haven't scanned the entire array. Thus the absolute minimum time to solve this problem is Ω(n). This means that O(n) is the best we can possibly do.

Testing for array membership efficiently

1 Answers1