Searching two arrays for matches, no extra memory

Question

I had an interview the other day with Amazon, and a question they asked me was pertaining to the following problem.

Given 2 integer arrays, containing any number of elements both positive and negative, find numbers that appear in both arrays.

I was able to solve this problem very easily with HashMaps so it would have O(n) computational complexity, but unfortunately this will also have O(n) space complexity. This could be done with no extra memory by iterating through all elements in each array, but this would be O(n^2).

The interviewer, after I finished explaining the HashMap method, asked if I could think of a method that would be O(n) computationally, but would not use any extra memory. I could not think of any on the fly, and have not been able to find a solution for this. Is there a way of finding these values without using extra memory, in linear time?

Note: I have posted this question on CareerCup, but everyone on there does not seem to get the concept that I need it to not use extra space, and that it has to be O(n) computationally.

Here is the code I used during the interview. It works, but just is not O(1) for space.

import java.util.*;
public class ArrayFun {
    public static void main(String[] args) {

        int[] a = {1,2,3,4};
        int[] b = {2,5,6,7,3,2,2,2,2,1,2,2,2,2};
        ArrayList<Integer> matches = ArrayFun.findMatches(a,b);
        for (int i = 0;i<matches.size();++i) {
            System.out.println(matches.get(i));
        }
    }

    public static ArrayList<Integer> findMatches(int[] a, int[] b) {
        HashMap<Integer,Integer> map = new HashMap<Integer,Integer>();
        ArrayList<Integer> matches = new ArrayList<Integer>();
        for (int i = 0;i<a.length;++i) {
            map.put(a[i],0);
        }
        for (int i = 0;i<b.length;++i) {
            if (map.get(b[i]) != null && map.get(b[i]) == 0) {
                map.put(b[i],1);
                matches.add(b[i]);
            }
        }
        return matches;
    }
}

This code will return

1,2,3

EDIT: also when I say no additional space, and O(1), I am kind of using them interchangeably. By no additional space I mean small placeholder variables are fine but allocating new arrays is not.

You can do it in `O(n log n)` with no extra space by sorting both arrays in place... that's probably not what they want, though — durron597, Nov 08 '12 at 21:31
This is possible only when the two arrays are sorted. It is not possible otherwise. — Sergey Kalinichenko, Nov 08 '12 at 21:32
@durron597 There is no sorting method this efficient that won't use extra space (either heap or stack). Also, `O(n log n)` is the **typical case**, but the worst case is usually `O(n2)`. — Marko Topolnik, Nov 08 '12 at 21:40
@MarkoTopolnik: Randomized quicksort is pretty good. Obviously all quicksort algorithms have a worst case of `O(n^2)` — durron597, Nov 08 '12 at 21:43
@durron597 Quicksort fails the requirement of "no additional space". — Marko Topolnik, Nov 08 '12 at 21:45
Whoops, I thought that quicksort in place was `O(1)` space not `O(log n)` space. My bad — durron597, Nov 08 '12 at 21:48

Mel Nicholson · Accepted Answer · 2012-11-20T19:37:16.833

7

There is no O(1) space method for finding the intersection of two unsorted sets in O(n) time.

For a data type with an unlimited range, the minimum sorting price is O(n ln n).

For a data type with a limited range radix sort provides the ability to do an in-place radix sort in O(n ln n' n") time, where n is the size of the data, n' is the number of values that can be represented, and n" has to do with the cost of checking whether two values are in the same radix group. The n" time price can be dropped in return for an O(ln n) space price.

In the special case of 32-bit integers, n' is 2^32 and n" is 1, so this would collapse to O(n) and provide a winning solution for multi-billion record sets.

For integers of unlimited size, n' and n" preclude an O(n) time solution via radix.

edited Nov 20 '12 at 19:37

answered Nov 08 '12 at 21:33

Mel Nicholson

3,225
14
24

that is what my conclusion was. But why would they ask me to try it if there wasn't? – MZimmerman6 Nov 08 '12 at 21:35
2

Maybe they wanted to see if you would try to B.S. when you didn't know the answer, or whether you could give a definitive "impossible" answer. – Mel Nicholson Nov 08 '12 at 21:38
1

I would agree if any sorting has to be done by comparison sort. However, if the arrays are Java int we are not limited to comparison sort. – Patricia Shanahan Nov 09 '12 at 02:17
@MZimmerman6 The question may have been to see if you know that the \Omega(n log n) minimum time for sorting only applies to comparison sorts. If you have a limited range structured type such as Java int, there are other sorts that are O(n). In particular, see the in-place radix sort in my answer. – Patricia Shanahan Nov 09 '12 at 16:12
Sorting is not really going to help me out though, I will still need to index through everything. Would I not? – MZimmerman6 Nov 09 '12 at 22:06
-1: This answer is misleading. Yes, there are no such *comparision* sorts, but we are not limited to those. In fact, the finite key domain admits sorting in linear time (and O(1) space), as Patricia's answer shows. – meriton Nov 11 '12 at 23:56
@MZimmerman6: Depends on what you mean by "index through everything". Yes, finding the duplicates will require iterating through each array once, which is O(n). – meriton Nov 12 '12 at 00:05

score 2 · Answer 2 · edited May 23 '17 at 11:51

The key is to sort the two arrays in-place. I did a search for "in-place radix sort", and found In-Place Radix Sort. I believe the problem is solvable, at least for Java int[], by applying those ideas to sort each array, bit by bit, then doing the obvious scan.

Incidentally, I think the correct output for the problem in the question code is 1, 2, 3.

Here is my implementation, based on the answers to the referenced question:

    public class ArrayMatch {
      public static void main(String[] args) {
        int[] a = { 4, 1, 2, 3, 4 };
        int[] b = { 2, 5, 6, 7, 3, 2, 2, 2, 2, 1, 2, 2, 2, 2 };
        System.out.print("Original problem");
        printMatches(a, b);
        System.out.println();

        int[] a1 = { 4, 1, -1234, 2, 3, 4, Integer.MIN_VALUE };
        int[] b1 = { -1234, 2, 5, 6, 7, 3, 2, 2, 2, 2, 1, 2, 2, 2, 2 , Integer.MIN_VALUE, Integer.MAX_VALUE};
        System.out.print("With negatives");
        printMatches(a1, b1);
        System.out.println();

      }

      // Print all matching elements between the two arrays.
      private static void printMatches(int[] a, int[] b) {
        if (a.length == 0 || b.length == 0) {
          return;
        }

        sort(a);
        sort(b);

        int i = 0;
        int j = 0;
        while (true) {
          while (a[i] < b[j]) {
            i++;
            if (i == a.length) {
              return;
            }
          }
          while (a[i] > b[j]) {
            j++;
            if (j == b.length) {
              return;
            }
          }

          if (a[i] == b[j]) {
            System.out.print(" " + a[i]);

            do {
              i++;
            } while (i < a.length && a[i - 1] == a[i]);

            do {
              j++;
            } while (j < b.length && b[j - 1] == b[j]);
          }

          if (i == a.length || j == b.length) {
            return;
          }
        }
      }

      // In place radix sort.
      private static void sort(int[] in) {
        // Flip the sign bit to regularize the sort order
        flipBit(in, 31);
        sort(in, 0, in.length, 31);
        // Flip back the sign bit back to restore 2's complement
        flipBit(in, 31);
      }

      /**
       * Sort a subarray, elements start through end-1 of in, according to the
       * values in firstBit through 0.
       * 
       * @param in
       * @param start
       * @param end
       * @param firstBit
       */
      private static void sort(int[] in, int start, int end, int firstBit) {
        if (start == end) {
          return;
        }
        int mask = 1 << firstBit;
        int zeroCount = 0;
        for (int i = start; i < end; i++) {
          if ((in[i] & mask) == 0) {
            zeroCount++;
          }
        }

        int elements = end - start;
        int nextZeroIndex = start;
        int nextOneIndex = start + zeroCount;

        int split = nextOneIndex;

        if (zeroCount > 0 && zeroCount < elements) {
          while (nextZeroIndex < split) {
            if ((in[nextZeroIndex] & mask) != 0) {
              // Found a one bit in the zero area, look for its partner in the one
              // area
              while ((in[nextOneIndex] & mask) != 0) {
                nextOneIndex++;
              }
              int temp = in[nextZeroIndex];
              in[nextZeroIndex] = in[nextOneIndex];
              in[nextOneIndex] = temp;
              nextOneIndex++;
            }
            nextZeroIndex++;
          }

        }

        if (firstBit > 0) {
          sort(in, start, split, firstBit - 1);
          sort(in, split, end, firstBit - 1);
        }

      }

      private static void flipBit(int[] in, int bitNo) {
        int mask = 1 << bitNo;
        for (int i = 0; i < in.length; i++) {
          in[i] ^= mask;
        }
      }
    }

score 1 · Answer 3 · answered Nov 08 '12 at 21:39

One possible answer is similar to the HashMap solution... IF you know that the integers are within a very small window. It would be similar to this: http://en.wikipedia.org/wiki/Bucket_sort

Basically, if the integers are guaranteed to be within a certain constant size window (i.e. all of them are 1-1000) then you can do it in constant space by incrementing each cell of index = whatever your number is. This is exactly the same as the HashMap solution, except you don't need to be able to account for all possible integers like a HashMap can, which lets you save on space. If this is unclear let me know in the comments and I'll explain further.

score 0 · Answer 4 · edited May 23 '17 at 11:59

I believe this is possible to do in-place with O(1) extra space. I make use of the additional assumption that the elements in the arrays are mutable as well as swappable, but I believe with careful accounting the mutability assumption could be removed for this particular problem.

The basic idea is to do in-place hashing. In-place hashing may be implemented by partitioning the array around a suitable percentile, say the 90th, using the O(n) median-of-medians selection algorithm. This divides the array into a small portion (about 10%) and a large portion (about 90%) whose elements are distinguishable from each other (less than the partition element or not). You can then hash from the 10% portion into the 90% portion by swapping. This hashing can be used to detect duplicates. This is O(n) for each processing of 10% of the array, so done 10 times is still O(n). I described this in much more detail, though with some hand-waving I mean to correct one day, over at this related question..

For this particular problem, you need to do in-place hashing 3 times. First on each individual array to remove duplicates. Then, on a wrapper representing the combined arrays (if index is less than length of array 1, index into array 1, else index into array 2) to report duplicates.

Searching two arrays for matches, no extra memory

4 Answers4

Linked