5

Possible Duplicate:
The intersection of two sorted arrays

We have two sorted arrays A and B, besides compare one with all the elements in other array, how to design a best algorithm to find the array with their common elements?

codeforester
  • 39,467
  • 16
  • 112
  • 140
user1686630
  • 109
  • 1
  • 2
  • 7

3 Answers3

32

Hold two pointers: one for each array.

i <- 0, j <- 0
repeat while i < length(arr1) and j < length(arr2):
    if arr1[i] > arr2[j]: increase j
    else if arr1[i] < arr2[j]: increase i
    else : output arr[i], increase both pointers

The idea is, if the data is sorted, if the element is "too big" in one array, it will be "too big" for all other elements left in the array - since it is sorted.

This solution requires a single traversal on the data. O(n) (with good constants as well).

amit
  • 175,853
  • 27
  • 231
  • 333
  • 2
    +1 - For giving a pseudo-code solution that can be translated to real code by the OP. (You should probably also describe what happens in the edge / end cases.) – Stephen C Oct 20 '12 at 23:50
  • This is similar to a merge sort, of course. – Neil Oct 20 '12 at 23:51
  • @StephenC: You mean the cases where one array is exausted I assume? It is basically the stop condition... (I also assume if an element appears twice in each array you want to print it twice) – amit Oct 20 '12 at 23:54
  • That's what I meant. And your update covers it. – Stephen C Oct 20 '12 at 23:58
  • @StephenC Thanks for the feedback & improvement suggestion. – amit Oct 21 '12 at 00:04
  • 7
    The complexity would be O(m+n) and not O(n). m is size of first array and n is the size of second array. – Abhijit K Rao Sep 06 '13 at 06:55
  • @AbhijitKRao I know this old but I am studying a similar problem for my algorithms course. Wouldn't your suggestion be wrong? Would it be `O(mn)` because it would need to traverse both lists in the best,worst and average cases? And they both in this algorithm are being incremented independently? – Callat Sep 27 '16 at 15:54
10

If the lengths of two arrays (say, A has N elements and B has M elements) are similar, then the best approach would be to perform linear search of one array's elements in another array. Of course, since the arrays are sorted, the next search should begin where the previous search has stopped. This is the classic principle used in "sorted array merge" algorithm. The complexity on O(N + M).

If the lengths are significantly different (say, M << N), then a much more optimal approach would be to iterate through elements of the shorter array and use binary search to look for these values in the longer array. The complexity is O(M * log N) in that case.

As you can see O(M * log N) is better than O(N + M) if M is much smaller than N, and worse otherwise.

The difference in array sizes which should trigger the switch from one approach to another depends on some practical considerations. If should be chosen based on practical experiments with your data.

These two approaches (linear and binary searches) can be "blended" into a single algorithm. Let's assume M <= N. In that case let's choose step value S = [N / M]. You take first element from array A and perform a straddled linear search for that element in array B with step S, meaning that you check elements B[0], B[S], B[2*S], B[3*S], ... and so on. Once you find the index range [S*i, S*(i+1)] that potentially contains the element you are searching for, you switch to binary search inside that segment of array B. Done. The straddled linear search for the next element of A begins where the previous search left off. (As a side note, it might make sense to choose the value of S equal to a power of 2).

This "blended" algorithm is the most asymptotically optimal search/merge algorithm for two sorted arrays in existence. However, in practice the more simple approach with choosing either binary or linear search depending on relative sizes of the arrays works perfectly well.

AnT stands with Russia
  • 312,472
  • 42
  • 525
  • 765
  • I'm wondering, in the "blended" algorithm, why are you doing binary search on array B, which has fewer elements than A? Also, do you have any references for the statement: "This 'blended' algorithm is the most asymptotically optimal search/merge algorithm for two sorted arrays in existence." ? – stensootla Nov 13 '16 at 00:50
  • @abc: If I remember correctly, a formal proof (or a reference to one) can be found in "Asymptotically efficient in-place merging" article: http://www.sciencedirect.com/science/article/pii/S0304397598001625 – AnT stands with Russia Nov 13 '16 at 03:48
1

besides compare one with all the elements in other array

You will have to compare A[] to B[] in order to know that they are the same -- unless you know a lot about what kind of data they can hold. The nature of the comparison probably has many solutions and can be optimized as required.

If the arrays are very strictly created ie only sequential values of a known pattern and always starts from a known point you could just look at the length of each array and know whether or not all items are common.

This unfortunately doesn't sound like a very realistic or useful array and so you are back to checking for A[i] in B[]

Ryan Tennill
  • 156
  • 7