3

I cant remember from my days in college, the way to compare two unsorted arrays of int and find the number of matches ? Each value is unique in it own array, and both arrays are the same size.

for example

int[5] a1 = new []{1,2,4,5,0}
int[5] a2 = new []{2,4,11,-6,7}

int numOfMatches = FindMatchesInPerformanceOfNLogN(a1,a2);

any one does remember ?

guyl
  • 2,158
  • 4
  • 32
  • 58
  • 5
    is there a upper bound to the values in the arrays and is it possible to have the same values inside? For unbound values there is IMHO no better solution than 2* sort ( = O(n log n)) and compare ( = O(n)) -> O(n log n) – Random Dev Sep 07 '11 at 12:21
  • One sort (n log n) and one bisection search (n elements * log n)? – xanatos Sep 07 '11 at 12:27
  • 2
    Did some upvoting on the answers here to make up for the person that (in my opinion) downvoted all answers without a good reason. – Øyvind Bråthen Sep 07 '11 at 12:41
  • how do you compare in n*log(n) please ? – guyl Sep 07 '11 at 12:41

7 Answers7

3

If you can store the contents of one of the arrays in a HashMap, then you can check for the existence of the elements in the other array by seeing if they exist in the HashMap. This is O(n).

David Weiser
  • 5,190
  • 4
  • 28
  • 35
  • if your willing to tolerate a small probability of false positives a bloom filter could be used as well. http://petarv.blogspot.com/2007/06/bloom-filter.html – eSniff Sep 14 '11 at 17:44
2

One array must be sorted so that you can compare in n*log(n). That is for every item in the unsorted array (n) you perform a binary search on the sorted array (log(n)). If both are unsorted, I don't see a way to compare in n*log(n).

Andreas
  • 6,447
  • 2
  • 34
  • 46
  • 1
    Actually it's m*log(n) with m = size of the unsorted array and n = size of the sorted array. If your sorted array is very small compared to the unsorted array the performance will be close to O(n) again. – Andreas Sep 07 '11 at 15:04
  • So isn't that n*log(n) to sort plus m*log(n) to search? – FatherShawn Oct 15 '17 at 17:56
  • If you also do the sorting, yes. But the big O doesn't care about constants (https://stackoverflow.com/questions/22188851/why-is-constant-always-dropped-from-big-o-analysis). – Andreas Oct 16 '17 at 19:33
1

how about this:

  1. concatenate the two arrays
  2. quicksort the result
  3. step through from array[1] to array[array.length - 1] and check array[i] against array[i-1]

if they are the same, you had a duplicate. This should also be O(n*log(n)) and will not require a binary search for each check.

mcfinnigan
  • 11,442
  • 35
  • 28
0

You could use LINQ:

var a1 = new int[5] {1, 2, 4, 5, 0};
var a2 = new int[5] {2, 4, 11, -6, 7};
var matches = a1.Intersect(a2).Count();

I'm not sure if you're just asking for a straight-forward way or the fastest/best way possible...

Grant Winney
  • 65,241
  • 13
  • 115
  • 165
0

You have two methods that I am aware of (ref: http://www2.cs.siu.edu/~mengxia/Courses%20PPT/220/carrano_ppt08.ppt):

Recursive (pseudocode)

Algorithm to search a[first] through a[last] for desiredItem
if (there are no elements to search)
    return false
else if (desiredItem equals a[first])
    return true
else    
    return the result of searching a[first+1] through a[last]

Efficiency

May be O(log n) though I have not tried it.

Sequential Search (pseudocode)

public boolean contains(Object anEntry)
{   
    boolean found = false;
    for (int index = 0; !found && (index < length); index++) {
    if (anEntry.equals(entry[index]))
            found = true;
    } 
    return found;
} 

Efficiency of a Sequential Search

Best case  O(1)
    Locate desired item first
Worst case  O(n)
    Must look at all the items
Average case O(n)
    Must look at half the items 
    O(n/2) is still O(n)

I am not aware of an O(log n) search algorithm unless it is sorted.

  • The first algorithm is a recursive implementation of a simple for loop over all items so the complexity is the same and O(N). (I'm not the downvoter btw) – Andreas Sep 07 '11 at 12:49
  • 1
    If you use that for comparison, as the opener intended, btw it's O(N^2). This is about the worst you can do. – Andreas Sep 07 '11 at 12:57
0

I don't know if it is the fastest way but you can do

int[] a1 = new []{1,2,4,5,0};
int[] a2 = new []{2,4,11,-6,7};
var result = a1.Intersect(a2).Count();

It is worth comparing this with other ways that are optimised for int as Intersect() operates on IEnumerable.

Ben Robinson
  • 21,601
  • 5
  • 62
  • 79
0

This problem is also amenable to parallelization: spawn n1 threads and have each one compare an element of a1 with n2 elements of a2, then sum values. Probably slower, but interesting to consider, is spawning n1 * n2 threads to do all comparisons simultaneously, then reducing. If P >> max(n1, n2) in the first case, P >> n1 * n2 in the second, you could do the whole thing in O(n) in the first case, O(log n) in the second.

Patrick87
  • 27,682
  • 3
  • 38
  • 73