Compare arrays of int in high performance

Question

I cant remember from my days in college, the way to compare two unsorted arrays of int and find the number of matches ? Each value is unique in it own array, and both arrays are the same size.

for example

int[5] a1 = new []{1,2,4,5,0}
int[5] a2 = new []{2,4,11,-6,7}

int numOfMatches = FindMatchesInPerformanceOfNLogN(a1,a2);

any one does remember ?

is there a upper bound to the values in the arrays and is it possible to have the same values inside? For unbound values there is IMHO no better solution than 2* sort ( = O(n log n)) and compare ( = O(n)) -> O(n log n) — Random Dev, Sep 07 '11 at 12:21
One sort (n log n) and one bisection search (n elements * log n)? — xanatos, Sep 07 '11 at 12:27
Did some upvoting on the answers here to make up for the person that (in my opinion) downvoted all answers without a good reason. — Øyvind Bråthen, Sep 07 '11 at 12:41

score 3 · Answer 1 · answered Sep 08 '11 at 15:02

3

If you can store the contents of one of the arrays in a HashMap, then you can check for the existence of the elements in the other array by seeing if they exist in the HashMap. This is O(n).

answered Sep 08 '11 at 15:02

David Weiser

5,190
4
28
35

if your willing to tolerate a small probability of false positives a bloom filter could be used as well. http://petarv.blogspot.com/2007/06/bloom-filter.html – eSniff Sep 14 '11 at 17:44

score 2 · Accepted Answer · answered Sep 07 '11 at 12:55

2

One array must be sorted so that you can compare in n*log(n). That is for every item in the unsorted array (n) you perform a binary search on the sorted array (log(n)). If both are unsorted, I don't see a way to compare in n*log(n).

answered Sep 07 '11 at 12:55

Andreas

6,447
2
34
46

1

Actually it's m*log(n) with m = size of the unsorted array and n = size of the sorted array. If your sorted array is very small compared to the unsorted array the performance will be close to O(n) again. – Andreas Sep 07 '11 at 15:04
So isn't that n*log(n) to sort plus m*log(n) to search? – FatherShawn Oct 15 '17 at 17:56
If you also do the sorting, yes. But the big O doesn't care about constants (https://stackoverflow.com/questions/22188851/why-is-constant-always-dropped-from-big-o-analysis). – Andreas Oct 16 '17 at 19:33

score 1 · Answer 3 · answered Sep 07 '11 at 14:58

how about this:

concatenate the two arrays
quicksort the result
step through from array[1] to array[array.length - 1] and check array[i] against array[i-1]

if they are the same, you had a duplicate. This should also be O(n*log(n)) and will not require a binary search for each check.

score 0 · Answer 4 · answered Sep 07 '11 at 12:25

0

You could use LINQ:

var a1 = new int[5] {1, 2, 4, 5, 0};
var a2 = new int[5] {2, 4, 11, -6, 7};
var matches = a1.Intersect(a2).Count();

I'm not sure if you're just asking for a straight-forward way or the fastest/best way possible...

answered Sep 07 '11 at 12:25

Grant Winney

65,241
13
115
165

score 0 · Answer 5 · answered Sep 07 '11 at 12:27

You have two methods that I am aware of (ref: http://www2.cs.siu.edu/~mengxia/Courses%20PPT/220/carrano_ppt08.ppt):

Recursive (pseudocode)

Algorithm to search a[first] through a[last] for desiredItem
if (there are no elements to search)
    return false
else if (desiredItem equals a[first])
    return true
else    
    return the result of searching a[first+1] through a[last]

Efficiency

May be O(log n) though I have not tried it.

Sequential Search (pseudocode)

public boolean contains(Object anEntry)
{   
    boolean found = false;
    for (int index = 0; !found && (index < length); index++) {
    if (anEntry.equals(entry[index]))
            found = true;
    } 
    return found;
}

Efficiency of a Sequential Search

Best case  O(1)
    Locate desired item first
Worst case  O(n)
    Must look at all the items
Average case O(n)
    Must look at half the items 
    O(n/2) is still O(n)

I am not aware of an O(log n) search algorithm unless it is sorted.

The first algorithm is a recursive implementation of a simple for loop over all items so the complexity is the same and O(N). (I'm not the downvoter btw) — Andreas, Sep 07 '11 at 12:49
If you use that for comparison, as the opener intended, btw it's O(N^2). This is about the worst you can do. — Andreas, Sep 07 '11 at 12:57

score 0 · Answer 6 · answered Sep 07 '11 at 12:29

0

I don't know if it is the fastest way but you can do

int[] a1 = new []{1,2,4,5,0};
int[] a2 = new []{2,4,11,-6,7};
var result = a1.Intersect(a2).Count();

It is worth comparing this with other ways that are optimised for int as Intersect() operates on IEnumerable.

answered Sep 07 '11 at 12:29

Ben Robinson

21,601
5
62
79

4

Seems to be a lot of random down voting with no comments. All of these answers seem reasonable to me. – Ben Robinson Sep 07 '11 at 12:40

score 0 · Answer 7 · answered Sep 07 '11 at 12:58

This problem is also amenable to parallelization: spawn n1 threads and have each one compare an element of a1 with n2 elements of a2, then sum values. Probably slower, but interesting to consider, is spawning n1 * n2 threads to do all comparisons simultaneously, then reducing. If P >> max(n1, n2) in the first case, P >> n1 * n2 in the second, you could do the whole thing in O(n) in the first case, O(log n) in the second.

Compare arrays of int in high performance

7 Answers7