14

I have two arrays of objects which are likely to have the same values, but in a different order, e.g.

{ "cat", "dog", "mouse", "pangolin" }

{ "dog", "pangolin", "cat", "mouse" }

I wish to treat these two arrays as equal. What's the fastest way to test this?

izb
  • 50,101
  • 39
  • 117
  • 168

6 Answers6

19

I can't guarantee that this is the fastest, but it's certainly quite efficient:

bool areEquivalent = array1.Length == array2.Length 
                     && new HashSet<string>(array1).SetEquals(array2);

EDIT: SaeedAlg and Sandris raise valid points about different frequencies of duplicates causing problems with this approach. I can see two workarounds if this is important (haven't given much thought to their respective efficiencies):

1.Sort the arrays and then compare them sequentially. This approach, in theory, should have quadratic complexity in the worst case. E.g.:

return array1.Length == array2.Length
       && array1.OrderBy(s => s).SequenceEqual(array2.OrderBy(s => s));

2.Build up a frequency-table of strings in each array and then compare them. E.g.:

if(array1.Length != array2.Length)
   return false;

var f1 = array1.GroupBy(s => s)
               .Select(group => new {group.Key, Count = group.Count() });

var f2 = array2.GroupBy(s => s)
               .Select(group => new {group.Key, Count = group.Count() });

return !f1.Except(f2).Any();
Ani
  • 111,048
  • 26
  • 262
  • 307
  • It's like @Albin Sunnanbo answer may be you return two array are equals and they aren't equal. – Saeed Amiri Nov 11 '10 at 10:04
  • @SaeedAlg: Can you provide an example? – Ani Nov 11 '10 at 10:08
  • 3
    @Ani: an example is {"cat", "cat", "dog"}, {"cat", "dog", "dog"}. These arrays are not permutations of each other, but your expression returns true. – sandris Nov 11 '10 at 11:19
  • @SaeedAlg, @sandris: Thanks for that. Any comments on the workarounds I've suggested? – Ani Nov 11 '10 at 12:02
  • @Ani, nothing, but I compared some manner of linq with handwritten code and I see that linq is slowest, for example in http://stackoverflow.com/questions/4119569/find-the-min-value-in-int-arry-with-c/4121721#4121721 I, compared the speed of linq `Min` which is slowest than hand written one. – Saeed Amiri Nov 11 '10 at 13:52
  • @Ani: I think both approaches are correct, although sorting and comparing has O(n log n) complexity, as pointed out in djechelon's answer. – sandris Nov 12 '10 at 15:26
  • @sandris: In quicksort's worst case (sequence almost sorted already), the time-compexity deteriorates to quadratic. This is rare, which is why I emphasized "in theory". http://en.wikipedia.org/wiki/Quicksort – Ani Nov 18 '10 at 03:13
  • `SequenceEqual()` checks lengths, so you don't need to do that. – Daria Oct 27 '16 at 18:33
  • 1
    @Andikki: Yes, but the SequenceEqual is after the OrderBy, so the quick-fail check can help unnecessary sorting in many cases. – Ani Nov 08 '16 at 11:26
4

I think the only reasonable way is to sort them and then compare.

Sorting requires O(n logn) and comparing O(n), so that's still a total of O(n logn)

usr-local-ΕΨΗΕΛΩΝ
  • 26,101
  • 30
  • 154
  • 305
2

Convert both arrays to HashSets and use setequals

Martin DeMello
  • 11,876
  • 7
  • 49
  • 64
2

Have you tried something like

string[] arr1 = {"cat", "dog", "mouse", "pangolin"};

string[] arr2 = {"dog", "pangolin", "cat", "mouse"};

bool equal = arr1.Except(arr2).Count() == 0 && arr2.Except(arr1).Count() == 0;
m.s.
  • 16,063
  • 7
  • 53
  • 88
Adriaan Stander
  • 162,879
  • 31
  • 289
  • 284
0

I would use a HashSet, assuming there are no duplicates

string[] arr1 = new string[] { "cat", "dog", "mouse", "pangolin" };
string[] arr2 = new string[] { "dog", "pangolin", "cat", "mouse" };

bool result = true;
if (arr1.Length != arr2.Length)
{
    result = false;
}
else
{
    HashSet<string> hash1 = new HashSet<string>(arr1);
    foreach (var s in arr2)
    {
        if (!hash1.Contains(s))
            result = false;
    }
}

Edit:
If you just have four elements it might be faster to skip the HashSet and use arr1.Contains in the comparison. Measure and pick the fastest for your array size.

Albin Sunnanbo
  • 46,430
  • 8
  • 69
  • 108
  • It has positive false error, hash doesn't support to do not make unique value for different items – Saeed Amiri Nov 11 '10 at 09:57
  • this is theoretically cheaper, but it might be faster in practice to just say hash1 = HashSet(arr1); hash2 = HashSet(arr2); hash1.SetEquals(hash2) – Martin DeMello Nov 11 '10 at 09:57
0

Pseudocode :

A:array
B:array
C:hashtable

if A.length != B.length then return false;

foreach objA in A
{
H = objA;
if H is not found in C.Keys then
C.add(H as key,1 as initial value);
else
C.Val[H as key]++;
}

foreach objB in B
{
H = objB;
if H is not found in C.Keys then
return false;
else
C.Val[H as key]--;
}

if(C contains non-zero value)
return false;
else
return true;
Mohammad Mazaz
  • 316
  • 1
  • 7