72

I need to determine whether or not two sets contains exactly the same elements. The ordering does not matter.

For instance, these two arrays should be considered equal:

IEnumerable<int> data = new []{3, 5, 6, 9};
IEnumerable<int> otherData = new []{6, 5, 9, 3}

One set cannot contain any elements, that are not in the other.

Can this be done using the built-in query operators? And what would be the most efficient way to implement it, considering that the number of elements could range from a few to hundreds?

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
driis
  • 161,458
  • 45
  • 265
  • 341

7 Answers7

123

If you want to treat the arrays as "sets" and ignore order and duplicate items, you can use HashSet<T>.SetEquals method:

var isEqual = new HashSet<int>(first).SetEquals(second);

Otherwise, your best bet is probably sorting both sequences in the same way and using SequenceEqual to compare them.

Mehrdad Afshari
  • 414,610
  • 91
  • 852
  • 789
  • 1
    I think HashSet.SetEquals was the method I was looking for :-) – driis Nov 04 '09 at 12:13
  • Good answer-- I forgot about SetEquals! If you may have dupes, before sorting you should probably copy the sequences into a List and compare the lengths first-- this saves you the (expensive) sorting or SequenceEqual() operations in case the lengths are different. – Justin Grant Nov 04 '09 at 12:36
  • 2
    @Justin Grant - I don't follow... You need to remove duplicates before you compare lengths, and this is just as expensive as sorting. – Kobi Nov 04 '09 at 12:52
  • Maybe I'm misunderstanding, but doesn't this also produce an O(n^2) solution? The HashSet ctor is remarked as O(n) and SetEquals is remarked thus also (if the same equality comparer is used for `first` as is for `second`). – Josh Gust Aug 07 '18 at 16:29
  • @JoshGust which makes the whole thing O(2n) = O(n) amortized not O(n^2). That's assuming a good hash function of course. – Mehrdad Afshari Aug 08 '18 at 06:33
  • @MehrdadAfshari Thanks for the correction. After doing some additional reading, I think I understand what you mean. – Josh Gust Aug 09 '18 at 19:26
  • @MehrdadAfshari Maybe this is taboo on SO, but I'm looking to learn more about this. Can you check my answer to a similar question: https://stackoverflow.com/a/51775026/4629442 and give me some places to learn more about understanding complexity? – Josh Gust Aug 14 '18 at 19:21
50

I suggest sorting both, and doing an element-by-element comparison.

data.OrderBy(x => x).SequenceEqual(otherData.OrderBy(x => x))

I'm not sure how fast the implementation of OrderBy is, but if it's a O(n log n) sort like you'd expect the total algorithm is O(n log n) as well.

For some cases of data, you can improve on this by using a custom implementation of OrderBy that for example uses a counting sort, for O(n+k), with k the size of the range wherein the values lie.

Joren
  • 14,472
  • 3
  • 50
  • 54
7

If you might have duplicates (or if you want a solution which performs better for longer lists), I'd try something like this:

static bool IsSame<T>(IEnumerable<T> set1, IEnumerable<T> set2)
{
    if (set1 == null && set2 == null)
        return true;
    if (set1 == null || set2 == null)
        return false;

    List<T> list1 = set1.ToList();
    List<T> list2 = set2.ToList();

    if (list1.Count != list2.Count)
        return false;

    list1.Sort();
    list2.Sort();

    return list1.SequenceEqual(list2);
}

UPDATE: oops, you guys are right-- the Except() solution below needs to look both ways before crossing the street. And it has lousy perf for longer lists. Ignore the suggestion below! :-)

Here's one easy way to do it. Note that this assumes the lists have no duplicates.

bool same = data.Except (otherData).Count() == 0;
Justin Grant
  • 44,807
  • 15
  • 124
  • 208
  • 3
    You could use .Any() rather than Count() - then it won't enumerate every item in the list. – Matt Breckon Nov 04 '09 at 12:06
  • 4
    What if `data = {1,2}, otherData = {1,2,3}`? You should also check the other way around. – Kobi Nov 04 '09 at 12:08
  • This won't work in my scenario, without checking both ways as suggested by Kobi. And with a few hundred elements, I would be worried about performance for that approach. – driis Nov 04 '09 at 12:19
  • 1
    @Kobi, why check both ways when you could just check which one is larger first and then do the except – johnny 5 May 21 '19 at 15:32
3

Here is another way to do it:

IEnumerable<int> data = new[] { 3, 5, 6, 9 };
IEnumerable<int> otherData = new[] { 6, 5, 9, 3 };

data = data.OrderBy(d => d);
otherData = otherData.OrderBy(d => d);
data.Zip(otherData, (x, y) => Tuple.Create(x, y)).All(d => d.Item1 == d.Item2);
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
johnny 5
  • 19,893
  • 50
  • 121
  • 195
  • 2
    Zip will stop when it reaches the end of either list, meaning {1,2,3} {1,2,3,4} will compare "equal" by this method. You can check the lengths prior to Zip, but you might as well just use SequenceEqual. – Pagefault Oct 11 '16 at 07:34
0
  1. First, check the length. If they are different, the sets are different.
  2. you can do data.Intersect(otherData);, and check the length is identical.
  3. OR, simplt sort the sets, and iterate through them.
Kobi
  • 135,331
  • 41
  • 252
  • 292
  • 1
    "data.Intersect(otherData), and check the length is identical" -- you must make sure the length is identical to BOTH other sequences – Mark Sowul Mar 21 '12 at 20:51
  • @Mark - On the first step you were supposed to check they are both in the same length, so you should be ok. That isn't written very well, I agree. (also, talk about the long tail... over 2 years to get a comment `:)`) – Kobi Mar 21 '12 at 21:17
  • Yes that's true. Checking the length so much (since it's IEnumerable) is not necessarily feasible though compared with just creating a HashSet from sequence 1 and comparing it with sequence 2. Intersect will basically be doing this anyway. – Mark Sowul Mar 23 '12 at 18:44
  • I liked the simplicity of this method but it uses GetHashCode() to make element comparisons. – carlin.scott Mar 04 '16 at 18:30
-1

First check if both data collections have the same number of elements and the check if all the elements in one collection are presented in the other

        IEnumerable<int> data = new[] { 3, 5, 6, 9 };
        IEnumerable<int> otherData = new[] { 6, 5, 9, 3 };

        bool equals = data.Count() == otherData.Count() && data.All(x => otherData.Contains(x));
nunoalmeida
  • 94
  • 1
  • 5
  • For a regular array, .Contains is O(N) and .All is also O(N), making this O(N^2). The sort and/or set based options are better if your data set isn't trivially small. – Pagefault Oct 11 '16 at 07:37
  • Also doesn't give the correct answer if duplicates are permitted in the input. – Pagefault Oct 11 '16 at 07:39
-2

This should help:

    IEnumerable<int> data = new []{ 3,5,6,9 };
    IEnumerable<int> otherData = new[] {6, 5, 9, 3};

    if(data.All(x => otherData.Contains(x)))
    {
        //Code Goes Here
    }
Blounty
  • 3,342
  • 22
  • 21