1

I have two List<string> that contain huge data, and I have the following code which I used till now. But I have some doubts regarding this, due to lot of confusion about the data is compared or not in the list items. Here I am using sequence Equal to compare the data. I have two questions. Somewhere I found that SequenceEqual will compare the data in the lists. So used it.

  1. Will SequenceEqual compares the data in the lists?

  2. Better way of code to improve performance. As per understanding I kept only three items in both the lists but our requirement has huge data items in the lists. So need to improve performance.

    bool value = false;
    List<string> list1 = new List<string>();
    list1.Add("one");
    list1.Add("two");
    list1.Add("three");
    
    List<string> list2 = new List<string>();
    list2.Add("one");
    list2.Add("two");
    list2.Add("three");
    list1.Sort();
    list2.Sort();
    if (list1.SequenceEqual(list2))
    {
        value = true;
    }
    else
    {
        value = false;
    }
    return value;
    
Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
bmdprasad
  • 65
  • 7
  • 1
    Can the lists have *duplicates*, e.g. `{ "one", "two", "three", "one" };` - `"one"` is duplicate? – Dmitry Bychenko Jun 14 '19 at 09:24
  • please have a look at this thread: https://stackoverflow.com/questions/49581893/what-is-the-benefit-of-sequenceequal – Sebastian Siemens Jun 14 '19 at 09:25
  • 1
    if you want to know if `sequenceequal` is fast enough, you just have to test / benchmark it it. If you know the data and structure it is maybe better to implement your own algorithm. – Sebastian Siemens Jun 14 '19 at 09:28
  • Related: [Comparing two collections for equality irrespective of the order of items in them](https://stackoverflow.com/questions/50098/comparing-two-collections-for-equality-irrespective-of-the-order-of-items-in-the). – Theodor Zoulias Jul 06 '23 at 06:11

3 Answers3

1

Here is the implementation of SequenceEqual:

public static bool SequenceEqual<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
    if (comparer == null)
    {
        comparer = EqualityComparer<TSource>.Default;
    }
    if (first == null)
    {
        throw Error.ArgumentNull("first");
    }
    if (second == null)
    {
        throw Error.ArgumentNull("second");
    }
    using (IEnumerator<TSource> enumerator = first.GetEnumerator())
    {
        using (IEnumerator<TSource> enumerator2 = second.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                if (!enumerator2.MoveNext() || !comparer.Equals(enumerator.Current, enumerator2.Current))
                {
                    return false;
                }
            }
            if (enumerator2.MoveNext())
            {
                return false;
            }
        }
    }
    return true;
}

It does not check length and simply traverses both lists to confirm if they are equal.

If you check two lists that are 1_000_000 and 1_000_001 in length then it is terribly inefficient.

So check the lengths first, then call SequenceEqual only if they are the same.

If you're checking for set equality rather than position equality then make a HashSet<string> out of each and then check against those.

Enigmativity
  • 113,464
  • 11
  • 89
  • 172
1

Your method of comparing the two lists is essentially this:

static bool SequenceUnorderedEqual<T>(
    IEnumerable<T> source1, IEnumerable<T> source2,
    IComparer<T> comparer = default)
{
    List<T> list1 = source1.ToList();
    List<T> list2 = source2.ToList();
    list1.Sort(comparer);
    list2.Sort(comparer);
    return list1.SequenceEqual(list2);
}

Using a dictionary is at least ten times faster, at the cost of being a bit more complex. It also doesn't require the items to be comparable (they only have to be equitable).

using System.Runtime.InteropServices;

//...

/// <summary>
/// Determines whether two sequences are equal according to an equality comparer,
/// ignoring the order of the items in the sequences.
/// </summary>
static bool SequenceUnorderedEqual<T>(
    IEnumerable<T> source1, IEnumerable<T> source2,
    IEqualityComparer<T> comparer = default)
{
    Dictionary<T, int> occurences;
    if (Enumerable.TryGetNonEnumeratedCount(source1, out int count1) &&
        Enumerable.TryGetNonEnumeratedCount(source2, out int count2))
    {
        if (count1 != count2) return false;
        occurences = new(count1, comparer);
    }
    else
    {
        occurences = new(comparer);
    }

    // Populating the dictionary using source1.
    foreach (T item in source1)
    {
        CollectionsMarshal
            .GetValueRefOrAddDefault(occurences, item, out _)++;
    }

    // Depopulating the dictionary using source2.
    foreach (T item in source2)
    {
        ref int count = ref CollectionsMarshal
            .GetValueRefOrNullRef(occurences, item);
        if (Unsafe.IsNullRef(ref count) || count == 0) return false;
        count--;
    }

    // Now all counters should be zero.
    foreach (int count in occurences.Values) if (count != 0) return false;
    return true;
}

A simpler but less efficient implementation that doesn't take advantage of the CollectionsMarshal APIs, can be found in the 2nd revision of this answer.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
  • HOw to call this method , earlier we have used SequenceUnorderedEqual(Class class1, Class class2) but you have used SequenceUnorderedEqual(Ienumberable class1, IEnumerable class2). Please help me out. how to call this IEnumerable type – bmdprasad Jun 14 '19 at 13:05
  • @bmdprasad populate your two lists the way you are doing now, and then instead of sorting them and passing them to `SequenceEqual`, pass them unsorted to `SequenceUnorderedEqual`. – Theodor Zoulias Jun 14 '19 at 13:20
0

As I understand it, using the SequenceEqual method from Linq will be very efficient. It effectively returns true if, quote, "the two source sequences are of equal length and their corresponding elements are equal according to the default equality comparer for their type", otherwise false. There won't really be a much faster way of doing this comparision.

Max
  • 434
  • 1
  • 3
  • 13
  • Also, if your list is too long, you've posted another question in which an answer explains why that could impact performence. I advice you should refer to Sebastians comment above to check if this is the case. – Max Jun 14 '19 at 09:33
  • as per your code it is comparing the data in the lists..So sequenceEqual will compares the data and count of lists as well ..Correct? – bmdprasad Jun 14 '19 at 09:36
  • @bmdprasad - No, it does not check `Length` (which is bizarre). – Enigmativity Jun 14 '19 at 09:43
  • @MaxDoesStuff - It does not check length. – Enigmativity Jun 14 '19 at 09:44
  • I quoted the System.Linq documentation where it says the two lists must be of equal length - should I have put Count instead of Length? – Max Jun 14 '19 at 09:45
  • @Enigmativity then why would `new int[] { 1, 2 }.SequenceEqual(new int[] { 1, 2, 1, 2 })` return false? – Max Jun 14 '19 at 09:48
  • @MaxDoesStuff will it check data in both the lists – bmdprasad Jun 14 '19 at 09:52
  • Yes, the code you posted in your original question will return false since the two lists are not the same – Max Jun 14 '19 at 09:53
  • @MaxDoesStuff - The implementation of `SequenceEqual` does **not** check the length of the lists. It simply traverses both enumerables to confirm equal, and if it hits the end of one but not the other it will then return `false`. – Enigmativity Jun 14 '19 at 09:55
  • can we write linq code I used var test = from l1 in list1 where list2.Any(x=>x == l1 select l1 – bmdprasad Jun 14 '19 at 12:45