5

I have a List. For valid reasons, I duplicate the List many times and use it for different purposes. At some point I need to check if the contents of all these collections are same.

Well, I know how to do this. But being a fan of "short hand" coding(linq...) I would like to know if I can check this EFFICIENTLY with the shortest number of lines of code.

    List<string> original, duplicate1, duplicate2, duplicate3, duplicate4
                                       = new List<string();

        //...some code.....
        bool isequal = duplicate4.sequenceequal(duplicate3) 
         && duplicate3.sequenceequal(duplicate2)
         && duplicate2.sequenceequal(duplicate1) 
         && duplicate1.sequenceequal(original);//can we do it better than this 

UPDATE

Codeinchaos pointed out certain senarios I havent thought of(duplicates and order of list).Though sequenceequal will take care of duplicates the order of the list can be a problem. So I am changing the code as follows. I need to copy the Lists for this.

List<List<string>> copy = new List<List<int>> { duplicate1, duplicate2,  
                                                 duplicate3, duplicate4 }; 
bool iseqaul  = (original.All(x => (copy.All(y => y.Remove(x))))
                                         && copy.All(n => n.Count == 0)); 

UPDATE2

Thanks to Eric-using a HashSet can be very efficient as follows. This wont cover duplicates though.

List<HashSet<string>> copy2 =new List<HashSet<string>>{new HashSet<string>(duplicate1),
                                                       new HashSet<string>(duplicate2),
                                                       new HashSet<string> duplicate3),
                                                       new HashSet<string>(duplicate4)};
  HashSet<string> origninalhashset = new HashSet<string>(original);
  bool eq = copy2.All(x => origninalhashset.SetEquals(x));

UPDATE3 Thanks to Eric - The original code in this post with SequenceEqual will work with sorting. As Sequenceequal will consider the order of collections, the collections need to be sorted before calling sequenceequal. I guess this is not much of a probelm as sorting is pretty fast(nlogn).

UPDATE4 As per Brian's suggestion, I can use a lookup for this.

var originallkup = original.ToLookup(i => i);    
var lookuplist = new List<ILookup<int, int>>
                                    {   duplicate4.ToLookup(i=>  i), 
                                        duplicate3.ToLookup(i=>  i), 
                                        duplicate2.ToLookup(i=>  i),
                                        duplicate1.ToLookup(i=>  i)
                                    };

bool isequal = (lookuplist.Sum(x => x.Count) == (originallkup.Count * 4)) &&       
   (originallkup.All(x => lookuplist.All(i => i[x.Key].Count() == x.Count())));

Thank you all for your responses.

Jimmy
  • 3,224
  • 5
  • 29
  • 47
  • 1
    Is the order important? Or just the content? – weston Feb 09 '12 at 12:20
  • @weston-only the content is important – Jimmy Feb 09 '12 at 12:22
  • 3
    "Efficiency" or "shortest number of lines of code" Pick one – weston Feb 09 '12 at 12:23
  • Basically you want to check if any list is different than original? – Tx3 Feb 09 '12 at 12:25
  • @Tx3 - yes I want to make sure that all the lists "contains" the same items. – Jimmy Feb 09 '12 at 12:26
  • If you know a bit more about the lists you might use it. I.e. if you think list 1 and 2 have a higher chance to be unequal, check those two first. – Michel Keijzers Feb 09 '12 at 12:27
  • @Michel-no information about the chances of inequality. – Jimmy Feb 09 '12 at 12:32
  • So your example code doesn't even do what you want? It checks if the collections contain the same elements in the same order, whereas you want to check set equality? And how do you want to treat duplicate values? – CodesInChaos Feb 09 '12 at 13:01
  • So is `{1,1,2}` equal to `{1,2}`, and is `{1,2}` equal to `{2,1}`? – CodesInChaos Feb 09 '12 at 13:02
  • @CodeInChaos - my bad. I havent thought about that – Jimmy Feb 09 '12 at 13:07
  • possible dupe: http://stackoverflow.com/questions/7086234/how-to-most-efficiently-test-if-two-arrays-contain-equivalent-items-in-c-sharp – Davin Tryon Feb 09 '12 at 13:12
  • @CodeInChaos-SequenceEquals will take care of duplicates but order of the collection is still a probelm – Jimmy Feb 09 '12 at 14:33
  • Using a hash-based solution will allow you to check contents in linear time. Using a sorting-based solution will allow you to check contents in O(nlogn) time, but will yield very simple code. A shame C# doesn't have a built in hashbag class, because that's what you want. – Brian Feb 09 '12 at 16:41
  • 2
    @Jimmy: If order is unimportant but duplicates are then you can sort the lists first before you compare them with SequenceEquals. That way you are not taking order into consideration but you are preserving duplicates. – Eric Lippert Feb 09 '12 at 16:41
  • @EricLippert - Thank you. But we do miss a multiset in C#. – Jimmy Feb 09 '12 at 17:01
  • possible duplicate of [Comparing two collections for equality irrespective of the order of items in them](http://stackoverflow.com/questions/50098/comparing-two-collections-for-equality-irrespective-of-the-order-of-items-in-the) – nawfal Nov 08 '13 at 20:24

3 Answers3

7

I have a List. I duplicate the List many times and use it for different purposes. At some point I need to check if the contents of all these collections are same.

A commenter then asks:

Is the order important? Or just the content?

And you respond:

only the content is important

In that case you are using the wrong data structure in the first place. Use a HashSet<T>, not a List<T>, to represent an unordered collection of items that must be cheaply compared for set equality.

Once you have everything in hash sets instead of lists, you can simply use their SetEquals method to see if any pair of sets is unequal.

Alternatively: keep everything in lists, until the point where you want to compare for equality. Initialize a hash set from one of the lists, and then use SetEquals to compare that hash set to every other list.

Eric Lippert
  • 647,829
  • 179
  • 1,238
  • 2,067
  • @Eric-You are absolutely right.My only concern now is duplicates(because hashset wont take duplicates) – Jimmy Feb 09 '12 at 16:07
  • 1
    @Jimmy: Use a dictionary mapping key to value. Have key be the item and value be a count. To check contents are the same, just check if the dictionaries have matching length (which will tell you a count of distinct elements), then for each key in dict1 check that `dict2.TryGetValue(key, out count) && count == dict1[key]`. – Brian Feb 09 '12 at 16:30
  • @Brian- if I do so I have to write a wrapper and handle duplicates as count in dictionary when mofiying items. I wouldnt prefer that. – Jimmy Feb 09 '12 at 16:46
  • @Jimmy If your type is truely `string` or another immutable type, then there is no modification anyway, you will have to remove and then add, which would be simple enough functions in your new hashmap/dictionary wrapper collection type. – weston Feb 09 '12 at 17:13
  • @Brian-I was late to understand your suggestion. It should work. Thank you. – Jimmy Feb 09 '12 at 18:53
  • @weston-Thank you. actually I need not write a wrapper class, I can use ToLookUp() to covert the list to a look up. check my edit(update 4) in the question – Jimmy Feb 09 '12 at 19:03
  • @Eric: thank you! I was trying to compare two List using LINQ and found my way here. I didn't even know about HashSet, I'm afraid, but now I do. – Darryl Jun 11 '12 at 20:42
4

I honestly can't think of a more efficient solution, but as for reducing the number of lines of code, give this a bash:

var allLists = new List<List<string>>() { original, duplicate1, duplicate2, duplicate3, duplicate4 };

bool allEqual = allLists.All(l => l.SequenceEqual(original));

Or, use the Any operator - might be better in terms of performance.

bool allEqual = !allLists.Any(l => !l.SequenceEqual(original));

EDIT: Confirmed, Any will stop enumerating the source once it determines a value. Thank you MSDN.

EDIT # 2: I have been looking into the performance of SequenceEquals. This guy has a nice post comparing SequenceEquals to a more imperative function. I modified his example to work with List<string> and my findings match his. It would appear that as far as performance is concerned, SequenceEquals isn't high on the list of preferred methods.

tobias86
  • 4,979
  • 1
  • 21
  • 30
  • Not sure if `SequenceEquals` has an early out for reference equality, but if not, manually checking will make the code a bit faster. – CodesInChaos Feb 09 '12 at 12:41
  • @CodeInChaos: Thanks, I didn't think about an early out. I think `Any` might be a better move in this case. See my edited answer. – tobias86 Feb 09 '12 at 12:45
  • 1
    I wasn't talking about this kind of early out. I'm sure `All` already does that. I meant `l=>(object.ReferenceEquals(l,original)|| l.SequenceEqual(original)`. But I'm not sure if that is necessary either. – CodesInChaos Feb 09 '12 at 12:51
  • `All()` and `Any()` are both smart enough to stop when the outcome is known. – H H Feb 09 '12 at 12:59
  • @tobias86- I think your code is more readable. Efficiency is equal to the origianl code with "AND" – Jimmy Feb 09 '12 at 13:03
  • @tobias86 - as CodeInChaos pointed out if the contents of two lists are same but the order is different SequenceEquals will return false. So, I cannot use SequenceEquals at all. – Jimmy Feb 09 '12 at 14:34
0

You can use reflection to create a generic comparer, and always use it. Look this thread, has a loot of code that can help you: Comparing two collections for equality irrespective of the order of items in them

Community
  • 1
  • 1
Vinicius Ottoni
  • 4,631
  • 9
  • 42
  • 64
  • where does reflection come into this? – CodesInChaos Feb 09 '12 at 12:41
  • Using generics, look to the post and you 'll understand. – Vinicius Ottoni Feb 09 '12 at 12:44
  • @Vinicuius I looked over all solutions over there, and I don't see any reflection. So I don't understand. – CodesInChaos Feb 09 '12 at 12:53
  • Not a code like the MethodInfo or Assembly.Load, for exemple, but idea of that is used with the generics. "You can use reflection to dynamically create an instance of a type, bind the type to an existing object, [...]" http://msdn.microsoft.com/en-us/library/ms173183(v=vs.80).aspx - And is what happens when we use generics. – Vinicius Ottoni Feb 09 '12 at 13:02