Linq to remove duplicates in ASP.NET WEB API in unconventional way

Question

This is my model:

 public class Combination
{
    public int Id { get; set; }
    public int CombOne{ get; set; }
    public int CombTwo{ get; set; }

}

I want to write a LINQ statement to extract those instances of Combination class which contains a duplicate combination of properties CombOne and CombTwo. So if there are three instances like:

Combination C1= new Combination{Id=1, CombOne=1, CombTwo=2};
Combination C2= new Combination{Id=2, CombOne=2, CombTwo=1};
Combination C3= new Combination{Id=3, CombOne=1, CombTwo=2};

Linq statement should return a list of C2, and C3 since they contain a duplicate combination of CombOne and CombTwo variables, at the same time it should preserve the original instance(C1)(Should not return C1 as it is the first instance of that combination.)

I got correct result with a foreach loop.

List<Combination> invalid2 = new List<Combination>();
foreach (Combination g in list)// Gamelist.Match is a list of Combination type
{
    if (invalid2.Contains(g))
        continue;
    List<Combination> invalid3 = (from r in list
                                    where
                                        ((r != g) &&
                                        (((r.CombOne == g.CombOne) && (r.CombTwo == g.CombTwo)) ||
                                        ((r.CombOne == g.CombTwo) && (r.CombTwo == g.CombOne))))
                                    select r).ToList();
    invalid2 = invalid2.Concat(invalid3).ToList();
}

I would like to get the result using only Linq statements to improve efficiency. I tried a lot but did not get the desired output. Thanks in advance for your sincere effort

tvanfosson · Accepted Answer · 2014-02-22T15:03:48.460

If I understand you correctly you want the output to produce any instance which contains the set { CombOne, CombTwo } that has been seen before. This would imply some sort of aggregation. To do this you need to keep track of the instances which have been seen before and reference that set to ensure that each subsequent element examined hasn't already been seen. I'm going to take advantage of the fact that Hashet.Add doesn't add an element that is already in a set and use a custom comparer for equality in the HashSet.

 var set = new HashSet<Combination>(new CombinationComparer());
 var invalid = list.Aggregate(new List<Combination>(list.Count),
                              (a,c) => 
                              {
                                 if (!set.Add(c))
                                 {
                                     a.Add(c);
                                 }
                                 return a;
                              });

where

public class CombinationComparer : IEqualityComparer<Combination>
{
     public bool Equals(Combination c1, Combination c2)
     {
         if (ReferenceEquals(c1,c2))
         {
              return true;
         }

         if (c1 == null || c2 == null)
         {
              return false;
         }

         return (c1.CombOne == c2.CombOne && c1.CombTwo == c2.CombTwo)
                    || (c1.CombOne == c2.CombTwo && c1.CombTwo == c2.CombOne);
    }

    public int GetHashCode(Combination c)
    {
         if (c == null)
         {
             return 0;
         }

         unchecked
         {
              // it's important that this be commutative so we don't
              // do the usual multiply by a prime to differentiate
              // them.
              return CombOne.GetHashCode() + CombTwo.GetHashCode();
         }

    }
}

I'll note that that isn't any more efficient and is somewhat less readable than using a loop and building the result as you go:

var set = new HashSet<Combination>(new CombinationComparer());
var invalid = new List<Combination>(list.Count);
foreach (var item in list)
{
    if (set.Add(item)) continue;

    invalid.Add(item);
}

In both cases you get, as a bonus, the set of the unique and first occurring repeats stored in set. Using the HashSet makes this pretty efficient in both cases as you only iterate over the list once and both HashSet.Add and List.Add are average case O(1) - especially as we pre-size the list to it's maximum size initially.

I'll note that if what you really want is to remove duplicates, you could simply use the comparer with Distinct. That would be similar to the above except that you're not keeping the invalid list.

var unique = list.Distinct(new CombinationComparer());

I've created a working fiddle at http://dotnetfiddle.net/YoApPu

I guess hashset will eliminate an instance only if they are not equal. Here I am trying to list objects of type Combination, each of which has a unique Id. So I don't think it will eliminate any instance of Combination. — Sreeraj, Feb 22 '14 at 07:45
@Sreeraj - you're missing the fact that the comparer is being used to define equality, not the object's equals method. According to the comparer they are equal if the are the same object, they are both null, or their `Comb*` match according to your criteria. — tvanfosson, Feb 22 '14 at 13:57
@KonradRudolph yes, I could have used XOR. My excuse is I haven't had my coffee yet. :) Normally I would multiply by primes, see http://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-an-overridden-system-object-gethashcode That's really what I was trying to say except in this case we want them to be commutative but still not collide on (x,x) <=> (y,y) — tvanfosson, Feb 22 '14 at 14:54

Sreeraj · Answer 2 · 2014-02-22T08:50:27.717

0

 invalid2= (from r in GameList.Match
                   from g in GameList.Match
                   where((r.Id<g.Id)&&(((r.TeamAId == g.TeamAId) && (r.TeamBId == g.TeamBId)) || ((r.TeamAId == g.TeamBId) && (r.TeamBId == g.TeamAId))))
                   select g).Distinct().ToList();

This will work in the cases where id strictly follows an ascending pattern. In other cases, we will have to use nested query to first sort the list using orderby in ascending fashion an then execute the above query.

edited Feb 22 '14 at 08:50

answered Feb 22 '14 at 08:40

Sreeraj

2,306
2
18
31

This is horribly inefficient - O(n^2) since you iterate over the list once per element in the list. My solution using the `HashSet` and custom comparer is O(n) (or n times faster than yours). You should at least try my solution and see if it works. – tvanfosson Feb 22 '14 at 14:00

Linq to remove duplicates in ASP.NET WEB API in unconventional way

2 Answers2