19

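I have a collection of Employee objects:

class Employee
{
    public string empName { get; set; }
    public string empID { get; set; }
    public string empLoc { get; set; }
    public string empPL { get; set; }
    public string empShift { get; set; }
}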

My list contains:

    empName,empID,empLoc,empPL,empShift
    E1,1,L1,EPL1,S1
    E2,2,L2,EPL2,S2
    E3,3,L3,EPL3,S3
    E4,4,L1,EPL1,S1
    E5,5,L5,EPL5,S5
    E6,6,L2,EPL2,S2

I need to select the employees that have distinct combinations of empLoc, empPL, and empShift.

Is there any way to achieve this using LINQ?

kbvishnu
  • 2
    Do you need the *whole* information for those employees, or just the empLoc, empPL and empShift? (Your naming is pretty nasty, by the way - given that they're within an Employee class, the "emp" prefix is redundant, but "Loc" and "PL" are relatively meaningless on their own.) – Jon Skeet Sep 25 '12 at 09:48
  • 1
    @JonSkeet This is just the basic requirement; I made up the class for the question. The original has a lot of properties, and I only need to check three of them. Sorry about that :( . I need the whole information, not just empLoc, empPL and empShift. – kbvishnu Sep 25 '12 at 09:51
  • 1
    Could you use DistinctBy from morelinq with a composite key (anonymous type)? – devdigital Sep 25 '12 at 10:00

5 Answers

43

You can use GroupBy with an anonymous type as the key, and then take the First element of each group:

list.GroupBy(e => new
    {
        e.empLoc,
        e.empPL,
        e.empShift
    })
    .Select(g => g.First());
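To make the behaviour concrete, here is a minimal, self-contained sketch that runs the same GroupBy/First pattern over the sample rows from the question. The property types (all strings) and the names `GroupByDemo`, `Sample` and `FirstPerKey` are assumptions made for illustration only; they are not part of the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Property types are assumed; the question does not specify them.
public class Employee
{
    public string empName { get; set; }
    public string empID { get; set; }
    public string empLoc { get; set; }
    public string empPL { get; set; }
    public string empShift { get; set; }
}

public static class GroupByDemo
{
    // Sample rows copied from the question.
    public static List<Employee> Sample() => new List<Employee>
    {
        new Employee { empName = "E1", empID = "1", empLoc = "L1", empPL = "EPL1", empShift = "S1" },
        new Employee { empName = "E2", empID = "2", empLoc = "L2", empPL = "EPL2", empShift = "S2" },
        new Employee { empName = "E3", empID = "3", empLoc = "L3", empPL = "EPL3", empShift = "S3" },
        new Employee { empName = "E4", empID = "4", empLoc = "L1", empPL = "EPL1", empShift = "S1" },
        new Employee { empName = "E5", empID = "5", empLoc = "L5", empPL = "EPL5", empShift = "S5" },
        new Employee { empName = "E6", empID = "6", empLoc = "L2", empPL = "EPL2", empShift = "S2" },
    };

    // Hypothetical helper: groups by the three key properties and keeps
    // the first employee encountered in each group.
    public static List<Employee> FirstPerKey(IEnumerable<Employee> employees) =>
        employees.GroupBy(e => new { e.empLoc, e.empPL, e.empShift })
                 .Select(g => g.First())
                 .ToList();

    public static void Main()
    {
        foreach (var e in FirstPerKey(Sample()))
            Console.WriteLine($"{e.empName} {e.empLoc} {e.empPL} {e.empShift}");
        // Prints E1, E2, E3 and E5; E4 and E6 repeat the keys of E1 and E2.
    }
}
```

Anonymous types compare by the values of their properties, which is why they can serve as a composite grouping key here.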
cuongle
  • 2
    This is the most convenient approach! Only if I could upvote this 10 more times :) – Amit Dash Aug 27 '14 at 06:24
  • Which one is more efficient? The GroupBy can be a lot slower depending on the number of groups. If the groups count approaches the size of the input array and the input array size is large, then, the Distinct approach is the way to go. See my answer below – Raj Rao Mar 31 '15 at 01:43
39

You could implement a custom IEqualityComparer<Employee>:

public class Employee
{
    public string empName { get; set; }
    public string empID { get; set; }
    public string empLoc { get; set; }
    public string empPL { get; set; }
    public string empShift { get; set; }

    public class Comparer : IEqualityComparer<Employee>
    {
        public bool Equals(Employee x, Employee y)
        {
            return x.empLoc == y.empLoc
                && x.empPL == y.empPL
                && x.empShift == y.empShift;
        }

        public int GetHashCode(Employee obj)
        {
            unchecked  // overflow is fine
            {
                int hash = 17;
                hash = hash * 23 + (obj.empLoc ?? "").GetHashCode();
                hash = hash * 23 + (obj.empPL ?? "").GetHashCode();
                hash = hash * 23 + (obj.empShift ?? "").GetHashCode();
                return hash;
            }
        }
    }
}

Now you can use this overload of Enumerable.Distinct:

var distinct = employees.Distinct(new Employee.Comparer());

A less reusable, less robust, and less efficient approach uses an anonymous type:

var distinctKeys = employees.Select(e => new { e.empLoc, e.empPL, e.empShift })
                            .Distinct();
var joined = from e in employees
             join d in distinctKeys
             on new { e.empLoc, e.empPL, e.empShift } equals d
             select e;
// if you want to replace the original collection
employees = joined.ToList();
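One caveat on the join above: every employee's key is present in `distinctKeys`, so the join matches each employee exactly once and `joined` ends up containing all employees again rather than one per key. A minimal sketch of another way to keep whole Employee objects without a comparer class follows. It assumes C# 7 value tuples are available, and `DistinctSketch`, `Sample` and `DistinctByKey` are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public string empName { get; set; }
    public string empID { get; set; }
    public string empLoc { get; set; }
    public string empPL { get; set; }
    public string empShift { get; set; }
}

public static class DistinctSketch
{
    // Sample rows copied from the question.
    public static List<Employee> Sample() => new List<Employee>
    {
        new Employee { empName = "E1", empID = "1", empLoc = "L1", empPL = "EPL1", empShift = "S1" },
        new Employee { empName = "E2", empID = "2", empLoc = "L2", empPL = "EPL2", empShift = "S2" },
        new Employee { empName = "E3", empID = "3", empLoc = "L3", empPL = "EPL3", empShift = "S3" },
        new Employee { empName = "E4", empID = "4", empLoc = "L1", empPL = "EPL1", empShift = "S1" },
        new Employee { empName = "E5", empID = "5", empLoc = "L5", empPL = "EPL5", empShift = "S5" },
        new Employee { empName = "E6", empID = "6", empLoc = "L2", empPL = "EPL2", empShift = "S2" },
    };

    // Hypothetical helper: remembers each key tuple in a HashSet and keeps
    // an employee only the first time its key is seen.
    public static List<Employee> DistinctByKey(IEnumerable<Employee> employees)
    {
        var seen = new HashSet<(string, string, string)>();
        var result = new List<Employee>();
        foreach (var e in employees)
        {
            // HashSet.Add returns false when the key is already present.
            if (seen.Add((e.empLoc, e.empPL, e.empShift)))
                result.Add(e);
        }
        return result;
    }

    public static void Main()
    {
        foreach (var e in DistinctByKey(Sample()))
            Console.WriteLine(e.empName);
        // Prints E1, E2, E3 and E5.
    }
}
```

On .NET 6 or later, the built-in `Enumerable.DistinctBy` should give the same result in one call: `employees.DistinctBy(e => (e.empLoc, e.empPL, e.empShift))`.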
Tim Schmelter
  • Tim, I tried `IEqualityComparer` and it worked fine for me. But the second solution you provided is not working: the distinct query returns a list containing only empLoc, empPL and empShift, and the join does not work as expected either. Can you please have a look? Thanks – kbvishnu Sep 26 '12 at 05:16
  • @VeeKeyBee: The second solution is similar to Aghilas's approach: selecting the key fields into an anonymous type, then using `Enumerable.Distinct`. But instead of selecting the anonymous type, I'm joining it with the original collection of employees (note the `select e` at the end). So that should work too. – Tim Schmelter Sep 26 '12 at 05:23
  • Now this is what we call a complete solution. Although I must say @Cuong Le has answered with a very valuable shortcut! – Amit Dash Aug 27 '14 at 06:26
  • 1
    @Curious: the `GroupBy` approach is fine, but it is more work (for CPU and memory) than using a custom `IEqualityComparer`. You can (re)use the `IEqualityComparer` with most LINQ extension methods, and if the implementation changes you only have one place to fix. – Tim Schmelter Aug 27 '14 at 06:41
  • @TimSchmelter: you are correct. As my need was for a temp purpose I would go with the GroupBy approach. If the comparer is to be reused I will definitely fall back on your suggestion :) – Amit Dash Aug 27 '14 at 06:46
15

You can try this code:

var result = (from item in List
              select new
              {
                  EmpLoc = item.empLoc,
                  EmpPL = item.empPL,
                  EmpShift = item.empShift
              })
              .ToList()
              .Distinct();
Aghilas Yakoub
3

I was curious about which method would be faster:

  1. Using Distinct with a custom IEqualityComparer or
  2. Using the GroupBy method described by Cuong Le.

I found that, depending on the size of the input data and the number of groups, the Distinct method can be much faster. As the number of groups approaches the number of elements in the list, Distinct pulls further ahead of GroupBy.

The code below runs in LINQPad:

    void Main()
    {
        List<C> cs = new List<C>();
        foreach(var i in Enumerable.Range(0,Int16.MaxValue*1000))
        {
            int modValue = Int16.MaxValue; //vary this value to see how the size of groups changes performance characteristics. Try 1, 5, 10, and very large numbers
            int j = i%modValue; 
            cs.Add(new C{I = i, J = j});
        }
        cs.Count ().Dump("Size of input array");

        TestGrouping(cs);
        TestDistinct(cs);
    }

    public void TestGrouping(List<C> cs)
    {
        Stopwatch sw = Stopwatch.StartNew();
        sw.Restart();
        var groupedCount  = cs.GroupBy (o => o.J).Select(s => s.First()).Count();
        groupedCount.Dump("num groups");
        sw.ElapsedMilliseconds.Dump("elapsed time for using grouping");
    }

    public void TestDistinct(List<C> cs)
    {
        Stopwatch sw = Stopwatch.StartNew();
        var distinctCount = cs.Distinct(new CComparerOnJ()).Count ();
        distinctCount.Dump("num distinct");
        sw.ElapsedMilliseconds.Dump("elapsed time for using distinct");
    }

    public class C
    {
        public int I {get; set;}
        public int J {get; set;}
    }

    public class CComparerOnJ : IEqualityComparer<C>
    {
        public bool Equals(C x, C y)
        {
            return x.J.Equals(y.J);
        }

        public int GetHashCode(C obj)
        {
            return obj.J.GetHashCode();
        }
    }
Raj Rao
0

Try:

var newList =
    (from x in empCollection
     select new { Loc = x.empLoc, PL = x.empPL, Shift = x.empShift })
    .Distinct();
John Woo