7

I am looking for a little help with designing a query using C#/LINQ to meet the following requirements:

I have a list of companies:-

Id  Name                Email       Address

1   Company A         a@a.com       abc

2   Company B         b@b.com       abc

3   Company C         c@c.com       abc

4   Company D         d@d.com       abc

5   Company A         a@a.com       abc

My goal is to detect duplicate items based on two fields, in this example 'name' and 'email'.

Desired output is a list of customers shown below:

  1. Duplicate customers shuold only be shown once
  2. The quantity of similar records should be shown.

Desired duplicate list:-

Id  Qty Name        Email       Address

1   2   Company A       a@a.com     abc (Id/details of first)

2   1   Company B       b@b.com     abc

3   1   Company C       c@c.com     abc

4   1   Company D       d@d.com     abc
Nick
  • 5,844
  • 11
  • 52
  • 98

2 Answers2

9

If you explicitly want to use the lowest-ID record in each set of duplicates, you could use

var duplicates = companies
    .GroupBy(c => new { c.Name, c.Email })
    .Select(g => new { Qty = g.Count(), First = g.OrderBy(c => c.Id).First() } )
    .Select(p => new
        {
            Id = p.First.Id,
            Qty = p.Qty,
            Name = p.First.Name,
            Email = p.First.Email,
            Address = p.First.Address
        });

If you don't care which record's values are used, or if your source is already sorted by ID (ascending), you can drop the OrderBy call.

Rawling
  • 49,248
  • 7
  • 89
  • 127
4
from c in companies
group c by new { c.Name, c.Email } into g
select new
{
   Id = g.First().Id,
   Qty = g.Count(),
   Name = g.Key.Name,
   Email = g.Key.Email,
   Address = g.First().Address
};
Amiram Korach
  • 13,056
  • 3
  • 28
  • 30