Detecting duplicate records, selecting only first and counting with LINQ/C#

Question

I am looking for a little help with designing a query using C#/LINQ to meet the following requirements:

I have a list of companies:

Id  Name       Email    Address
1   Company A  a@a.com  abc
2   Company B  b@b.com  abc
3   Company C  c@c.com  abc
4   Company D  d@d.com  abc
5   Company A  a@a.com  abc

My goal is to detect duplicate items based on two fields, in this example 'name' and 'email'.

The desired output is a list of companies that meets these requirements:

Duplicate companies should only be shown once.
The number of matching records should be shown.

Desired duplicate list:

Id  Qty  Name       Email    Address
1   2    Company A  a@a.com  abc  (Id/details of first)
2   1    Company B  b@b.com  abc
3   1    Company C  c@c.com  abc
4   1    Company D  d@d.com  abc

It is strange to output id, what is the correct id for Company A? — Johan Larsson, Nov 05 '12 at 11:12
http://stackoverflow.com/questions/1606679/remove-duplicates-in-the-list-using-linq — Rohit Vyas, Nov 05 '12 at 11:12
@RohitVyas Those solutions remove the duplicate records but do not count the number of duplicate records in each case. — Mohammad Banisaeid, Nov 05 '12 at 11:17

Rawling · Accepted Answer · 2012-11-05T11:18:17.463

If you explicitly want to use the lowest-ID record in each set of duplicates, you could use

var duplicates = companies
    .GroupBy(c => new { c.Name, c.Email })
    .Select(g => new { Qty = g.Count(), First = g.OrderBy(c => c.Id).First() } )
    .Select(p => new
        {
            Id = p.First.Id,
            Qty = p.Qty,
            Name = p.First.Name,
            Email = p.First.Email,
            Address = p.First.Address
        });

If you don't care which record's values are used, or if your source is already sorted by ID (ascending), you can drop the OrderBy call.
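For reference, here is a minimal self-contained sketch that runs the query above against the sample data from the question. The tuple-based `companies` array is an assumption made for illustration, since the question does not show the actual company type; the query itself is unchanged apart from shorter projection syntax and a trailing `ToList()`.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the asker's data; the real company type is not shown in the question.
var companies = new (int Id, string Name, string Email, string Address)[]
{
    (1, "Company A", "a@a.com", "abc"),
    (2, "Company B", "b@b.com", "abc"),
    (3, "Company C", "c@c.com", "abc"),
    (4, "Company D", "d@d.com", "abc"),
    (5, "Company A", "a@a.com", "abc"),
};

var duplicates = companies
    // Records are duplicates when both Name and Email match.
    .GroupBy(c => new { c.Name, c.Email })
    // Count each group and pick its lowest-Id record.
    .Select(g => new { Qty = g.Count(), First = g.OrderBy(c => c.Id).First() })
    // Flatten into the shape requested in the question.
    .Select(p => new { p.First.Id, p.Qty, p.First.Name, p.First.Email, p.First.Address })
    .ToList();

foreach (var d in duplicates)
    Console.WriteLine($"{d.Id}  {d.Qty}  {d.Name}  {d.Email}  {d.Address}");
```

With this input the sketch prints four rows, with Company A reported once as Id 1 and Qty 2.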

score 4 · Answer 2 · answered Nov 05 '12 at 11:14


from c in companies
group c by new { c.Name, c.Email } into g
select new
{
   Id = g.First().Id,
   Qty = g.Count(),
   Name = g.Key.Name,
   Email = g.Key.Email,
   Address = g.First().Address
};
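One thing to keep in mind with this version: g.First() returns whichever record of each group comes first in the source. For in-memory LINQ to Objects, GroupBy keeps the elements of a group in source order, so on an unsorted list the reported Id is not necessarily the lowest one; other LINQ providers may behave differently. A small sketch of the effect, assuming a hypothetical tuple array in place of the real company type, with record 5 deliberately listed before record 1:

```csharp
using System;
using System.Linq;

// Hypothetical sample data, deliberately not sorted by Id.
var companies = new (int Id, string Name, string Email, string Address)[]
{
    (5, "Company A", "a@a.com", "abc"),
    (2, "Company B", "b@b.com", "abc"),
    (1, "Company A", "a@a.com", "abc"),
};

var result =
    (from c in companies
     group c by new { c.Name, c.Email } into g
     select new
     {
         FirstInSourceId = g.First().Id,  // depends on the order of the input
         LowestId = g.Min(c => c.Id),     // independent of the order of the input
         Qty = g.Count(),
         g.Key.Name
     }).ToList();

foreach (var r in result)
    Console.WriteLine($"{r.Name}: first-in-source Id = {r.FirstInSourceId}, lowest Id = {r.LowestId}, Qty = {r.Qty}");
```

For Company A this reports 5 as the first-in-source Id and 1 as the lowest Id. If the list is already sorted by Id, the two values coincide and g.First() is sufficient.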


Amiram Korach
