1

I have a Table with columns (ItemID, Name,Price).

The List of Items is populated with duplicated items.

For ex.:

--------------------------------------
ItemID          Name          Price
--------------------------------------
1               Bangles       100   
2               Saree         200   
3               Shoes         150   
4               Bangles       100   
5               Shoes         150  

How to remove the duplicates in the list using linq only two columns irrespective of primary key?

bjan
  • 2,000
  • 7
  • 32
  • 64
SrinivasNaidu
  • 439
  • 2
  • 10
  • 27
  • 2
    You know those aren't actually duplicates, because your two entries for "bangles" and "shoes" have different ItemID's, right? – Bob Tway May 19 '14 at 09:22
  • Yes the item ids are different. But my question is duplicates without primary key – SrinivasNaidu May 19 '14 at 09:40
  • Please add expected output/results as well – bjan May 19 '14 at 09:54
  • SQL based accepted answer of [this question](http://stackoverflow.com/questions/15814400/remove-rows-with-duplicate-values) may help you – bjan May 19 '14 at 10:04
  • yes i want to do that exactly using entity frame work – SrinivasNaidu May 19 '14 at 10:08
  • Use [this accepted answer](http://stackoverflow.com/questions/9980568/row-number-over-partition-by-xxx-in-linq) to select records, then create another list from results based on `RN <> 1` (where RN is not equal to 1, these are the records which are to be deleted) then use [this](http://msdn.microsoft.com/en-us/library/bb386925(v=vs.110).aspx) or whatever you think is best to delete selected records. – bjan May 19 '14 at 12:22

4 Answers4

4

Idea is to group items by criteria and then select first item in each group.

var distinctItems = items.GroupBy(i => new{i.Name, i.Price})
                         .Select(g => g.First());

Here is the complete example:

var items = new[]{
                    new Item{Id = 1, Name = "Bangles", Price = 100},
                    new Item{Id = 2, Name = "Saree",   Price = 200},
                    new Item{Id = 3, Name = "Shoes",   Price = 150},
                    new Item{Id = 4, Name = "Bangles", Price = 100},
                    new Item{Id = 5, Name = "Shoes",   Price = 150}
                 };


var distinctItems = items.GroupBy(i => new{i.Name, i.Price})
                         .Select(g => g.First());

foreach (var item in distinctItems)
{
    Console.WriteLine ("Name: {0} Price: {1}", item.Name, item.Price);
}   

prints:

Name: Bangles Price: 100
Name: Saree Price: 200
Name: Shoes Price: 150

Notes: consider using DistinctBy which uses a more advanced algorithms to select distinct objects by some criteria.

Ilya Ivanov
  • 23,148
  • 4
  • 64
  • 90
  • But i want to remove the duplicate records from the table – SrinivasNaidu May 19 '14 at 09:26
  • @SrinivasNaidu you mentioned in the question *How to remove the duplicates in the **list** using linq only two columns irrespective of primary key*. This is more a SQL or ORM question. – Ilya Ivanov May 19 '14 at 09:29
  • For filtering the returned list, maybe he could use simply `var distinctItems = items.Select(i => new{ i.Name, i.Price, }).Distinct();`. It is not clear how he wants to handle the situation where `Name` is duplicated with a conflicting `Price`. – Jeppe Stig Nielsen May 19 '14 at 09:31
  • @JeppeStigNielsen yeap, but probably he'll need `id`s as well, not sure about it. OP want's to update table in the database – Ilya Ivanov May 19 '14 at 09:32
  • In my case i will update the price and name. If the price and name already exist then it will be duplicate, so i want to remove that duplicate – SrinivasNaidu May 19 '14 at 09:41
  • Use sql from [this thread](http://stackoverflow.com/questions/18932/how-can-i-remove-duplicate-rows). It's hard to tell how is it better for you, because you gave no description of the environment, in which you are working. – Ilya Ivanov May 19 '14 at 09:43
0

Use GroupBy

items.GroupBy(item => new { Name = item.Name, Price = item.Price })

This will group them, and then you decide what you want to do, get the first or last for example.

NeutronCode
  • 365
  • 2
  • 13
0

If there are a lot of duplicates, then it is more efficient to do this directly in SQL, but if you want to do it with Linq, then you can do something like :

// Group and count the items in group
var grouped = (from r in dc.Items group r by new { r.Name, r.Price} into results
  select new { Count = results.Count(), results = results.ToList()} );

// select only the groups with duplicates
var itemsWithDuplicates = (from r in grouped where r.Count > 1 select r);

// Ignore the first item in each group
var duplicatesGrouped = (from r in itemsWithDuplicates select r.results.Skip(1));

//UnGroup them
var duplicates = duplicatesGrouped.SelectMany(r=>r);

Then presumably you can delete them using something like

dc.Items.DeleteAllOnSubmit(duplicates);
dc.SubmitChanges();
sgmoore
  • 15,694
  • 5
  • 43
  • 67
0

From the comments I take it you are working on a database. From this you should have a context of some sort available.

So you should be able to do something along this lines:

void Main()
{
    //dummy data
    var items = new List<Item>()
                {
                    new Item{Id =1, Name = "Bangles", Price=100},
                    new Item{Id =2, Name = "Saree",   Price=200},
                    new Item{Id =3, Name = "Shoes",   Price=150},
                    new Item{Id =4, Name = "Bangles", Price=100},
                    new Item{Id =5, Name = "Shoes",   Price=150}
                 };
    //select duplicate items         
    var itemsToDelete = items.GroupBy (i => new { i.Name, i.Price}).SelectMany(x => x.Skip(1));
    //delete duplicate items
    context.DeleteAllOnsubmit(itemsToDelete);
    //Save
    context.SaveChanges();

}

public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int Price { get; set; }
}

In the following image, you have at first the raw data and the second table displays the duplicates which will be deleted from your source:

enter image description here

Marco
  • 22,856
  • 9
  • 75
  • 124