Check for duplications

Question

I store three values in a list like this:

var mylist = new List<(int Id, string name, string surname)>();

Example data:

   1 | John    | Miller
   2 | Jessica | Scot
   3 | Robert  | Johnes
   4 | John    | Miller

How can I filter the list to keep only the records that have no duplicates by name and surname (ignoring Id)? The result would be:

   2 | Jessica | Scot
   3 | Robert  | Johnes

Could you stop updating your question? I already had an answer, but without the id field. – fasaas, Oct 13 '18 at 07:00
Why not use a Dictionary for this to keep track of the entries already added? – thebenman, Oct 13 '18 at 07:01
First, learn how to use classes, then use this answer https://stackoverflow.com/questions/8400028/comparing-two-instances-of-a-class – Steve, Oct 13 '18 at 07:08
I think I could just use it like this, but then I won't get their ids: `var distinctCategories = mylist.Select(m => new { m.name, m.surname }).Distinct().ToList();` – DinoDin2, Oct 13 '18 at 07:16

score 3 · Accepted Answer · answered Oct 13 '18 at 07:25

Finally, now that the question has been settled, I would propose the following.

This would be the class structure I would use to organize the table entries.

    public class MyTable
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public string Surname { get; set; }

        public MyTable(int id, string name, string surname)
        {
            Id = id;
            Name = name;
            Surname = surname;
        }
    }

Then, this would be my test method

    [Fact]
    public void RemovesAllInstancesOfDuplicateEntries()
    {
        var mylist = new List<MyTable>();
        mylist.Add(new MyTable(1 , "John" , "Miller"));
        mylist.Add(new MyTable(2 , "Jessica", "Scot"));
        mylist.Add(new MyTable(3 , "Robert", "Johnes"));
        mylist.Add(new MyTable(4 , "John", "Miller"));                        

        var actual = new MySUT().RemoveAllInstancesOfDuplicates(mylist);

        Assert.Equal(2, actual.Count);
        Assert.Equal(2, actual[0].Id);
        Assert.Equal(3, actual[1].Id);
    }

And my implementation that makes the test pass would be the following:

    public List<MyTable> RemoveAllInstancesOfDuplicates(List<MyTable> myTable)
    {
        List<MyTable> withoutAllInstancesOfDuplicates = new List<MyTable>();

        foreach(MyTable entry in myTable)
        {
            if (myTable.Count(row => 
                string.Equals(row.Name, entry.Name) && 
                string.Equals(row.Surname, entry.Surname)) == 1)
            {
                withoutAllInstancesOfDuplicates.Add(entry);
            }
        }

        return withoutAllInstancesOfDuplicates;
    }

answered Oct 13 '18 at 07:25

fasaas


Hi, can you explain why you create a separate class just for this instead of using a Tuple, as I proposed, with LINQ? Just wondering why this way. – DinoDin2 Oct 13 '18 at 07:38
Well, this is my personal preference. I prefer to keep my code structured with objects instead of inlining everything in a Tuple. What if in the near future you need to add a new variable? Are you going to change every inline Tuple, or just update the class? – fasaas Oct 13 '18 at 07:40
I see. Should I also put RemoveAllInstancesOfDuplicates in the MyTable class? – DinoDin2 Oct 13 '18 at 07:42
I mean, you need to think about the responsibilities of the MyTable class. Is it just a placeholder that represents the database? Or does it need to have some logic? – fasaas Oct 13 '18 at 07:44
But then I would probably also need to add a property to that class holding a List of itself, because RemoveAllInstancesOfDuplicates works on a list of this class type? – DinoDin2 Oct 13 '18 at 07:44
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/181799/discussion-between-mayhem-and-dinodin2). – fasaas Oct 13 '18 at 07:44
@DinoDin2, Tuple vs Class [Fight 1](https://stackoverflow.com/questions/44650636/when-to-use-tuple-vs-class-c-sharp-7-0) – Drag and Drop Oct 13 '18 at 08:08

Zohar Peled · Answer 2 · 2018-10-13T08:28:34.857

1

You can use a combination of Where and Any, like this:

var noDupes = mylist
    .Where(a => !mylist
        .Any(b => b.Id != a.Id && b.name == a.name && b.surname == a.surname));

This returns an IEnumerable<T>, where T is your value tuple type, containing only the Jessica and Robert entries.

edited Oct 13 '18 at 08:28

answered Oct 13 '18 at 07:37

Zohar Peled


error: Operator '&&' cannot be applied to operands of type 'bool' and 'string' – DinoDin2 Oct 13 '18 at 07:41
There was a typo in my answer. I forgot one `=` at the end. Fixed. – Zohar Peled Oct 13 '18 at 08:29

score -1 · Answer 3 · answered Oct 13 '18 at 08:00

var mylist  = originalList
  .GroupBy(x => x.name + x.surname) //or any other combination
  .Select(group => group.First())
  .ToList();

answered Oct 13 '18 at 08:00

roozbeh S


While I like the GroupBy solution, `x.name + x.surname` is a bad idea. If you want to group on multiple columns, you create a key from those columns, not a sum of the columns. Different pairs can have the same sum: `"Foo"+"Bar"=="FooBar"+""=="Fo"+"oBar"` and so on. And this is only for strings; other types could behave in even less expected ways. – Drag and Drop Oct 13 '18 at 08:18
It was just an example of how the code should look, and I mentioned that any combination that suits the situation can be used. However, your duplicate answer is not correct and will not return the desired values. – roozbeh S Oct 13 '18 at 08:55
Sorry. My bad. The problem is that filtering with `.Where(grp => grp.Count() == 1)` removes all instances of the duplicated data, which is not how distinction should work in general, but it matches the distinction asked for in the question. – roozbeh S Oct 13 '18 at 09:22
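
To illustrate the key-collision point raised in the comments above, here is a minimal sketch. The class name `KeyCollisionDemo`, its method names, and the two sample rows are made up for illustration; it assumes C# 7 or later for value tuples. It compares grouping on a concatenated string with grouping on a composite key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyCollisionDemo
{
    // Hypothetical sample data: two different people whose name and surname
    // concatenate to the same string, "FooBar".
    public static List<(int Id, string name, string surname)> Sample() =>
        new List<(int Id, string name, string surname)>
        {
            (1, "Foo", "Bar"),
            (2, "Fo", "oBar"),
        };

    // Groups by the concatenated string, so distinct pairs can collide.
    public static int GroupCountByConcatenation(
        IEnumerable<(int Id, string name, string surname)> rows) =>
        rows.GroupBy(x => x.name + x.surname).Count();

    // Groups by a composite key, so each column is compared separately.
    public static int GroupCountByCompositeKey(
        IEnumerable<(int Id, string name, string surname)> rows) =>
        rows.GroupBy(x => (x.name, x.surname)).Count();
}
```

With this sample, the concatenated key puts both rows into one group, while the composite key keeps them in two separate groups.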

Drag and Drop · Answer 4 · 2018-10-13T09:08:42.597

-1

Group on the columns that must be unique, then count the elements in each group. You want only the groups that contain exactly one element.

var unique = mylist
    .GroupBy(x => new { x.name, x.surname })
    .Where(grp => grp.Count() == 1)
    .Select(grp => grp.First());
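
For completeness, a self-contained sketch of the same approach applied to the value-tuple list from the question. The class name `UniqueByNameDemo` and its method names are hypothetical, and the key here is a value tuple rather than an anonymous type (either should work as a grouping key); it assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UniqueByNameDemo
{
    // The example data from the question.
    public static List<(int Id, string name, string surname)> Sample() =>
        new List<(int Id, string name, string surname)>
        {
            (1, "John", "Miller"),
            (2, "Jessica", "Scot"),
            (3, "Robert", "Johnes"),
            (4, "John", "Miller"),
        };

    // Keeps only the rows whose (name, surname) pair occurs exactly once.
    // Id is not part of the key, so it does not affect the comparison.
    public static List<(int Id, string name, string surname)> OnlyUnique(
        IEnumerable<(int Id, string name, string surname)> rows) =>
        rows.GroupBy(x => (x.name, x.surname))
            .Where(grp => grp.Count() == 1)
            .Select(grp => grp.First())
            .ToList();
}
```

Calling `UniqueByNameDemo.OnlyUnique(UniqueByNameDemo.Sample())` should return the rows with Id 2 and Id 3, in that order, since `GroupBy` yields groups in the order their keys first appear.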

edited Oct 13 '18 at 09:08

answered Oct 13 '18 at 08:23

Drag and Drop

