I need to remove elements from a single list when one or more of their sub-elements duplicate a sub-element of another element in the same list.

Classes:

public class Person
{
    public int id { get; set; }
    public string name { get; set; }
    public List<IdentificationDocument> documents { get; set; }

    public Person()
    {
        documents = new List<IdentificationDocument>();
    }
}

public class IdentificationDocument
{
    public string number { get; set; }
}

Code:

        var person1 = new Person() {id = 1, name = "Bob" };
        var person2 = new Person() {id = 2, name = "Ted" };
        var person3 = new Person() {id = 3, name = "Will_1" };
        var person4 = new Person() {id = 4, name = "Will_2" };

        person1.documents.Add(new IdentificationDocument() { number = "123" });
        person2.documents.Add(new IdentificationDocument() { number = "456" });
        person3.documents.Add(new IdentificationDocument() { number = "789" });
        person4.documents.Add(new IdentificationDocument() { number = "789" }); //duplicate

        var personList1 = new List<Person>();

        personList1.Add(person1);
        personList1.Add(person2);
        personList1.Add(person3);
        personList1.Add(person4);

        //more data for performance test
        for (int i = 0; i < 20000; i++)
        {
            var personx = new Person() { id = i, name = Guid.NewGuid().ToString() };
            personx.documents.Add(new IdentificationDocument() { number = Guid.NewGuid().ToString() });
            personx.documents.Add(new IdentificationDocument() { number = Guid.NewGuid().ToString() });
            personList1.Add(personx);
        }

        var result = //Here comes the linq query

        result.ForEach(r => Console.WriteLine(r.id + " " +r.name));

Expected result:

1 Bob
2 Ted
3 Will_1

Example

https://dotnetfiddle.net/LbPLcP

Thank you!

  • Does it matter which one is removed? You said you expect person 4 to be removed, but what are the criteria? Is it just the order of placement in the list, so that earlier entries have priority? What happens if you add a fifth person who has two identity documents matching two others in the list (let's say 123 and 456)? Would you want to keep the person with two documents and remove the other two people? – pstrjds Dec 14 '17 at 22:36
  • Possible duplicate of [LINQ's Distinct() on a particular property](https://stackoverflow.com/questions/489258/linqs-distinct-on-a-particular-property) – Ousmane D. Dec 14 '17 at 22:39

3 Answers

You can use the Enumerable.Distinct<TSource> method from LINQ. You'll need to create a custom comparer to compare using the subelement.

See "How do I use a custom comparer with the Linq Distinct method?"
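As a rough illustration of the custom-comparer idea, here is a minimal sketch. `FirstDocumentComparer` and `ComparerDemo` are hypothetical names, not part of the question, and the sketch assumes each person is identified by their first document only. A comparer cannot cleanly express "shares any document", because that relation is not transitive and gives no consistent hash code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class IdentificationDocument
{
    public string number { get; set; }
}

public class Person
{
    public int id { get; set; }
    public string name { get; set; }
    public List<IdentificationDocument> documents { get; set; } = new List<IdentificationDocument>();
}

// Hypothetical comparer: two persons are equal when their FIRST document numbers match.
// Assumption: one document per person is enough to identify a duplicate.
public class FirstDocumentComparer : IEqualityComparer<Person>
{
    // Returns null for a null person or a person without documents.
    private static string Key(Person p) => p?.documents.FirstOrDefault()?.number;

    public bool Equals(Person x, Person y) =>
        string.Equals(Key(x), Key(y), StringComparison.Ordinal);

    // Must agree with Equals: equal keys give equal hash codes.
    public int GetHashCode(Person p) => Key(p)?.GetHashCode() ?? 0;
}

public static class ComparerDemo
{
    public static List<Person> Run()
    {
        var people = new List<Person>
        {
            new Person { id = 1, name = "Bob" },
            new Person { id = 2, name = "Ted" },
            new Person { id = 3, name = "Will_1" },
            new Person { id = 4, name = "Will_2" },
        };
        people[0].documents.Add(new IdentificationDocument { number = "123" });
        people[1].documents.Add(new IdentificationDocument { number = "456" });
        people[2].documents.Add(new IdentificationDocument { number = "789" });
        people[3].documents.Add(new IdentificationDocument { number = "789" }); // duplicate

        return people.Distinct(new FirstDocumentComparer()).ToList();
    }
}
```

Note that persons without any document all share the same (null) key under this sketch and would collapse into one. The documentation describes the result of `Distinct` as unordered, so if it matters which duplicate survives, an explicit rule is safer than relying on list order.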

Jonathan Wood

Well, yes, you could use a custom comparer. But that's going to be a lot more code than your specific example requires. If your specific example is all you need, this will work fine:

var personDocumentPairs = personList1
    .SelectMany(e => e.documents.Select(t => new {person = e, document = t}))
    .GroupBy(e => e.document.number).Select(e => e.First());
// ToList() so that result.ForEach(...) from the question compiles
var result = personDocumentPairs.Select(e => e.person).Distinct().ToList();
Adam Brown

Along the lines of Adam's solution, the trick is to iterate over the persons and group them by their associated document numbers.

// persons whose document number is already assigned to a person with a lower id
// yields: Will_2
var duplicate = from person in personList1
                from document in person.documents
                group person by document.number into groupings
                let counter = groupings.Count()
                where counter > 1
                from person in groupings
                    .OrderBy(p => p.id)
                    .Skip(1)
                select person;

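// persons who are the first (lowest id) holder of at least one document number
// yields: Bob, Ted, Will_1
// Distinct() is needed because a person with several documents
// appears once per document group
var distinct = (from person in personList1
                from document in person.documents
                group person by document.number into groupings
                from person in groupings
                    .OrderBy(p => p.id)
                    .Take(1)
                select person)
               .Distinct();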

The OrderBy on id is an arbitrary rule for deciding which person keeps a shared document number; adjust it to whatever criteria you actually need.
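If list order is the rule you want (an earlier person wins), a plain loop with a `HashSet<string>` is another option. This is a different technique from the grouping queries above, shown only as a sketch: `SeenDocumentsDemo` and `RemoveDuplicates` are made-up names, and it assumes that a person is dropped as soon as any of their document numbers already belongs to a kept person.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class IdentificationDocument
{
    public string number { get; set; }
}

public class Person
{
    public int id { get; set; }
    public string name { get; set; }
    public List<IdentificationDocument> documents { get; set; } = new List<IdentificationDocument>();
}

public static class SeenDocumentsDemo
{
    // Single pass: keeps a person only if none of their document numbers
    // was already claimed by a person kept earlier in the list.
    public static List<Person> RemoveDuplicates(IEnumerable<Person> people)
    {
        var seen = new HashSet<string>();
        var result = new List<Person>();

        foreach (var person in people)
        {
            var numbers = person.documents.Select(d => d.number).ToList();

            // Shares at least one document with a kept person: skip.
            if (numbers.Any(n => seen.Contains(n)))
                continue;

            // Only documents of kept persons are marked as seen (assumption).
            foreach (var n in numbers)
                seen.Add(n);

            result.Add(person);
        }

        return result;
    }

    public static List<Person> Run()
    {
        var people = new List<Person>
        {
            new Person { id = 1, name = "Bob" },
            new Person { id = 2, name = "Ted" },
            new Person { id = 3, name = "Will_1" },
            new Person { id = 4, name = "Will_2" },
        };
        people[0].documents.Add(new IdentificationDocument { number = "123" });
        people[1].documents.Add(new IdentificationDocument { number = "456" });
        people[2].documents.Add(new IdentificationDocument { number = "789" });
        people[3].documents.Add(new IdentificationDocument { number = "789" }); // duplicate

        return RemoveDuplicates(people);
    }
}
```

Unlike the grouping queries, this keeps persons who have no documents at all, and it makes one pass over the list with constant-time set lookups, which may help with the 20000-person test data.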

Dan Dohotaru