0

Say we have a list of "A students", and a list of "B students". We then add both lists to a more generic list, called "students". Someone then decides to complicate our lives by adding a duplicate list of "A Students" to the generic "students" list. What's the most efficient way to remove one of the duplicate lists of "A students"? Note that there are two custom classes involved.

The generic students list in the code is called lstStudents. This is the list I would like to remove any duplicates from.

(I tried to come up with a better example, but this is the best I could do right now.)

I don't have to use LINQ, but it's available. MoreLinq is available as well.

Here are my classes:

public class Student
{
    public Student(string _name, int _age, Exam _lastExam)
    {
        name = _name;
        age = _age;
        lastExam = _lastExam;
    }

    public string name { get; set; }
    public int age { get; set; }
    public Exam lastExam { get; set; }
}

public class Exam
{
    public Exam(int _correct, int _possible)
    {
        correct = _correct;
        possible = _possible;
    }

    public int correct { get; set; }
    public int possible { get; set; }
}

And here's the code to create the mess:

List<List<Student>> lstStudents = new List<List<Student>>();
List<Student> lstAStudents = new List<Student>();
List<Student> lstDuplicateAStudents = new List<Student>();
List<Student> lstBStudents = new List<Student>();

// Create a list of some A students
lstAStudents.Add(new Student("Alex", 14, new Exam(98,100)));
lstAStudents.Add(new Student("Kim", 13, new Exam(96, 100)));
lstAStudents.Add(new Student("Brian", 14, new Exam(92, 100)));
lstStudents.Add(lstAStudents);

// Create a duplicate list of A students
lstDuplicateAStudents.Add(new Student("Alex", 14, new Exam(98, 100)));
lstDuplicateAStudents.Add(new Student("Kim", 13, new Exam(96, 100)));
lstDuplicateAStudents.Add(new Student("Brian", 14, new Exam(92, 100)));
lstStudents.Add(lstDuplicateAStudents);

// Create a list of some B students
lstBStudents.Add(new Student("John", 13, new Exam(88, 100)));
lstBStudents.Add(new Student("Jenny", 13, new Exam(80, 100)));
lstBStudents.Add(new Student("Jamie", 15, new Exam(81, 100)));
lstStudents.Add(lstBStudents);
Brent Barbata
  • 3,631
  • 3
  • 24
  • 23
  • 1
    Use `Except` Linq method? Create a `Set` and cast it back to a `List` (which removes all duplicates, as `Set` cannot have duplicate members)? – Patashu Mar 27 '13 at 05:01
  • 1
    http://stackoverflow.com/questions/5969702/removing-duplicates-in-a-list-with-linq?rq=1 make sure you pick correct field to do group by on – BlackICE Mar 27 '13 at 05:02

2 Answers2

4

Probably you can hold a set which will accumulate unique lists:

var set = new HashSet<List<Student>>(new CustomComparer());
foreach (List<List<Student>> list in source)
{
  if (set.Contains(list))
    continue;
  set.Add(list)
}


public class CustomComparer : IEqualityComparer<List<Student>>
{
   public bool Equals(List<Student> one, List<Student> two)
   {
     if (one.Count != two.Count) return false;

     // simplest possible code to compare two lists
     // warning: runs in O(N*logN) for each compare
     return one.OrderBy(s=>s).SequenceEqual(two.OrderBy(s=>s));
   }

   public int GetHashCodeList<Student> item)
   {
     int ret = -1;
     foreach (var s in item)
       ret ^= s.GetHashCode();
     return ret;
   }
}

The main problem with this solution is the code which is used to compare two list<>. Are lists containing same elements in different order considered equal? If yes, we need to either change order by pre-sorting each list (to save time on compare), or sort each time a copy of each list, which will incur additional time penalty. So I guess the main question is how big are your lists. For values under 1000 students / 100 lists performance problems should not be noticeable.

Another problem is GetHashCode implementation - it is O(N), and we have nowhere to cache calculated value, since List is a framework structure. To work around this, I would suggest to introduce StudentList class, which will have comparer (for now we have to specify it externally) and get hash code with caching.

Also, there is a better implementation of generic collection equivalence comparer available.

Community
  • 1
  • 1
Alexander
  • 4,153
  • 1
  • 24
  • 37
  • thank you so much for the response. For my particular case, the order of the students in each list does mater. (I should have specified that since my example wasn't very good.) I've decided to mark Cuong Le's answer correct because it does (almost exactly) what I was looking for, but you helped me to better understand how one should approach this problem. Thank you again for your response. – Brent Barbata Mar 27 '13 at 23:02
1

You can use IEquatable<T> for both Student and Exam:

public class Student: IEquatable<Student>
{
    ...

    public bool Equals(Student other)
    {
        return name == other.name && age == other.age 
                    && lastExam.Equals(other.lastExam);
    }

    public override bool Equals(object obj)
    {
        Student student = obj as Student;
        return Equals(student);
    }

    public override int GetHashCode()
    {
        return name.GetHashCode() ^ 
             age.GetHashCode() ^ lastExam.GetHashCode();
    }
}

For Exam:

public class Exam: IEquatable<Exam>
{
    ...

    public bool Equals(Exam exam)
    {
        return exam.correct == correct && exam.possible == possible;
    }

    public override bool Equals(object obj)
    {
        Exam exam = obj as Exam;
        return Equals(exam);
    }

    public override int GetHashCode()
    {
        return correct.GetHashCode() ^ possible.GetHashCode();
    }
}

Then build an custom IQualityComparer<T> for List<Student>:

public class StudentListComparer : IEqualityComparer<List<Student>>
{
    public bool Equals(List<Student> x, List<Student> y)
    {
        return x.OrderBy(a => a.name)
                .SequenceEqual(y.OrderBy(b => b.name));
    }

    public int GetHashCode(List<Student> obj)
    {
        return obj.Aggregate(0, (current, t) => current ^ t.GetHashCode());
    }
}

Then you can Distinct to get the result:

var result = lstStudents.Distinct(new StudentListComparer());
cuongle
  • 74,024
  • 28
  • 151
  • 206
  • Thank you so much for taking the time to write out the solution. Since the order of the "students" mater, I just had to change one line in the StudentListComparer class to the following in order to keep lists which differed in order: return x.SequenceEqual(y); – Brent Barbata Mar 27 '13 at 23:07