I always seem to have a problem when I need to compare two lists and produce a third list that includes all the unique items. I need to do this quite often.

Here is an attempt to reproduce the issue with a noddy example.

Am I missing something? Thanks for any suggestions.

The wanted result:

   Name= Jo1 Surname= Bloggs1 Category= Account
   Name= Jo2 Surname= Bloggs2 Category= Sales
   Name= Jo5 Surname= Bloggs5 Category= Development
   Name= Jo6 Surname= Bloggs6 Category= Management
   Name= Jo8 Surname= Bloggs8 Category= HR
   Name= Jo7 Surname= Bloggs7 Category= Cleaning

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        List<Customer> listOne = new List<Customer>();
        List<Customer> listTwo = new List<Customer>();

        listOne.Add(new Customer { Category = "Account", Name = "Jo1", Surname = "Bloggs1" });
        listOne.Add(new Customer { Category = "Sales", Name = "Jo2", Surname = "Bloggs2" });
        listOne.Add(new Customer { Category = "Development", Name = "Jo5", Surname = "Bloggs5" });
        listOne.Add(new Customer { Category = "Management", Name = "Jo6", Surname = "Bloggs6" });



        listTwo.Add(new Customer { Category = "HR", Name = "Jo8", Surname = "Bloggs8" });
        listTwo.Add(new Customer { Category = "Sales", Name = "Jo2", Surname = "Bloggs2" });
        listTwo.Add(new Customer { Category = "Management", Name = "Jo6", Surname = "Bloggs6" });
        listTwo.Add(new Customer { Category = "Development", Name = "Jo5", Surname = "Bloggs5" });
        listTwo.Add(new Customer { Category = "Cleaning", Name = "Jo7", Surname = "Bloggs7" });


        List<Customer> resultList = listOne.Union(listTwo).ToList(); // I get duplicates, why?

        resultList.ForEach(customer => Console.WriteLine("Name= {0} Surname= {1} Category= {2}", customer.Name, customer.Surname, customer.Category));
        Console.Read();

        IEnumerable<Customer> resultList3 = listOne.Except(listTwo); // Does not work

        foreach (var customer in resultList3)
        {
            Console.WriteLine("Name= {0} Surname= {1} Category= {2}", customer.Name, customer.Surname, customer.Category);
        }

        // Does not work
        var resultList2 = (listOne
                       .Where(n => !(listTwo
                           .Select(o => o.Category))
                           .Contains(n.Category)))
                       .OrderBy(n => n.Category);

        foreach (var customer in resultList2)
        {
            Console.WriteLine("Name= {0} Surname= {1} Category= {2}", customer.Name, customer.Surname, customer.Category);
        }
        Console.Read();

  }
}

public class Customer
{
    public string Name { get; set; }
    public string Surname { get; set; }
    public string Category { get; set; }
}
user9969

4 Answers

Couldn't you do this by using the Concat and Distinct LINQ methods?

List<Customer> listOne;
List<Customer> listTwo;

List<Customer> uniqueList = listOne.Concat(listTwo).Distinct().ToList(); 

If necessary, you can use the Distinct() overload that takes an IEqualityComparer to define a custom equality comparison.

cordialgerm
  • Thanks for your reply. Is that the same as `List<Customer> resultList = listOne.Union(listTwo).ToList();`? I still get duplicates, though; maybe something is wrong with my customer examples. – user9969 Sep 10 '10 at 06:27
  • I know this post is old, but for anyone facing the same problem: your customers might be two different objects with the same name. If you want to compare them by name (or any other property), you should use a comparer as described in this [post](http://stackoverflow.com/questions/4607485/linq-distinct-use-delegate-for-equality-comparer). – Bahamut Apr 07 '14 at 07:47
  • No. Apart from equality that needs to be defined explicitly, as in the accepted answer, if `listTwo` contains customers also in `listOne` the result won't be unique. A [simple union](https://stackoverflow.com/a/60297214/861716) is enough, because Union is an implicit Distinct. – Gert Arnold Jun 26 '22 at 12:52

The crux of the problem is the Customer object doesn't have a .Equals() implementation. If you override .Equals (and .GetHashCode) then .Distinct would use it to eliminate duplicates. If you don't own the Customer implementation, however, adding .Equals may not be an option.
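As a rough sketch of the first option, assuming you own the Customer class and want all three properties to take part in equality, the override could look something like this. The `CustomerLists.Merge` helper is a hypothetical name introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch: a Customer that defines value equality over its three fields.
public class Customer : IEquatable<Customer>
{
    public string Name { get; set; }
    public string Surname { get; set; }
    public string Category { get; set; }

    public bool Equals(Customer other)
    {
        if (ReferenceEquals(other, null)) return false;
        return Name == other.Name
            && Surname == other.Surname
            && Category == other.Category;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Customer);
    }

    // Equal customers must return equal hash codes, so hash the same fields.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (Name == null ? 0 : Name.GetHashCode());
            hash = hash * 31 + (Surname == null ? 0 : Surname.GetHashCode());
            hash = hash * 31 + (Category == null ? 0 : Category.GetHashCode());
            return hash;
        }
    }
}

public static class CustomerLists
{
    // Hypothetical helper: Union now removes duplicates because the default
    // comparer picks up the Equals/GetHashCode overrides above.
    public static List<Customer> Merge(List<Customer> first, List<Customer> second)
    {
        return first.Union(second).ToList();
    }
}
```

With these overrides in place, the plain `listOne.Union(listTwo)` call from the question should yield the six unique customers.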

An alternative is to pass a custom IEqualityComparer to .Distinct(). This lets you compare objects in different ways depending on which comparer you pass in.
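For illustration, a comparer for the question's Customer might be sketched as follows. `CustomerComparer` and `CustomerMerge.Unique` are hypothetical names, and comparing all three properties is an assumption about what "duplicate" means here.

```csharp
using System.Collections.Generic;
using System.Linq;

// Same shape as the Customer in the question; no Equals override is needed.
public class Customer
{
    public string Name { get; set; }
    public string Surname { get; set; }
    public string Category { get; set; }
}

// Sketch of a comparer that treats customers with identical fields as equal.
public class CustomerComparer : IEqualityComparer<Customer>
{
    public bool Equals(Customer x, Customer y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        return x.Name == y.Name
            && x.Surname == y.Surname
            && x.Category == y.Category;
    }

    // Must agree with Equals: hash exactly the fields that Equals compares.
    public int GetHashCode(Customer c)
    {
        if (ReferenceEquals(c, null)) return 0;
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (c.Name == null ? 0 : c.Name.GetHashCode());
            hash = hash * 31 + (c.Surname == null ? 0 : c.Surname.GetHashCode());
            hash = hash * 31 + (c.Category == null ? 0 : c.Category.GetHashCode());
            return hash;
        }
    }
}

public static class CustomerMerge
{
    // Hypothetical helper: concatenate both lists, then drop duplicates
    // according to CustomerComparer.
    public static List<Customer> Unique(IEnumerable<Customer> first, IEnumerable<Customer> second)
    {
        return first.Concat(second).Distinct(new CustomerComparer()).ToList();
    }
}
```

The same comparer instance can also be passed to `Union` or `Except`, so a different comparer (say, one that looks only at Category) changes the notion of equality without touching Customer.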

Another alternative is to GroupBy the fields that are important and take any item from the group (since the GroupBy acts as .Equals in this case). This requires the least code to be written.

e.g.

    var result = listOne.Concat(listTwo)
        .GroupBy(x=>x.Category+"|"+x.Name+"|"+x.Surname)
        .Select(x=>x.First());

which gets your desired result.

As a rule, I use a unique delimiter to combine fields so that two items that should be different don't unexpectedly combine into the same key. Consider: {Name=abe, Surname=long} and {Name=abel, Surname=ong} would both get the GroupBy key "abelong" if a delimiter weren't used.
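A variation that avoids the delimiter altogether, offered as a sketch rather than as this answer's method, is to group on an anonymous type. C# anonymous types compare property by property, so the key cannot collide the way concatenated strings can. `CustomerGrouping.UniqueByFields` is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.Linq;

// Same shape as the Customer in the question.
public class Customer
{
    public string Name { get; set; }
    public string Surname { get; set; }
    public string Category { get; set; }
}

public static class CustomerGrouping
{
    // Hypothetical helper: keep the first customer seen for each distinct
    // (Category, Name, Surname) combination.
    public static List<Customer> UniqueByFields(IEnumerable<Customer> first, IEnumerable<Customer> second)
    {
        return first.Concat(second)
            // The anonymous-type key is compared field by field, so
            // {abe, long} and {abel, ong} stay in separate groups.
            .GroupBy(c => new { c.Category, c.Name, c.Surname })
            .Select(g => g.First())
            .ToList();
    }
}
```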

Herohtar
Handcraftsman

The best option is to implement the IEqualityComparer interface and use it with the Union or Distinct method, as I wrote at the end of this article: http://blog.santiagoporras.com/combinar-listas-sin-duplicados-linq/

  • Implementation of IEqualityComparer
public class SaintComparer : IEqualityComparer<Saint>
{
    public bool Equals(Saint item1, Saint item2)
    {
       return item1.Name == item2.Name;
    }
     
    public int GetHashCode(Saint item)
    {
      // Hash the same field that Equals compares, so equal items share a hash code.
      return item.Name == null ? 0 : item.Name.GetHashCode();
    }
}
  • Use of comparer
var unionList = list1.Union(list2, new SaintComparer());

bh_earth0
  • To explain why this is all that's needed: `Union` is an implicit `Distinct`. This should be the accepted answer. – Gert Arnold Jun 26 '22 at 12:53

I had a similar problem where I had two very large lists with random strings.

I wrote a recursive function that returns a new list of unique strings. I compared two lists of 100k random strings each (which may or may not contain duplicates), every string being 6 characters drawn from abcdefghijklmnopqrstuvwxyz1234567890, and it finished in about 230 ms. I measured only the given function.

I hope this is of value to someone.

List<string> makeCodesUnique(List<string> existing, List<string> newL)
{
    // Get all duplicate between two lists
    List<string> duplicatesBetween = newL.Intersect(existing).ToList();

    // Get all duplicates within list
    List<string> duplicatesWithin = newL.GroupBy(x => x)
    .Where(group => group.Count() > 1)
    .Select(group => group.Key).ToList();

    if (duplicatesBetween.Count == 0 && duplicatesWithin.Count == 0)
    {
        // Return list if there are no duplicates
        return newL; 
    }
    else
    {
        if (duplicatesBetween.Count != 0)
        {
            foreach (string duplicateCode in duplicatesBetween)
            {
                newL.Remove(duplicateCode);
            }

            // Generate new codes to substitute the removed ones
            List<string> newCodes = generateSomeMore(duplicatesBetween.Count);
            newL.AddRange(newCodes);
            makeCodesUnique(existing, newL);
        }
        else if (duplicatesWithin.Count != 0)
        {
            foreach (string duplicateCode in duplicatesWithin)
            {
                newL.Remove(duplicateCode);
            }
            List<string> newCodes = generateSomeMore(duplicatesWithin.Count);
            newL.AddRange(newCodes);
            makeCodesUnique(existing, newL);
        }
    }
    return newL;
}
Erik Nguyen