
I have a list of names that contains duplicates, and I want to get the list without them.

CSVCategories = from line in File.ReadAllLines(path).Skip(1)
                let columns = line.Split(',')
                select new Category
                {
                    Name = columns[9]
                };

var results = CSVCategories.GroupBy(x => x.Name)
                           .Select(g => g.FirstOrDefault())
                           .ToList();

I tried to inspect the elements with the following loop while debugging, but it still prints duplicates from the list, including empty strings for missing values:

foreach(var item in results)
{
    Console.WriteLine(item.Name);
}
naz786
    Related posts: http://stackoverflow.com/questions/1606679/remove-duplicates-in-the-list-using-linq or http://stackoverflow.com/questions/37850167/delete-duplicates-in-a-list-of-int-arrays/37850231#37850231 – Salah Akbari Feb 18 '17 at 14:24

3 Answers


Calling Distinct() most likely does not work because your Category class does not have a proper implementation of Equals and GetHashCode, so instances are compared by reference rather than by Name.

You have two options: properly override the Equals and GetHashCode methods, or use a HashSet<string> to check whether a Name has already been added.

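var uniqueNames = new HashSet<string>();

// ... original select statement that builds CSVCategories ...

// HashSet<string>.Add returns false when the name is already in the set,
// so only the first Category with each Name passes the Where filter.
CSVCategories = CSVCategories.Where(x => uniqueNames.Add(x.Name)).ToList();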
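For the first option, a minimal sketch follows. It assumes Category only needs to be compared by its Name property (the question shows no other members), and it uses ordinal, case-sensitive comparison, which matches what the question's GroupBy(x => x.Name) does by default. The Program class and the sample data are illustrative only. Once Equals and GetHashCode agree on Name, Distinct() compares categories by name instead of by reference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Category class: the question only shows that it has a Name property.
public class Category
{
    public string Name { get; set; }

    // Two categories are equal when their names match exactly (ordinal, case-sensitive).
    public override bool Equals(object obj)
    {
        return obj is Category other
            && string.Equals(Name, other.Name, StringComparison.Ordinal);
    }

    // Must stay consistent with Equals: equal names must produce equal hash codes.
    public override int GetHashCode()
    {
        return Name == null ? 0 : StringComparer.Ordinal.GetHashCode(Name);
    }
}

public static class Program
{
    public static void Main()
    {
        var categories = new List<Category>
        {
            new Category { Name = "Home" },
            new Category { Name = "Town" },
            new Category { Name = "Home" },
        };

        // Distinct() now uses the overridden Equals/GetHashCode instead of reference equality.
        var unique = categories.Distinct().ToList();

        foreach (var c in unique)
        {
            Console.WriteLine(c.Name);
        }
    }
}
```

Because the comparison is case-sensitive, "Home" and "home" would still be treated as different names with this sketch.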
Max Venediktov

LINQ encourages immutability, so it never modifies your input collection: Distinct() returns a new sequence rather than modifying the collection in place. Try:

foreach (var item in CSVCategories.Distinct())
{
    Console.WriteLine(item.Name);
}
dragonfly02
  • The foreach loop was just for debugging purposes. So do you think I should add the items to a new list within the loop to get a distinct list? – naz786 Feb 18 '17 at 14:48
  • Yes. If you assign the result of Distinct() to a variable (new one or an existing one) you'll get a unique collection. No need to call ToList() before calling Distinct() – dragonfly02 Feb 18 '17 at 14:50
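To illustrate the point about immutability, here is a small sketch. It uses plain strings rather than Category so that Distinct() can rely on string's built-in equality and sidestep the Equals/GetHashCode issue raised in the other answer; the list contents are made up for the example. The source list keeps all of its items, and the de-duplicated data only exists in the variable the result of Distinct() is assigned to.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Program
{
    public static void Main()
    {
        var names = new List<string> { "Home", "Town", "Home", "Park" };

        // Distinct() does not touch `names`; it returns a new, lazily evaluated sequence.
        IEnumerable<string> distinctNames = names.Distinct();

        // Materialise the result into its own list if you need to keep or reuse it.
        List<string> uniqueNames = distinctNames.ToList();

        Console.WriteLine(names.Count);       // the original list still has all 4 items
        Console.WriteLine(uniqueNames.Count); // 3 distinct names
    }
}
```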

I noticed that the results variable gave me back a list that still contained duplicates, but only ones that differed in their casing.

For example, my original list CSVCategories contained the elements: ["Home", "home", "EmptyString", "home", "Town", "Town", "Park"]

When de-duplicating with GroupBy, the results query returned ["Home", "home", "EmptyString", "Town", "Park"], so it partly worked: it still kept the empty values and the names that differ only in casing.

Now I need to find a way to remove duplicates that differ only in casing, as well as empty strings.
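One way to handle both, sketched under a few assumptions: Category is a stand-in with just a Name property, the helper name DistinctByNameIgnoreCase is hypothetical, and whitespace-only names are treated the same as empty ones. The idea is to filter out blank names first, then pass StringComparer.OrdinalIgnoreCase to the GroupBy overload that accepts a key comparer, so "Home" and "home" land in the same group. GroupBy yields groups in the order their first key appears, and g.First() keeps the first spelling encountered.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Category class.
public class Category
{
    public string Name { get; set; }
}

public static class CategoryCleanup
{
    // Drops null/empty/whitespace names, then keeps the first Category for each name,
    // comparing names case-insensitively ("Home" and "home" count as the same key).
    public static List<Category> DistinctByNameIgnoreCase(IEnumerable<Category> categories)
    {
        return categories
            .Where(c => !string.IsNullOrWhiteSpace(c.Name))
            .GroupBy(c => c.Name, StringComparer.OrdinalIgnoreCase)
            .Select(g => g.First())
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample data mirroring the example above, with "" standing in for the empty value.
        var csvCategories = new List<Category>
        {
            new Category { Name = "Home" },
            new Category { Name = "home" },
            new Category { Name = "" },
            new Category { Name = "home" },
            new Category { Name = "Town" },
            new Category { Name = "Town" },
            new Category { Name = "Park" },
        };

        foreach (var item in CategoryCleanup.DistinctByNameIgnoreCase(csvCategories))
        {
            Console.WriteLine(item.Name); // Home, Town, Park
        }
    }
}
```

On .NET 6 or later, Enumerable.DistinctBy(c => c.Name, StringComparer.OrdinalIgnoreCase) after the same Where filter should give an equivalent result without the intermediate groups.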

naz786