-2

I have the following collection, which contains more than 500,000 items:

List<Item> MyCollection = new List<Item>();

and type:

class Item
{
   public string Name { get; set; }
   public string Description { get; set; }
}

I want to return a list of items with distinct Name values, i.e. to find the distinct items based on name.

What are the possible ways to do this, and which would be best in terms of time and memory? Both are important, but lower running time has priority over lower memory use.

KV Prajapati
Atul Sureka
  • http://stackoverflow.com/a/5970996/1714342 – Kamil Budziewski Jul 24 '13 at 08:00
  • Does [`Enumerable.Distinct()`](http://msdn.microsoft.com/en-us/library/system.linq.enumerable.distinct.aspx) not do what you want? Or do you want a list of just the items that were unique in the list (which is different from what `Distinct()` does)? – Matthew Watson Jul 24 '13 at 08:03
  • possible duplicate of [Faster alternatives to .Distinct()](http://stackoverflow.com/questions/5970983/faster-alternatives-to-distinct) – George Duckett Aug 01 '13 at 11:00

6 Answers

4

I would opt for LINQ unless the performance turns out to be insufficient:

var considered = from i in MyCollection
                 group i by i.Name into g
                 select new { Name = g.Key, Cnt = g.Count(), Instance = g.First() };
var result = from c in considered where c.Cnt == 1 select c.Instance;

(Assuming I've interpreted your question correctly as "return those items whose Name only appears once in the list")
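If the question is read the other way (keep one item per distinct Name), the same grouping idea can be adapted. The following is only a sketch under that assumption: the helper class `GroupBySketch` and the method `FirstOfEachName` are made-up names, and keeping the first item of each group is an arbitrary choice.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public string Name { get; set; }
    public string Description { get; set; }
}

// Hypothetical helper, not part of the original answer.
public static class GroupBySketch
{
    // Groups the items by Name and keeps the first item of each group.
    public static List<Item> FirstOfEachName(IEnumerable<Item> items)
    {
        return items
            .GroupBy(i => i.Name)      // one group per distinct Name
            .Select(g => g.First())    // first item seen with that Name
            .ToList();
    }
}
```

Calling `GroupBySketch.FirstOfEachName(MyCollection)` would build all groups in memory before returning, so for a list of this size it is worth measuring against the set-based approaches in the other answers.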

Damien_The_Unbeliever
2

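I have a Java version of the code.

Override equals and hashCode in the Item class so that items with the same name compare as equal. HashSet relies on these two methods; a Comparator's compare method is not consulted by it.

@Override
public boolean equals(Object o)
{
   // equal when the names match; assumes Item has a non-null name field
   return o instanceof Item && name.equals(((Item) o).name);
}

@Override
public int hashCode() { return name.hashCode(); }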

Then use the code below in the class where MyCollection is defined:

   HashSet<Item> set1 = new HashSet<Item>();
   set1.addAll(MyCollection);
   MyCollection.clear();
   MyCollection.addAll(set1);

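This leaves MyCollection without duplicates, provided Item defines equals and hashCode based on the name. A HashSet is unordered, so the result is not sorted.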

Kalaiarasan Manimaran
1

You can sort your list and then delete all repeated items, but storing all the data in a Dictionary<string, string> seems better suited to this task. Or maybe even put the whole list in a HashSet.
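A rough sketch of the dictionary idea, assuming the first item seen for each name is the one to keep and that Name is never null. The class `DictionarySketch` and the method `IndexByName` are made-up names, and the value type here is `Item` rather than `string` so that the whole item is retained.

```csharp
using System.Collections.Generic;

public class Item
{
    public string Name { get; set; }
    public string Description { get; set; }
}

// Hypothetical helper, not part of the original answer.
public static class DictionarySketch
{
    // Builds a dictionary keyed by Name; later items with a name that is
    // already present are skipped. Assumes Name is non-null.
    public static Dictionary<string, Item> IndexByName(IEnumerable<Item> items)
    {
        var byName = new Dictionary<string, Item>();
        foreach (var item in items)
        {
            if (!byName.ContainsKey(item.Name))
            {
                byName.Add(item.Name, item);
            }
        }
        return byName;
    }
}
```

The distinct items are then available as `DictionarySketch.IndexByName(MyCollection).Values`, at the cost of one dictionary entry per distinct name.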

Sergey Berezovskiy
Sergio
  • @lazyberezovsky why not? class item contains two string fields. `Name` could be key and `Description` is a value, just fits this case – Sergio Jul 24 '13 at 08:53
  • Actually there was problem with distinct items. Thus I thought you have several items with same name, and appropriate type would be `Dictionary>` (or Lookup). But if answer solved problem, then it's of course correct +1 – Sergey Berezovskiy Jul 24 '13 at 09:12
1

MoreLinq has a DistinctBy extension that is great for this sort of thing. It's open source and only a few lines of code, so it is easy to add to your own code.

var results = MyCollection.DistinctBy(p => p.Name);
sa_ddam213
1

I can see you found your answer, but you can also do it fairly simply using Distinct with a custom comparer:

internal class NameComparer : IEqualityComparer<Item> {
    public bool Equals(Item x, Item y) { return x.Name == y.Name; }
    public int GetHashCode(Item obj) { return obj.Name.GetHashCode(); }
}

var distinctItems = MyCollection.Distinct(new NameComparer());
Joachim Isaksson
0

First solution:

public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> sequence, Func<T, TKey> keySelector)
{
    var alreadyUsed = new HashSet<TKey>();            
    foreach (var item in sequence)
    {
        var key = keySelector(item);
        if (alreadyUsed.Add(key))
        {
            yield return item;
        }
    }
}

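The second is to use .Distinct() after overriding both Equals and GetHashCode in your Item class so that equality is based on Name. Distinct() uses hashing internally, so overriding Equals alone is not enough.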
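A minimal sketch of that second option, assuming names are compared ordinally (case-sensitive) and that treating two items with the same Name as equal everywhere in the program is acceptable. The wrapper class `DistinctSketch` and its method are made-up names for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public string Name { get; set; }
    public string Description { get; set; }

    // Two items are equal when their names match (ordinal comparison).
    public override bool Equals(object obj)
    {
        var other = obj as Item;
        return other != null && string.Equals(Name, other.Name, StringComparison.Ordinal);
    }

    // Must agree with Equals: equal names give equal hash codes.
    public override int GetHashCode()
    {
        return Name == null ? 0 : Name.GetHashCode();
    }
}

// Hypothetical helper, not part of the original answer.
public static class DistinctSketch
{
    public static List<Item> DistinctByName(IEnumerable<Item> items)
    {
        // Distinct() uses the overridden Equals/GetHashCode by default.
        return items.Distinct().ToList();
    }
}
```

Because this changes what equality means for every use of Item, the DistinctBy extension or a separate IEqualityComparer<Item> may be the less intrusive choice.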

Kamil Budziewski