3

I have a list:

List<Test> tests = new List<Test>{
    new Test{Name="Test", Date="2016-06-13 18:32:01.380"},
    new Test{Name="Test2", Date="2016-06-13 18:32:29.117"},
    new Test{Name="Test3", Date="2016-06-13 18:32:40.930"},
    new Test{Name="Test3", Date="2016-06-13 18:32:51.517"},
    new Test{Name="Test", Date="2016-06-13 18:33:06.477"},
    .....
};

How can I remove items with duplicate Name values, keeping only the item with the most recent Date value, while achieving optimal performance?

Jay
  • This is not a duplicate of the linked question, which deals with simple values where "duplicate" means equality. – Jay Jun 13 '16 at 11:40
  • The linked `duplicate question` does not preserve the latest Date as requested, so this is not just a matter of using Distinct or a HashSet. Don't click too fast on duplicate... – Jeroen van Langen Jun 13 '16 at 11:40

2 Answers

4

This is at least the most readable approach and presumes that Date is actually a DateTime:

tests = tests.GroupBy(t => t.Name)
    .Select(g => g.OrderByDescending(t => t.Date).First())
    .ToList();

This is more efficient, since it makes a single pass over the list and avoids sorting each group:

var latestTests = new Dictionary<string, Test>(tests.Count);
foreach (Test t in tests)
{
    Test test;
    if (latestTests.TryGetValue(t.Name, out test))
    {
        if(test.Date < t.Date)
            latestTests[t.Name] = t;
    }
    else
    {
        latestTests.Add(t.Name, t);
    }
}
tests = latestTests.Values.ToList();
Tim Schmelter
  • Actually I think that GroupBy() could be pretty efficient. – Matthew Watson Jun 13 '16 at 11:47
  • I have a list with about 1 million records, so I think GroupBy won't perform well? – Trường Sơn Jun 13 '16 at 11:56
  • @TrườngSơn `GroupBy()` uses a dictionary (or similar), so it wouldn't be too bad. It should be O(N) – Matthew Watson Jun 13 '16 at 12:00
  • @TrườngSơn: in general its performance is OK since it also uses a set, though it needs a little more memory. But the ordering also takes some time, so in general the dictionary approach will be faster. Maybe you don't even need the final `ToList` if you could use the dictionary instead. Do you need to access by index or by name? If you only want to enumerate the latest tests, you could also do that via `foreach(Test t in latestTests.Values)...` – Tim Schmelter Jun 13 '16 at 12:01
3

I think the first solution suggested by Tim is fine. You should follow the KISS principle.

But......

You could create a `Dictionary` for it and look up each item. I think this will be the most efficient, since it does only one lookup per item.

var myDict = new Dictionary<string, Test>();

foreach (var searchItem in myList)
{
    Test item;
    if (myDict.TryGetValue(searchItem.Name, out item))
    {
        if (searchItem.Date > item.Date)
        {
            // item is the dictionary's own copy, so updating its Date needs
            // no second lookup and leaves the objects in myList untouched
            item.Date = searchItem.Date;
        }
    }
    else
    {
        // store a copy so the original is never modified
        // (assumes Test is a class with only Name and Date, as in the question)
        myDict.Add(
            searchItem.Name,
            new Test { Name = searchItem.Name, Date = searchItem.Date });
    }
}

You might want to compare the results of the two approaches: GroupBy vs. Dictionary.
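
One rough way to make that comparison is to time both approaches with `Stopwatch`. The sketch below is only illustrative: the `Test` class, the `LatestPerName` helper, and the generated data (1,000,000 items spread over 1,000 names) are assumptions and not part of the original question, and a single timed run is not a rigorous benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical stand-in for the question's Test type, with Date as a DateTime.
public class Test
{
    public string Name { get; set; }
    public DateTime Date { get; set; }
}

public static class LatestPerName
{
    // GroupBy approach: sort each group by Date and take the newest item.
    public static List<Test> ViaGroupBy(List<Test> tests)
    {
        return tests.GroupBy(t => t.Name)
                    .Select(g => g.OrderByDescending(t => t.Date).First())
                    .ToList();
    }

    // Dictionary approach: one pass, keeping the newest item seen per name.
    public static List<Test> ViaDictionary(List<Test> tests)
    {
        var latest = new Dictionary<string, Test>();
        foreach (Test t in tests)
        {
            Test current;
            if (!latest.TryGetValue(t.Name, out current) || current.Date < t.Date)
                latest[t.Name] = t;
        }
        return latest.Values.ToList();
    }

    public static void Main()
    {
        // Made-up data: 1,000,000 items spread over 1,000 names.
        var start = new DateTime(2016, 6, 13);
        var random = new Random(42);
        var tests = Enumerable.Range(0, 1000000)
            .Select(i => new Test
            {
                Name = "Test" + (i % 1000),
                Date = start.AddSeconds(random.Next(0, 86400))
            })
            .ToList();

        var sw = Stopwatch.StartNew();
        var viaGroupBy = ViaGroupBy(tests);
        sw.Stop();
        Console.WriteLine("GroupBy:    {0} items in {1} ms", viaGroupBy.Count, sw.ElapsedMilliseconds);

        sw.Restart();
        var viaDictionary = ViaDictionary(tests);
        sw.Stop();
        Console.WriteLine("Dictionary: {0} items in {1} ms", viaDictionary.Count, sw.ElapsedMilliseconds);
    }
}
```

The timings will depend on the machine, the runtime, and how many distinct names the data contains, so it is worth running this against data shaped like your own before drawing conclusions.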

Jeroen van Langen