I have a List of Document objects. The Document class has many properties but only two are relevant here, DocumentLinkId and UploadedOnDate.

What I want to do is filter the list down so there are no two Document objects with the same DocumentLinkId. When there is more than one Document object with a particular DocumentLinkId I want to keep the one with the latest UploadedOnDate.

My initial inclination was to do something like this:

myDocumentsList.Distinct(d => d.DocumentLinkId).Max(d => d.UploadedOnDate);

But Distinct() doesn't take a predicate. Is there a way to do this with LINQ?

Legion
  • Try grouping by `DocumentLinkId` and then taking the one in each group with the max `UploadedOnDate` – Nkosi Jun 28 '16 at 19:47
  • There is always an (old school) option of writing a for loop with a Dictionary as a cache. It will probably work the fastest and be easier to debug if anything goes wrong, or you want to extend search criteria and the like. – Victor Zakharov Jun 28 '16 at 20:21
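
The loop-with-a-Dictionary idea from the comment above could look roughly like the sketch below. The `Document` class here is a hypothetical stand-in with only the two properties from the question, `DocumentLinkId` is assumed to be an `int`, and `DocumentFilter.LatestPerLink` is an illustrative name, not an existing API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal Document; the real class has more properties.
public class Document
{
    public int DocumentLinkId { get; set; }      // assumed to be an int
    public DateTime UploadedOnDate { get; set; }
}

public static class DocumentFilter
{
    // Keeps, for each DocumentLinkId, the document with the latest UploadedOnDate.
    public static List<Document> LatestPerLink(IEnumerable<Document> documents)
    {
        var latest = new Dictionary<int, Document>();
        foreach (var doc in documents)
        {
            // Store the document if its id is new or it is newer than the cached one.
            if (!latest.TryGetValue(doc.DocumentLinkId, out var current)
                || doc.UploadedOnDate > current.UploadedOnDate)
            {
                latest[doc.DocumentLinkId] = doc;
            }
        }
        return new List<Document>(latest.Values);
    }
}
```

This makes a single pass over the list and does no sorting. The order of the returned documents should not be relied on, since it comes from the dictionary's value collection.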

3 Answers

You can group the documents by DocumentLinkId, and for each group, select the item with the latest UploadedOnDate like this:

var result = myDocumentsList
    .GroupBy(d => d.DocumentLinkId)
    .Select(g => g.OrderByDescending(d => d.UploadedOnDate).First())
    .ToList();
Yacoub Massad

You can use DistinctBy like in this question.

var query = people.DistinctBy(p => p.Id);

It will be something like:

var result = myDocumentsList
    .OrderByDescending(x => x.UploadedOnDate)
    .DistinctBy(d => d.DocumentLinkId)
    .ToList();

for your case.
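
`DistinctBy` is not one of the original `System.Linq` operators. It is typically taken from a library such as MoreLINQ, and newer .NET versions ship `Enumerable.DistinctBy`. Where neither is available, a minimal version can be sketched as below. `DistinctByKey` is a hypothetical name, chosen here to avoid clashing with existing methods; this is an illustration, not the library implementation.

```csharp
using System;
using System.Collections.Generic;

public static class DistinctByKeyExtensions
{
    // Yields the first element seen for each key, in source order.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seenKeys = new HashSet<TKey>();
        foreach (var item in source)
        {
            // HashSet.Add returns false when the key is already present.
            if (seenKeys.Add(keySelector(item)))
            {
                yield return item;
            }
        }
    }
}
```

Because the first element per key wins in this sketch, sorting by `UploadedOnDate` descending beforehand leaves the newest document for each `DocumentLinkId`.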

meJustAndrew
  • This will first select a set of documents with distinct `DocumentLinkId`. When it selects them, it may or may not choose the ones with the most recent `UploadedOnDate`. Then, having created that set, it will select just the one document with the highest `UploadedOnDate`. So it doesn't return a distinct set. It returns one object. – Scott Hannen Jun 29 '16 at 02:24
  • You are right, @ScottHannen. I have updated my answer to first order the entries by `UploadedOnDate` and then select them distinct by `DocumentLinkId`. – meJustAndrew Jun 29 '16 at 17:15

You can define an implementation of IEqualityComparer<Document>. It exists for pretty much exactly this scenario.

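public class DocumentLinkIdDocumentEqualityComparer : IEqualityComparer<Document>
{
    public bool Equals(Document document1, Document document2)
    {
        return document1.DocumentLinkId == document2.DocumentLinkId;
    }

    // Required by the interface; Distinct() compares hash codes before calling Equals,
    // so the hash must be derived from the same property.
    public int GetHashCode(Document document)
    {
        return document.DocumentLinkId.GetHashCode();
    }
}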

Then you can do this:

var result = myDocumentsList
    .OrderByDescending(d => d.UploadedOnDate)
    .Distinct(new DocumentLinkIdDocumentEqualityComparer())
    .ToList();

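(Had to edit this to order it first so that Distinct returns the one with the most recent date. This relies on Distinct keeping the first occurrence of each duplicate, which is how it behaves in practice, although the documentation describes the result as unordered.)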

You're saying that for the purpose of this one Distinct comparison, let's use this comparer and act as if any two documents with the same DocumentLinkId are equal.

What's nice about that is that you don't have to modify Document to override Equals, especially since this particular equality comparison might not apply in every case. This lets you specify when you want to use a particular custom equality comparison.

Scott Hannen