3

I have a list of objects in this form:

List<SignUp>

class SignUp
{
    public int Id { get; set; }
    public int VersionId { get; set; }
    public int PersonId { get; set; }
    public DateTime? SignUpDate { get; set; }
}

People sign up to a version of a document. Some versions never get archived, and people have to re-sign them every year, so I end up with records like:

SignUp s = new SignUp { Id = 1, VersionId = 1, PersonId = 5 };
SignUp s2 = new SignUp { Id = 2, VersionId = 2, PersonId = 5 };
SignUp s3 = new SignUp { Id = 3, VersionId = 1, PersonId = 5 };

Now this list, which contains s, s2 and s3, has two duplicates on the PersonId, VersionId combination: s and s3. The only difference is that s3 has a higher Id than s. Hence I want to eliminate s and display just s2 and s3 (s is the older record, so I ignore it).

How can this be achieved, using a LINQ query if possible?

chugh97

3 Answers

5

How about:

List<SignUp> signups = ...

var filteredSignups = from signup in signups
                      group signup by new { signup.PersonId, signup.VersionId }
                                      into pvIdGroup
                      select pvIdGroup.OrderBy(groupedSignUp => groupedSignUp.Id)
                                      .Last();

The idea is to group the items by the two properties and then pick the "best" item from each group.
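To make the grouping idea concrete, here is a minimal, self-contained sketch that runs the same query (in method syntax) against the three records from the question. The `Demo` class and the `LatestPerPersonAndVersion` method name are made up for this illustration and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class SignUp
{
    public int Id { get; set; }
    public int VersionId { get; set; }
    public int PersonId { get; set; }
    public DateTime? SignUpDate { get; set; }
}

static class Demo
{
    // Hypothetical helper: for each (PersonId, VersionId) pair,
    // keep only the sign-up with the highest Id.
    public static List<SignUp> LatestPerPersonAndVersion(IEnumerable<SignUp> signups)
    {
        return signups
            .GroupBy(s => new { s.PersonId, s.VersionId })   // one group per pair
            .Select(g => g.OrderBy(s => s.Id).Last())        // highest Id in the group
            .ToList();
    }

    static void Main()
    {
        // The sample records from the question.
        var signups = new List<SignUp>
        {
            new SignUp { Id = 1, VersionId = 1, PersonId = 5 },
            new SignUp { Id = 2, VersionId = 2, PersonId = 5 },
            new SignUp { Id = 3, VersionId = 1, PersonId = 5 },
        };

        foreach (var s in LatestPerPersonAndVersion(signups))
        {
            Console.WriteLine($"Id={s.Id}, VersionId={s.VersionId}, PersonId={s.PersonId}");
        }
    }
}
```

With the sample data, the record with Id 1 is dropped and the records with Id 3 and Id 2 remain. LINQ to Objects yields groups in the order in which each key first appears, so the (PersonId 5, VersionId 1) group comes out first.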

If you don't want the inefficiency of sorting the items within each group, consider using an O(n) MaxBy method, such as the one from MoreLINQ.

Then the select becomes:

select pvIdGroup.MaxBy(groupedSignUp => groupedSignUp.Id)
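If pulling in a library is not an option, a single-pass "max by key" can be sketched with `Aggregate`. The helper below is hypothetical: it is named `MaxByKey` to make clear that it is not MoreLINQ's implementation, and it assumes a non-empty sequence, which always holds for a LINQ group. On .NET 6 and later, `Enumerable.MaxBy` is built in and could be used instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class SignUp
{
    public int Id { get; set; }
    public int VersionId { get; set; }
    public int PersonId { get; set; }
}

static class SignUpExtensions
{
    // Hypothetical helper: returns the element with the largest key in one pass (O(n)).
    // Throws InvalidOperationException on an empty sequence; on ties the first element wins.
    public static T MaxByKey<T, TKey>(this IEnumerable<T> source, Func<T, TKey> keySelector)
        where TKey : IComparable<TKey>
    {
        return source.Aggregate((best, next) =>
            keySelector(next).CompareTo(keySelector(best)) > 0 ? next : best);
    }
}

static class Program
{
    static void Main()
    {
        var signups = new List<SignUp>
        {
            new SignUp { Id = 1, VersionId = 1, PersonId = 5 },
            new SignUp { Id = 2, VersionId = 2, PersonId = 5 },
            new SignUp { Id = 3, VersionId = 1, PersonId = 5 },
        };

        // Same query as in the answer, with the sort replaced by the single-pass helper.
        var filteredSignups = from signup in signups
                              group signup by new { signup.PersonId, signup.VersionId }
                                  into pvIdGroup
                              select pvIdGroup.MaxByKey(groupedSignUp => groupedSignUp.Id);

        foreach (var s in filteredSignups)
        {
            Console.WriteLine($"Id={s.Id}, VersionId={s.VersionId}, PersonId={s.PersonId}");
        }
    }
}
```

The helper re-evaluates the key selector for the current best element on every step, which is fine for a cheap property access like `Id`; caching the best key would be worth it for an expensive selector.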
Ani
  • That should probably be `pvIdGroup.OrderBy(groupedSignUp => groupedSignUp.VersionID)` since he always wants the last/newest version. – arb Nov 10 '11 at 15:14
  • @Zero21xxx: No, look at the provided example again. – Ani Nov 10 '11 at 15:16
  • Well he should be using version instead of relying on the ID. That is probably what the version column is there for in the first place. But based on what he is saying, what you have is correct. Cool LINQ query. I wouldn't have thought of that myself. – arb Nov 10 '11 at 15:19
  • I'd personally go with `.OrderByDescending().First()`. – StriplingWarrior Nov 10 '11 at 15:46
  • I'm probably just more used to that pattern because `Last` doesn't always work with LINQ providers like LINQ to Entities. Depending on how it's implemented, though, there is a remote possibility that `First` is more performant than `Last`. – StriplingWarrior Nov 10 '11 at 15:50
1

Use DistinctBy from MoreLINQ: http://code.google.com/p/morelinq/

Tejo
  • You can't use DistinctBy here as it provides no mechanism to pick the "best" item from within a "distinct" group. – Ani Nov 10 '11 at 15:22
0

You can do the following to get a new list of SignUp objects with a unique combination of PersonId and VersionId.

        var list = new List<SignUp>(); ...

        List<SignUp> distinctSignUp = list
            .GroupBy(x => new {x.PersonId, x.VersionId} )
            .Select(y => y.Last())
            .ToList();

I'd like to thank user David B for his wonderful answer here: LINQ's Distinct() on a particular property

Gaijinhunter
  • Selecting the Last item in the group isn't guaranteed to get the item with the highest ID. Ani's answer is more correct. – StriplingWarrior Nov 10 '11 at 15:47
  • Yeah, perhaps. I figured the last one entered would most likely be the latest, and Ids, depending on the implementation, may not be assigned in order. I guess it depends on what the OP meant by "older". – Gaijinhunter Nov 10 '11 at 15:53
  • The OP explicitly states that the item with the highest ID is to take precedence among duplicates, not the last item in the original collection. "s3 has a higher Id than s. Hence I want to eliminate s" – StriplingWarrior Nov 10 '11 at 15:55