3

I have a fairly complex LINQ query that joins several tables, and selects a new anonymous type that is three IEnumerable's {Users, JobProviders, Jobs}. It returns an IQueryable to maintain deferred execution, which eliminates DistintBy from this question.

One of the columns is a rank, and I need to make sure that only the record with the lowest rank for each job (another column, many jobs will get selected) gets selected. Distinct doesn't work because the rank will obviously make the row unique.

I thought the group clause might help this, but it changes the return type to IGrouping. I don't fully understand how group works, so I may be wrong, but it looks it wont work. Is there any way to say for each job, take only the lowest rank?

something like

let jobRank = JobProvider.Rank
...where min(rank)
Community
  • 1
  • 1
Kyeotic
  • 19,697
  • 10
  • 71
  • 128
  • 1
    `Distinct` takes an `IComparer` as an optional parameter. Provide your own which doesn't incorporate `Rank` in the `CompareTo` method. – Servy Apr 13 '12 at 20:13
  • I thought that could not be done in L2E with deferred execution because it cant translate it into TSQL. Is this not the case? – Kyeotic Apr 13 '12 at 20:29

2 Answers2

5

You can use grouping, as much as it makes me cringe to use groupBy to do a distinct. You can just call First on the IGrouping to get one item out of the group, which is in effect a distinct. It will look something like this:

var distinctItems = data.GroupBy(item => new{
  //include all of the properties that you want to 
  //affect the distinct-ness of the query
  item.Property1
  item.Property2
  item.Property3
})
.Select(group => group.Key);
//if it's important that you have the low rank use the one below.
// if you don't care use the line above
//.Select(group => group.Min(item => item.Rank));
Servy
  • 202,030
  • 26
  • 332
  • 449
  • I'm pretty sure this does not translate to SQL. It will silently run as a linq-to-objects query. If it does, performance is very bad. – usr Apr 13 '12 at 21:10
  • @usr Why won't the translate to SQL? If you try it and it can't, I'll believe you, but I know that it's at least conceivable that it will work (depending on the provider). The groupby isn't something that worries me. The `select->first` is a bit more involved, so it may have a harder time translating it. Even if that's the case, the harder work is the groupby. – Servy Apr 13 '12 at 21:14
  • Why does this make you cringe, it looks like a clean solution. Especially considering the alternative of using a custom IEqualityComparer requires an entire new class – Kyeotic Apr 13 '12 at 21:14
  • @Tyrsius Because a distinct can throw out an item as soon as it knows it's "equal" to another, a group by needs to keep it around just so that it can be thrown out later. It noticeably increases the total memory footprint, and marginally damages performance. – Servy Apr 13 '12 at 21:15
  • Yeah I meant the First(). L2S is quite weak at First/Single/OrDefault stuff. I haven't figured out all the rules yet when it can and when it can't. The nasty thing is that this will not break but execute on the client. – usr Apr 13 '12 at 21:17
  • Btw, in SQL Server distinct translates internally to a group by *. No difference at all. – usr Apr 13 '12 at 21:17
  • @usr Isn't the SQL just `TOP`? You have to nest the queries around a bit, so it's not trivial, but it's certainly possible. Anytime you're dealing with query providers and developing queries it's best to log the actual request to the database and ensure that you're doing as much as you expect on the DB, and in a way that's not horribly inefficient. Even the most innocent looking LINQ queries can end up as lint to objects if you never pay attention. – Servy Apr 13 '12 at 21:20
  • So the First() will cause it to execute? If so, this wont work. The deffered execution is critical, because this returns to something that will produce more filtering. – Kyeotic Apr 13 '12 at 21:21
  • It will not cause it to execute in any case. But the query might execute partially locally if you eventually execute it (e.g. using ToList). – usr Apr 13 '12 at 21:25
  • @servy This actually threw an error `"The method 'First' can only be used as a final query operation. Consider using the method 'FirstOrDefault' in this instance instead."` – Kyeotic Apr 13 '12 at 21:50
  • @Tyrsius And did you try `FirstOrDefault`? – Servy Apr 13 '12 at 22:36
  • Do you need any properties that you aren't grouping on as results in your query? If not then you can just select the key (updating my answer accordingly). – Servy Apr 13 '12 at 23:55
  • @Servy, I do need properties I was not grouping by, that would probably be it. The key method is good to know though – Kyeotic Apr 16 '12 at 16:06
2

Good solution here:

LINQ's Distinct() on a particular property

What you need is a "distinct-by" effectively. I don't believe it's part of LINQ as it stands, although it's fairly easy to write:

What you need is a "distinct-by" effectively. I don't believe it's part of LINQ as it stands, although it's fairly easy to write:

public static IEnumerable<TSource> DistinctBy<TSource, TKey>
    (this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
    HashSet<TKey> seenKeys = new HashSet<TKey>();
    foreach (TSource element in source)
    {
        if (seenKeys.Add(keySelector(element)))
        {
            yield return element;
        }
    }
}
Community
  • 1
  • 1
Tom
  • 29
  • 1