6

I have to select distinct 1000 records using LINQ. But when I see the query generated it takes 1000 records and applies distinct over that result.

IQueryable<TestClass> resultSet = (from w in ......).Distinct().Take(1000);

where my TestClass would be like,

public TestClass
{
public string TestPRop { get; set; }
 //..has some 20 properties
}

Any way to solve this to get distinct applied to the resultset and then take 1000 from the distinct result set?

DeanOC
  • 7,142
  • 6
  • 42
  • 56
Arvind897
  • 101
  • 1
  • 8
  • Did you try to add `Skip(0)`? Maybe this will create an "offset fetch next" query and not a top n. – Bjørn-Roger Kringsjå Sep 02 '15 at 18:05
  • 2
    Have you tried using a group operator instead of distinct to see if this affects the order of operations? – stephen.vakil Sep 02 '15 at 18:09
  • 1
    Try `var subquery = (from w in ...).Distinct();` and then `result = (from s in subquery where 1==1 select s).Take(1000);`. to see if that forces the distinct to be part of a subquery. – juharr Sep 02 '15 at 18:09
  • base on what criteria your query should be distinict? – Arash Sep 02 '15 at 18:14
  • 1
    take a look at jon skeet answer http://stackoverflow.com/a/1011014/1876572 – Eldho Sep 02 '15 at 18:38
  • 1
    please do `resultSet.ToString()` and paste the result as a edit to your question. That will show us what query it is generating. – Scott Chamberlain Sep 02 '15 at 18:40
  • Does `Distinct` have any effect at all? Maybe the query `from w in ...` only returns unique results. – Gert Arnold Sep 02 '15 at 19:04
  • What's the source here? This should perform distinct first, but a given query provider could have a bug. – Jon Hanna Sep 03 '15 at 00:37
  • @ScottChamberlain sorry i dnt have the exact query with me now. It got generated in such a way that "Limit" got applied to all columns which i wanted to get.. with distinct before the whole query.. – Arvind897 Sep 03 '15 at 01:45
  • @GertArnold nope it may return duplicate records since the DB has one to many reltnshp plus, i want to do a distinct only comparing with prperties of my TestClass type. The resultset from DB should be a distinct set of values based on the all the columns available in my tables. – Arvind897 Sep 03 '15 at 01:48
  • @JonHanna the source is Oracle DB which i connect via the entity fw 5.0 ..it contains duplicates because of the reason mentioned in the above comment. – Arvind897 Sep 03 '15 at 01:54
  • What SQL is being produced for the query? – Jon Hanna Sep 03 '15 at 09:03
  • Looks like it's a bug in the Oracle query provider. Which provider is it? – Gert Arnold Sep 03 '15 at 10:19
  • Have you tried using the "group" option? I have updated my answer below to provide an example. – joelnet Sep 04 '15 at 18:26

3 Answers3

1

Distinct will be processed before the take. Double check your Distinct is performing correctly. Here's a working example:

var dataset = new int[]
{
    1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10
};

var query = from o in dataset select o;
var result = query.Distinct().Take(6);

// result = `1,2,3,4,5,6`

I suspect your issue is with using distinct with SQL. If that is the case, you can also use grouping to get the result you want.

var distinctById = from o in query
                   group o by o.Id into uniqueIds
                   select uniqueIds.FirstOrDefault();
joelnet
  • 13,621
  • 5
  • 35
  • 49
0

You are only taking 1000 records and then applying distinct over those 1000 records with you code even though you are doing distinct() before Take().

The best way to accomplish this is to get the 1000 distinct records directly from your SQL query rather than from the code I.E

string queryString = "SELECT top 1000 from (select distinct ... from table)";
Jack89515
  • 69
  • 8
-1

you can use nested linq queries

IQueryable<TestClass> resultSet = from x in ((from w in ......).Distinct()).Take(1000);
Viru
  • 2,228
  • 2
  • 17
  • 28
  • 2
    That won't change the generated query in any way. This will compile down to an identical expression tree before it even gets to the query provider. – Servy Sep 02 '15 at 18:20