5

I want to select a handful of random rows from the results of a complex query on a very large table (many millions of rows).

I am using SQL Server 2008, and the proper way to do this efficiently seems to be the TABLESAMPLE clause.

Note 1: I am not interested in the popular "order by NEWID()" solution - it is inefficient for large tables.

Note 2: Since my query is complex, I do not want to have to first calculate the COUNT over it, if possible.

Note 3: Since the resultset is huge, I do not want to have to traverse it myself, such as is suggested here.

The kicker is that I am using LINQ. Specifically, LINQ-To-Entities.

Is there a LINQ-friendly way to use TABLESAMPLE?

Even if there is no direct support, is there some way I can write most of my query in LINQ and then do a small amount of manual SQL to perform the TABLESAMPLE?

Community
  • 1
  • 1
jwd
  • 10,837
  • 3
  • 43
  • 67

4 Answers4

0

Not a direct answer to your question but you can use this technique to select a random percentage sample of individual rows. The following query uses the NEWID function to return approximately one percent of the rows of the Sales.SalesOrderDetail table:

SELECT * FROM Sales.SalesOrderDetail   
WHERE 0.01 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float) / CAST (0x7fffffff AS int)

Possibly of interest: T-SQL: Generating Random Numbers, Random Sampling and Random ‘Goodness’

Mitch Wheat
  • 295,962
  • 43
  • 465
  • 541
  • Ah yes, seems similar to this link: http://msdn.microsoft.com/en-us/library/cc441928.aspx If you have a way of doing that in a LINQ-friendly way, I am curious to know that as well. – jwd Sep 09 '11 at 00:06
0

Something like this should work (syntax may not be exactly right but you should get the idea):

var rowCount = context.MyTable.Count();

int randomInt = new Random().Next(rowCount);    
var query = context.MyTable.Skip(randomInt).FirstOrDefault();
Randy Minder
  • 47,200
  • 49
  • 204
  • 358
  • A couple problems I have with this approach: (1) Only gets one row per query [I want to get several rows in one shot, if possible] (2) Need to calculate the count before running my (complex!) query. I will update the question to note that the query is complex. – jwd Sep 09 '11 at 00:09
0

With EF, you could create a defining query in your model using the tablesample construct. See http://msdn.microsoft.com/en-us/library/cc982038.aspx. Alternatively, you could create a randomized view in your database and then include that view in your model.

Jim Wooley
  • 10,169
  • 1
  • 25
  • 43
  • TABLESAMPLE cannot be used in view definitions. There was a good comment thread about this, but the question was removed ): – jwd Sep 09 '11 at 17:54
  • (I mean the *answer* was removed :) – jwd Sep 09 '11 at 18:01
  • @jwd thanks for the correction. Defining query could still be a possibility since it is pure SQL rather than a saved view. – Jim Wooley Sep 09 '11 at 18:29
0

It seems that what I want to accomplish is not even possible in the first place.

TABLESAMPLE cannot be used on derived tables, so it is not even feasible to have a complex query generating a large result set and then get a random sampling with TABLESAMPLE.

TABLESAMPLE is only something that can be used on the base tables that go into a query, before joins and soforth. (see documentation)

This MSDN link describes a way to get a random percentage of results efficiently, so the best way to do approximately what I want may be to use that in a view, and build my LINQ off of that view.

Thank you all for the input.

jwd
  • 10,837
  • 3
  • 43
  • 67