Get random number of rows from SQL Server table

Question

I am trying to fetch 5 random rows from a large table (over 1 million rows) as quickly as possible.

So far I have tested these SQL queries:

Method 1

Select top 5 customer_id, customer_name 
from Customer TABLESAMPLE(1000 rows) 
order by newid()

This method's estimated I/O cost is 0.0127546, so it is very fast (nonclustered index scan).

Method 2

select top 5 customer_id, customer_name 
from Customer 
order by newid()

For this method, the sort's estimated I/O cost is 117.21189 and the nonclustered index scan's estimated I/O cost is 2.8735, so the sort is hurting performance.

Method 3

select top 5 customer_id, customer_name 
from Customer 
order by rand(checksum(*))

For this method, the sort's estimated I/O cost is 117.212 and the nonclustered index scan's estimated I/O cost is 213.149. This query is the slowest of all, with an estimated subtree cost of 213.228.

UPDATE:

Method 4

select top 5 customer_id, customer_name, Product.product_id
from Customer 
join Product on Customer.product_id = Product.product_id
where (customer_active = 'TRUE')
order by checksum(newid())

This approach is better and very fast. All the benchmark testing is fine.

QUESTION

How can I convert Method 4 to LINQ-to-SQL? Thanks

Are you asking how to improve the random query's performance or how to convert the query to LINQ? – Juan Carlos Oropeza, May 08 '17 at 14:21
How random do you want the result to be? `TABLESAMPLE` is not really random (unless your rows are so large that only one fits on a page). – Gordon Linoff, May 08 '17 at 14:22
@JuanCarlosOropeza Yes, first I am looking to convert Method 1 to LINQ; if that is not possible, then I am looking for a better approach. – aadi1295, May 08 '17 at 14:26
@GordonLinoff: Actually, the client just wants to fill up the space, so I just need 5 random rows to show without affecting performance. – aadi1295, May 08 '17 at 14:27
@JuanCarlosOropeza Well, he wants to show random records, even very old ones, on each page refresh, not the latest. – aadi1295, May 08 '17 at 14:29
Well, as Gordon says, Method 1 isn't really random, so I would go for Method 2. And I wouldn't use LINQ for this. I would rather create a function in the database and then use EF to call that function to retrieve the 5 records. – Juan Carlos Oropeza, May 08 '17 at 14:34

score 3 · Accepted Answer · edited May 08 '17 at 19:17

If you want to convert Method 2 into Linq To Entities, just use the solution posted by jitender, which looks like this:

var randomCoustmers = context.Customers.OrderBy(x => Guid.NewGuid()).Take(5);

But for Method 1, which is very fast according to your benchmarking, you need the following C# code, because Linq To Entities has no LINQ equivalent for the SQL clause TABLESAMPLE(1000 rows).

var randomCoustmers = context.Customers.SqlQuery("Select TOP 5 customer_id, customer_name from Customer TABLESAMPLE(1000 rows) order by newid()").ToList();

You can also move the SQL statement into a SQL view, or into a stored procedure that receives the number of customers to take.
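
As a rough sketch of the stored-procedure route: the snippet below assumes a procedure named dbo.GetRandomCustomers (a hypothetical name, not from the question) that takes an @count parameter, plus a small CustomerRow class (also hypothetical) matching the two selected columns. The int type for customer_id is an assumption too. It calls the procedure through EF6's Database.SqlQuery<T>, and the T-SQL in the comment is one possible body for the procedure.

```csharp
using System.Collections.Generic;
using System.Data.Entity;       // EF6: DbContext, Database.SqlQuery<T>
using System.Data.SqlClient;    // SqlParameter
using System.Linq;

// Hypothetical result type: property names must match the returned column names.
public class CustomerRow
{
    public int customer_id { get; set; }        // column type is assumed
    public string customer_name { get; set; }
}

public static class RandomCustomerQueries
{
    // Assumed T-SQL for the hypothetical procedure:
    //   CREATE PROCEDURE dbo.GetRandomCustomers @count int AS
    //   SELECT TOP (@count) customer_id, customer_name
    //   FROM Customer TABLESAMPLE (1000 ROWS)
    //   ORDER BY NEWID();
    public static List<CustomerRow> GetRandomCustomers(DbContext context, int count)
    {
        // The row count is passed as a parameter instead of being
        // concatenated into the SQL text.
        return context.Database
            .SqlQuery<CustomerRow>(
                "EXEC dbo.GetRandomCustomers @count",
                new SqlParameter("@count", count))
            .ToList();
    }
}
```

Usage would look like `var rows = RandomCustomerQueries.GetRandomCustomers(context, 5);`. Projecting into a plain class rather than the Customer entity avoids materialization problems when the query returns only some of the entity's columns.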

UPDATE

For Method 4, which seems to be very fast (again, according to your benchmark), you can use the following Linq To Entities query:

var randomCoustmers = context.Customers.OrderBy(c => SqlFunctions.Checksum(Guid.NewGuid())).Take(5);

Entity Framework can translate into SQL every function defined in the SqlFunctions class. Among them is the Checksum function, which does what you want.

If you want to join with other tables, you can do it without difficulty in Linq To Entities; I simplified my version by querying only the Customers DbSet.
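
For illustration, here is one way the full Method 4 query (join plus filter) might look in Linq To Entities with EF6. The entity classes, the RandomCustomerRow result type, and the column types are all assumptions made for this sketch: the question does not show the schema, so it guesses that both tables carry a product_id column and that customer_active is stored as a string.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Entity.SqlServer;   // SqlFunctions (EF6)
using System.Linq;

// Hypothetical entity shapes; the EF mapping (keys, table names) is assumed to exist elsewhere.
public class Customer
{
    public int customer_id { get; set; }
    public string customer_name { get; set; }
    public string customer_active { get; set; }   // assumed to hold 'TRUE' / 'FALSE'
    public int product_id { get; set; }           // assumed foreign key to Product
}

public class Product
{
    public int product_id { get; set; }
}

// Hypothetical result type for the three selected columns.
public class RandomCustomerRow
{
    public int customer_id { get; set; }
    public string customer_name { get; set; }
    public int product_id { get; set; }
}

public static class Method4Query
{
    // Pass context.Customers and context.Products so the whole query is translated to SQL.
    public static List<RandomCustomerRow> TakeRandom(
        IQueryable<Customer> customers, IQueryable<Product> products, int count)
    {
        return customers
            .Where(c => c.customer_active == "TRUE")
            .Join(products,
                  c => c.product_id,
                  p => p.product_id,
                  (c, p) => new RandomCustomerRow
                  {
                      customer_id = c.customer_id,
                      customer_name = c.customer_name,
                      product_id = p.product_id
                  })
            // Guid.NewGuid() is translated to NEWID(), SqlFunctions.Checksum to CHECKSUM().
            .OrderBy(r => SqlFunctions.Checksum(Guid.NewGuid()))
            .Take(count)
            .ToList();
    }
}
```

Note that SqlFunctions methods only work inside queries that EF translates to SQL; calling this against in-memory collections throws NotSupportedException.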

I am going to test it. Meanwhile, I have updated the question; can you please check? Thanks – aadi1295, May 08 '17 at 15:10
Thanks so much. Method 4 with SqlFunctions is working like a charm. Very smooth and no load on the server. Thanks again :) – aadi1295, May 08 '17 at 16:34

score 0 · Answer 2 · edited May 23 '17 at 11:47

As stated here, this is the best way:

var randomCoustmers = Customers.OrderBy(x => Guid.NewGuid()).Take(5);

answered May 08 '17 at 14:37

jitender

In this case it is better to use the close option with `duplicated question` – Juan Carlos Oropeza May 08 '17 at 14:41
You converted **Method 2**, not **Method 1** as the OP expected. – CodeNotFound May 08 '17 at 14:42
Actually, I am already using it, but the I/O cost is very high, so it's affecting performance. **Method 2** is the same. I am looking for a better approach. – aadi1295 May 08 '17 at 14:44
@JuanCarlosOropeza I was thinking the same, but I don't have the rights to mark it as a duplicate :-) – jitender May 08 '17 at 14:49
@ArbazAbid Then you need to rewrite the question. So either you want help converting the query to LINQ or improving performance. – Juan Carlos Oropeza May 08 '17 at 14:51
Sorry jitender, forgot you need 3000 rep to unlock that privilege. – Juan Carlos Oropeza May 08 '17 at 14:55
@JuanCarlosOropeza no problem bro – jitender May 08 '17 at 15:02
@JuanCarlosOropeza I have updated the question, please check. Thanks – aadi1295 May 08 '17 at 15:10
Again, you are asking two different questions. Decide which one you want help with. If you already think Method 4 is the right one, then performance details are irrelevant to answering the question. – Juan Carlos Oropeza May 08 '17 at 15:13
@JuanCarlosOropeza Ok sorry, my bad. I just now need to convert Method 4 to LINQ to SQL. Thanks – aadi1295 May 08 '17 at 15:16
Then I suggest you create a new question and ask just that. – Juan Carlos Oropeza May 08 '17 at 15:19
@JuanCarlosOropeza I have posted a new question: [http://stackoverflow.com/questions/43852273/convert-sql-query-to-linq-to-sql](http://stackoverflow.com/questions/43852273/convert-sql-query-to-linq-to-sql) Please check it, thanks – aadi1295 May 08 '17 at 16:03
