0

I have a table that has more than 1 million records and I want to select random rows from this table, but not among all records - only select Random rows from results matching certain conditions.

Performance is very important, so I can NOT use ordering by NEWID and then select first item.

The table structure is some thing like this:

 ID    BIGINT
 Title NVARCHAR(100)
 Level INT
 Point INT

Now, I wrote a query like:

with 
    tmp_one as
    (
        SELECT
                R.Id as RID 
                FROM    [User] as U
                            Inner Join
                        [Item] as R
                            On  R.UserId = U.Id

                WHERE       ([R].[Level] BETWEEN @MinLevel AND @MaxLevel) 
                        AND ((ABS((BINARY_CHECKSUM(NEWID(),R.Id,NEWID())))% 10000)/100 ) > @RangeOne
    ),
    tmp_two as
    (
        Select  tmp_one.RID as RID
            From    tmp_one
            Where   ((ABS((BINARY_CHECKSUM(NEWID(),RID,NEWID())))% 10000)/100 ) > @RangeTwo
    ),
    tmp_three as
    (
        Select  RID as RID 
            From    tmp_two
            Where   ((ABS((BINARY_CHECKSUM(NEWID(),NEWID())))% 10000)/100 ) < @RangeThree
    )
    Select  top 10 RID
        From    tmp_three

I tried to select 10 item randomly, and then select one of them, but I have an amazing problem!!!

Sometimes the output is ordered by item level! And I don't want it (it's not really random ). I really don't know how result was ordered by level.

Please suggest some solution that help me to select random record in high performance and random selected in high range of iteration is not duplicate.

Raj More
  • 47,048
  • 33
  • 131
  • 198
Behrooz
  • 2,437
  • 3
  • 21
  • 31
  • 3
    You have TOP without ORDER BY. So you are telling SQL Server "I don't care about order!" Therefore, SQL Server is returning the data to you in the order that *it* deems most efficient. – Aaron Bertrand Oct 27 '14 at 20:30
  • okey, how can i prevent this efficient order and tell that just use default order and DO NOT change it? (insert order) and why this do not happen in regular select and select show data in insert order? – Behrooz Oct 27 '14 at 20:41
  • 1
    Please click and read #3 here: [SQL Server Assumptions](http://blogs.sqlsentry.com/aaronbertrand/t-sql-tuesday-56-sql-server-assumptions/) – Aaron Bertrand Oct 27 '14 at 20:52
  • Please read the related question http://stackoverflow.com/questions/7230772/tsql-random-select-with-selective-criteria?rq=1 – Turophile Oct 27 '14 at 21:33

2 Answers2

1

Based from MSDN's Selecting Rows Randomly from a Large Table, instead of the one you avoid:

select top 10 * from TableName order by newid()

It suggests this:

select top 10 * from TableName where (abs(cast((binary_checksum(*) * rand()) as int)) % 100) < 10

It has only much smaller logical read an much better performance.

Irawan Soetomo
  • 1,315
  • 14
  • 35
-1

Try something like this. It will randomly grab 10 rows from your table.

This is pseudo code, so you might need to fix a few column names to match your real tables.

DECLARE @Random int
DECLARE @Result table
(ID BIGINT,
Title varchar(100),
Level int,
Point int)

declare @TotalRows int
set @TotalRows = (select COUNT(*) From [User] U inner join [Item] R on R.UserID = U.ID)

while (select COUNT(*) from @Result)<10
begin
set @Random = (select floor(RAND() * @TotalRows+1))

insert into @Result
select T1.ID, T1.Title, T1.Level, T1.Point from
(select top (@Random) * From [User] U inner join [Item] R on R.UserID = U.ID) T1
left outer join (select top (@Random) * From [User] U inner join [Item] R on R.UserID = U.ID) T2 on T2.ID = T1.ID
where T2.ID is null


end

select * from @Result

Here is how it works.

Select a random number.   For example 47. 
We want to select the 47th row of the table. 
Select the top 47 rows, call it T1. 
Join it to the top 46 rows called T2. 
The row where T2 is null is the 47th row. 
Insert that into a temporary table. 
Do it until there are 10 rows. 
Done.
Chanicho
  • 354
  • 1
  • 5