When I read the explanation in the accepted answer to this question, I cannot see how order by NEWID()
offers randomness for the whole table. Here is the query plan given in the accepted answer:
The query plan, which is also what's given to me by my local copy of SSMS, assigns a random guid value to every row, but this assignment would produce a result set that follows the order of rows that I'd get if do a SELECT * FROM whatever_table
.
Yes, sql does not guarantee any order for result sets unless order by
is used, but almost all implementations I've seen return a result set that strongly overlaps with the insertion order, i.e. the physical layout of data on the disk, for obvious IO efficiency reasons.
Since Sql Server uses a TOP N Sort
, it means it'll just randomise the order of the top N
rows of the result set that I'd get with SELECT * FROM whatever_table
. That's in a way a randomised ordering of more or less the same result set every time the query is run, not a randomisation of the rows from the whole table.
This answer to a different question however, filters out randomly generated values across the whole table's rows and randomises the selection.
Am I wrong?
Update:
Of course I'm wrong. My mistake lies in my incorrect assumption of how TOP N Sort
operates. I thought it takes first N
rows of a result set, then orders that subset. Instead, it finds the Max (presumably Min if order by newid() asc
is used) and stops finding the next value when N
values are found.
Thanks for the comments that allowed me to see my error in my thinking.