I'm a little compute-constrained: I don't want to kill the database or saturate the network by pulling data for hours.
My current query is something like this:
SELECT TOP (100000000) mycolumns
FROM mydb
WHERE lower_control_limit > value1
  AND upper_control_limit < value2;
This isn't giving me a decent random sample, but I'm afraid of issuing queries that have to traverse billions of rows before returning anything. What I need is a uniform random sample over the domain (rows).
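For example, TABLESAMPLE looks cheap, but as I understand it, it samples whole data pages rather than individual rows, so it isn't uniform over rows (sketch below reuses my placeholder names):

SELECT mycolumns
FROM mydb TABLESAMPLE SYSTEM (1 PERCENT) -- picks random 8 KB pages, not random rows
WHERE lower_control_limit > value1
  AND upper_control_limit < value2;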
What is the "best practice" for pulling a few hundred million rows from a store that is many billions of rows in size?
I've seen the usual random-sampling tricks, but I'm worried about the size of that query versus my limited compute and bandwidth.
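For reference, the kind of suggestion I mean (a sketch with my placeholder names; I'm assuming the standard NEWID() trick here) looks like:

SELECT TOP (100000000) mycolumns
FROM mydb
WHERE lower_control_limit > value1
  AND upper_control_limit < value2
ORDER BY NEWID(); -- a random GUID per row gives a uniform sample, but forces a full scan and sort

and that full scan and sort over billions of rows is exactly what I'm trying to avoid.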