We are attempting to pull a semi-random row from Oracle. (We don't need perfectly random row that meets rigorous statistical scrutiny but we would like something that has a chance of getting any row in the table even though there may be some degree of skew.)
We are using this approach:
SELECT PERSON_ID FROM ENCOUNTER SAMPLE(0.0001) WHERE EXTRACT(YEAR FROM REG_DT_TM) = 2020 AND ROWNUM = 1
This approach appears to be giving us just one random result each time we run it.
However, according to answers to this question, this approach gives results from the beginning of the table far more commonly.
How commonly? If that statement is true then how much more commonly are values taken from the top of the table? Our typical table has tens of millions of rows (occasionally billions.) Is there a simple heuristic or a rough estimate to understand the skew in the distribution we can expect?
We are asking for skew because other methods aren't fast enough for our use case. We are avoiding using ORDER because the source tables can be so large (i.e. billions of rows) that the reporting server will run for hours or can time out before we get an answer. Thus, our constraint is we need to use approaches like SAMPLE that respond with little database overhead.