I am looking for possible ways of random sampling in PostgreSQL. I found a couple of methods to do that with different advantages and disadvantages. The naive way to do that is:
select * from Table_Name
order by random()
limit 10;
Another faster method is:
select * from Table_Name
WHERE random() <= 0.01
order by random()
limit 10;
(Although that 0.01 depends on the table size and the sample size; this is just an example.)
In both of these queries a random number is generated for each row and sorted based on those random generated numbers. Then in the sorted numbers the first 10 are selected as the final result, so I think these should be sampling without replacement.
Now what I want to do is to somehow turn this sampling methods into sampling with replacement. How is that possible? Or is there any other random sampling method with replacement in PostgreSQL?
I have to say that I do have an idea how this might be possible but I don't know how to implement it in postgresql, Here is my idea:
If instead of generating one random value we generate S random values where S is the sample size,then order all of the random generated values,it will be sampling with replacement.(I don't know if I am right)
At this point I don't mind about the performance of the query.