I want to randomly select 20 rows from a large table and use the following query that works fine:
SELECT id
FROM timeseriesentry
WHERE random() < 20*1.0/12940622
(12940622 is the number of rows in the table). I now want to retrieve the number of rows automatically and use
WITH tmp AS (SELECT COUNT(*) n FROM timeseriesentry)
SELECT id
FROM timeseriesentry, tmp
WHERE random() < 20*1.0/n
which yields zero rows even though n is correct.
What am I missing here?
Edit: id is not numerical which is why I can't create a random series to select from it. I need the proposed structure because my actual goal is
WITH npt AS (
SELECT type, COUNT(*) n
FROM timeseriesentry
GROUP BY type
)
SELECT v.id
FROM timeseriesentry v
JOIN npt ON npt.type= v.type
WHERE random() < 200*1.0/npt.n
which forces roughly the same amount of samples per type.