
I use PostgreSQL 9.1. I have a table and I would like to get two things done in one query:

First: select one random sample row:

select * from table order by random() limit 1;

Second: count the number of rows the sample was selected from:

select count(1) from table;

Since a proper random() function has to know about all rows, I wonder if there is a way to get both results in one query.

I am guessing at something like:

select count(1), first.* from table order by random();

but I don't really know how to do it.

So I want a random row and the total number of rows.

Thanks for any ideas.

Steve

alfonx
  • So you want a random row and the total number of rows ? – Aziz Nov 17 '11 at 17:09
  • This question is very similar to http://stackoverflow.com/questions/5297396/quick-random-row-selection-in-postgres which has interesting answers... – alfonx Aug 15 '15 at 09:20

1 Answer


One way to do this:

SELECT *, (SELECT count(*) FROM tbl) AS ct
FROM   tbl
ORDER  BY random()
LIMIT  1;

If the table is huge and performance is an issue, there are tricks to speed up the random selection. Having a serial column without gaps would help a lot with that ...

If that is an option for you, have a look at this related answer: Random Row in MySQL
It works in PostgreSQL, too; just substitute random() for rand().

BTW, the random() function itself works completely independently of the number of rows; it is the ORDER BY over the whole table that makes this expensive.


Here is a variant that produces the same with only one sequential scan:

SELECT *, count(*) OVER () AS ct
FROM   tbl
ORDER  BY random()
LIMIT  1;
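As a quick, runnable illustration of the window-function variant, here is a small demo using Python's built-in sqlite3 module (SQLite 3.25+ supports window functions, and the SQL has the same shape as the PostgreSQL query above; table and column names are made up for the demo):

```python
import sqlite3

# Build a throwaway table with 100 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO tbl (val) VALUES (?)",
                 [("row %d" % i,) for i in range(100)])

# One random row plus the total row count, in a single scan:
# count(*) OVER () is computed over the full result set,
# before ORDER BY and LIMIT are applied.
row = conn.execute("""
    SELECT *, count(*) OVER () AS ct
    FROM   tbl
    ORDER  BY random()
    LIMIT  1
""").fetchone()

print(row)  # e.g. (37, 'row 36', 100) -- ct is always 100
```

The key point the demo shows: LIMIT 1 does not shrink the count, because the window aggregate is evaluated before the final ordering and limit.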

A quick test with a table of 5k rows was slower, though. Another test with a table of 400k rows showed a similar result: slightly slower. Window functions can be performance hogs, and count(*) is known to be rather slow for large tables in PostgreSQL, because every tuple has to be visited.

Index-only scans ("covering indexes") have recently been implemented in the development version of 9.2, so we are in for a speedup with the next version here. More about that in a related post on dba.stackexchange.com:

In the meantime, 230 ms for the 400k table is not too bad.

Erwin Brandstetter
  • UPDATE: On PG 9.4 this is really running faster (compared to PG 9.1 which I was using when you answered my question) – alfonx Aug 16 '15 at 08:06