If PostgreSQL count(*) is always slow how to paginate complex queries?

Question

If PostgreSQL's count(*) is always slow how to paginate complex queries?

Making triggers doesn't seem to be a good solution as long as in this case we have a lot of pages (for example different categories, filters, etc).

What to do if VACUUM/VACUUM ANALYZE/ANALYZE/VACUUM FULL doesn't help? What are the best practices to use count(*) with postgresql?

Count, (and any other aggregate function) on any criteria is wholly dependent on useful indexes, no matter what DBMS. — Tony Hopkinson, Jun 04 '13 at 11:34
How does a `count(*)` relate to paging? Can you show us the statement you are using? Do you run an extra `count(*)` before running the actual query? Did you try to do the `count(*)` together with the main query? — , Jun 04 '13 at 11:53
@a_horse_with_no_name `count(*)` is used to count amount of pages. My statement is simple `SELECT count(*) from my_table` (which runs very slow) and `select count(pub_date) from catalog_ad where "catalog_ad"."pub_date" >= '2013-04-20 00:00:00';` (which is faster, but still slow). I run `count` separately of main query. How do I run them together? — Daniil Ryzhkov, Jun 04 '13 at 12:00
See here: http://sqlfiddle.com/#!12/92045/1 You also might want to read this: http://use-the-index-luke.com/sql/partial-results for some deeper information on how to speed up paging. — , Jun 04 '13 at 12:08
Check [my recent answer on how to do pagination with conditions and a single count](http://stackoverflow.com/a/16842749/131874). — Clodoaldo Neto, Jun 04 '13 at 12:09

Craig Ringer · Accepted Answer · 2014-01-14T00:26:45.877

19

Did you read the heading on that article?

Note that the following article only applies to versions of PostgreSQL prior to 9.2. Index-only scans are now implemented.

Use 9.2 and you'll generally find you get much better results. Read the index-only scans wiki page for details.

That said, on older versions using LIMIT and OFFSET generally works fine. You can estimate rowcounts (and therefore pagecounts) using the table statistics if you don't mind a bit of variation. See "Estimating row count" in the article you already linked to.

Paginating using LIMIT and OFFSET is, IMO, an anti-pattern anyway. A lot of the time you can rephrase your pagination code so it uses sort_column > 'last_seen_value' LIMIT 100, i.e. it avoids the offset. This can sometimes result in very large performance gains.

edited Jan 14 '14 at 00:26

answered Jun 04 '13 at 11:41

Craig Ringer

307,061
76
688
778

2

@Daniil You already linked to the answer; just read http://wiki.postgresql.org/wiki/Slow_Counting, "Estimating row count" – Craig Ringer Jun 04 '13 at 11:43
that's pretty good solution when I want to count rows in the whole table, but what if I want to filter table? – Daniil Ryzhkov Jun 04 '13 at 11:46
@Daniil Count will generally be able to use an index for a query with a filter. It still has to do heap lookups until 9.2, but unless you're doing a filter that matches most of the table should perform OK. Otherwise: Upgrade. – Craig Ringer Jun 04 '13 at 12:02
@Daniil Remember to read the manual on upgrading; remember it isn't an in-place upgrade, you need to use `pg_upgrade` or dump and reload. – Craig Ringer Jun 04 '13 at 12:10
8

just upgraded to 9.2. Nothing changed – Daniil Ryzhkov Jun 04 '13 at 15:17
@Daniil `explain (buffers, analyze)` then, please; let's see what it's doing. – Craig Ringer Jun 04 '13 at 23:10
2

I have the same problem. I.e. I need to paginate over filtered results. Filter is variety of user-defined settings, i.e. it has "over 9000" variations. So caching the count is not a (viable) option. I've put the queries and their explains here: https://gist.github.com/NoICE/59a915e4b2b522e574e8 Any hint/help would be much appreciated! :-) P.S. I don't know why adding the joins (as seen in the gist) produces such terrible performance dropdown (as it is indexed, it should NOT imo...). – Dalibor Filus Nov 02 '14 at 10:45
While Query PLAN estimate works, I could run that under JPA. JPA do not allow me to get the result from trigger function. – Maxi Wu May 10 '18 at 10:22
According to [_Is "count(*)" much faster now?_](https://wiki.postgresql.org/wiki/Index-only_scans#Is_.22count.28.2A.29.22_much_faster_now.3F) this might be the reason: _"Index-only scans *can* be used to satisfy these queries [...]. However, in practice that isn't particularly likely. [...] "count(*) without any predicate" queries will only use an index-only scan if the index is significantly smaller than its table. This typically only happens when the table's row width is much wider than some indexes'. "_ – nh2 Jul 11 '18 at 01:52
1

@CraigRinger Here is the `explain (buffers, analyze)` you requested for my example data: https://gist.github.com/nh2/bbbff93be04ea6c5181ec948981081b1 -- Do you know why the equivalent of `Index Only Scan using categories_pkey on categories` from the [index-only scans wiki page](https://wiki.postgresql.org/wiki/Index-only_scans) you linked isn't kicking in here? Can we force it to kick in? – nh2 Jul 11 '18 at 02:11
1

In my case, `VACUUM ANALYZE mytable;` helped: After it I see `Index Only Scan` in the `explain` (see my [gist](https://gist.github.com/nh2/bbbff93be04ea6c5181ec948981081b1#edit-vacuum-analyze-seems-to-help)) and I got a 2x speedup. – nh2 Jul 11 '18 at 02:34
1

... and a 20x speedup with [cold buffer cache](https://unix.stackexchange.com/questions/87908/how-do-you-empty-the-buffers-and-cache-on-a-linux-system). Also want to leave here that just `ANALYZE` didn't do the job, `VACUUM ANALYZE` did. – nh2 Jul 11 '18 at 02:53

score 16 · Answer 2 · edited May 01 '14 at 13:52

16

If you're doing SELECT count(*) FROM table and have pg stats enabled you can use the lower example, which in this case goes from 13ms down to 0.05ms.

SELECT count(*) FROM news;

26171

EXPLAIN ANALYZE SELECT count(*) FROM news;

Total runtime: 13.057 ms

SELECT reltuples::bigint AS count FROM pg_class WHERE oid = 'public.news'::regclass;

26171

EXPLAIN ANALYZE SELECT reltuples::bigint AS count FROM pg_class WHERE oid = 'public.news'::regclass;

Total runtime: 0.053 ms

edited May 01 '14 at 13:52

cimmanon

67,211
17
165
171

answered Dec 13 '13 at 19:04

Mark Selby

597
5
7

To confirm the performance difference: `(0.4ms) SELECT reltuples::bigint AS count FROM pg_class WHERE oid = 'public.aggregate_data_points'::regclass` vs. `(398.9ms) SELECT COUNT(*) FROM "aggregate_data_points"` – Jeremy Lewis Jan 15 '16 at 09:43
10

These two queries give different results. – Christopher Reid Feb 04 '16 at 03:08
1

Take into account the second query (with reltuples) is an estimate only, it will not always yield the exact value. – Jhuliano Moreno May 16 '18 at 09:43
@JhulianoMorenoHow close is the estimate? And what effects this estimate – Zia Ul Rehman Mughal Nov 02 '18 at 08:50
2

How can you add filters like where clauses to this? – Anthony Jan 09 '19 at 15:29

If PostgreSQL count(*) is always slow how to paginate complex queries?

2 Answers2

Linked