
I've read How do I speed up counting rows in a PostgreSQL table? and https://wiki.postgresql.org/wiki/Slow_Counting, but I'm not getting any closer to a better result. The problem is that most of the answers to the other question don't involve any kind of filtering and rely on table-wide statistics.

I have a table of about 10 million rows. Currently the data is fetched in pages (I know this pagination strategy is not ideal), and I have to display a total count back to the user (a business requirement). The paginated query is fast, typically < 200 ms, and looks like this:

explain analyze
SELECT  DISTINCT ON (ID) table.*
FROM "table"
WHERE "table"."deleted_at" IS NULL
GROUP BY "table"."id"
ORDER BY "table"."id" DESC
LIMIT 25 OFFSET 200;


    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=530.48..585.91 rows=25 width=252) (actual time=0.870..0.942 rows=25 loops=1)
   ->  Unique  (cost=87.00..19878232.36 rows=8964709 width=252) (actual time=0.328..0.899 rows=225 loops=1)
         ->  Group  (cost=87.00..15395877.86 rows=8964709 width=252) (actual time=0.327..0.747 rows=225 loops=1)
               Group Key: id
               ->  Index Scan Backward using table_pkey on table  (cost=87.00..10913523.36 rows=8964709 width=252) (actual time=0.324..0.535 rows=225 loops=1)
                     Filter: (deleted_at IS NULL)
                     Rows Removed by Filter: 397
 Planning time: 0.174 ms
 Execution time: 0.986 ms
(9 rows)

Time: 77.437 ms

The problem comes when I try to display the count back to the user, via:

explain analyze
SELECT COUNT(*) AS count_all, "table"."id" AS id
FROM "table"
WHERE "table"."deleted_at" IS NULL
GROUP BY "table"."id";
                                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=87.00..21194868.36 rows=10282202 width=4) (actual time=0.016..16984.904 rows=10343557 loops=1)
   Group Key: id
   ->  Index Scan using table_pkey on table  (cost=87.00..5771565.36 rows=10282202 width=4) (actual time=0.012..11435.350 rows=10343557 loops=1)
         Filter: (deleted_at IS NULL)
         Rows Removed by Filter: 2170
 Planning time: 0.098 ms
 Execution time: 18638.381 ms
(7 rows)

I can't use probabilistic counts right now, but I also can't live with 10-50 seconds to return a count. Is there any other way I can speed this up?
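
One idea I've been toying with (not sure it buys enough on 9.5, and the index name is just for illustration) is a partial index that matches the soft-delete filter, so the count can come from an index-only scan instead of walking the whole table:

-- Sketch only: partial index covering the common filter.
-- An index-only scan also needs the visibility map to be reasonably
-- up to date, so (auto)vacuum matters here.
CREATE INDEX CONCURRENTLY table_live_id_idx
    ON "table" (id)
    WHERE deleted_at IS NULL;

-- The count itself doesn't need the GROUP BY id at all:
SELECT COUNT(*) AS count_all
FROM "table"
WHERE "table"."deleted_at" IS NULL;

That still has to visit every live index entry, though, so I don't know whether it gets anywhere near the 200 ms the paginated query takes.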

Anthony
  • Performance questions require proper information as outlined in the [postgresql-performance tag info](https://stackoverflow.com/tags/postgresql-performance/info). – Erwin Brandstetter Jan 09 '19 at 04:43
  • What is your Postgres version? – a_horse_with_no_name Jan 09 '19 at 06:55
  • Use @MarynaKrasnova's answer to the question you quote. – Laurenz Albe Jan 09 '19 at 07:16
  • @LaurenzAlbe - row triggers won't work; this query is driven dynamically by the user and the filter permutations would be in the hundreds (see the sketch after these comments). I also don't see how my question can be marked as a duplicate when I specifically linked that question and said it falls short, but I will edit for completeness. – Anthony Jan 09 '19 at 14:11
  • @a_horse_with_no_name 9.5.14 – Anthony Jan 09 '19 at 14:11
  • With Postgres 11 parallel execution might help for that statement –  Jan 09 '19 at 15:42
  • @Anthony Well, you never said *why* the question didn't help you or that you were using dynamic SQL. In that case there is no way to make it really fast, sorry. You cannot know how many rows a query will return unless you execute it. – Laurenz Albe Jan 09 '19 at 16:52
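
For completeness, here is the kind of trigger-maintained counter suggested in the comments (sketch only, all names made up) and why it doesn't fit my case: it works fine when the filtered count is fixed (e.g. `deleted_at IS NULL`), but my WHERE clauses are built dynamically from user input, so I'd need a counter per filter permutation, and there are hundreds of those.

-- Sketch only (illustrative names): a single-row counter table kept
-- current by a trigger. Works for one fixed filter, not for dynamic ones.
-- Note: this also serializes concurrent writers on the counter row.
CREATE TABLE table_live_count (cnt bigint NOT NULL);

INSERT INTO table_live_count (cnt)
SELECT COUNT(*) FROM "table" WHERE deleted_at IS NULL;

CREATE OR REPLACE FUNCTION table_live_count_trg() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' AND NEW.deleted_at IS NULL THEN
        UPDATE table_live_count SET cnt = cnt + 1;
    ELSIF TG_OP = 'DELETE' AND OLD.deleted_at IS NULL THEN
        UPDATE table_live_count SET cnt = cnt - 1;
    ELSIF TG_OP = 'UPDATE' THEN
        IF OLD.deleted_at IS NULL AND NEW.deleted_at IS NOT NULL THEN
            UPDATE table_live_count SET cnt = cnt - 1;
        ELSIF OLD.deleted_at IS NOT NULL AND NEW.deleted_at IS NULL THEN
            UPDATE table_live_count SET cnt = cnt + 1;
        END IF;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER table_live_count_trg
AFTER INSERT OR UPDATE OR DELETE ON "table"
FOR EACH ROW EXECUTE PROCEDURE table_live_count_trg();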
