1

Is there a way in PostgreSQL to abort execution of COUNT(*) statement and return its current result?

I would like to run:

SELECT COUNT(*) FROM table WHERE something=x;

Some queries are completed in almost no time, but some take quite a lot of time. I would like to have:

  • if statement is completed in within time limit then it returns final result,
  • else it aborts execution but returns current result.

It would be nice to get an exit status as well (whether it finished execution or was aborted).

I found statement_timeout setting, but it doesn't return any result, just aborts.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Dmitry
  • 135
  • 1
  • 7

2 Answers2

1

You can easily instruct Postgres to count up to a given LIMIT - a maximum number of rows, not an elapsed time:

SELECT count(*)
FROM  (
   SELECT 1 FROM tbl
   WHERE  something = 'x'
   LIMIT  100000  -- stop counting at 100k
   ) sub;

If count() takes a very long time, you either have huge tables or some other problems with your setup. Either way, an estimated count be good enough for your purpose:

It is not possible per se to stop counting after a maximum elapsed time. You could partition the count with the above technique and check the elapsed time after every step. But this adds a lot of overhead. Skipping rows with OFFSET is not that much cheaper than counting them. I don't think I would use it. Just as proof of concept:

DO
$do$
DECLARE
   _partition bigint := 100000;  -- size of count partition
   _timeout   timestamptz := clock_timestamp() + interval '1s';  -- max time allowed
   _round     int := 0;
   _round_ct  bigint;
BEGIN

LOOP
   SELECT count(*)
   FROM (
      SELECT 1 FROM tbl
      WHERE  something = 'x'
      LIMIT  _partition
      OFFSET _partition * _round
      ) sub
   INTO   _round_ct;

   IF _round_ct < _partition THEN
      RAISE NOTICE 'count: %; status: complete', _partition * _round + _round_ct;
      RETURN;
   ELSIF clock_timestamp() > _timeout THEN
      RAISE NOTICE 'count: %; status: timeout', _partition * _round + _round_ct;
      RETURN;
   END IF;

   _round := _round + 1;
END LOOP;

END
$do$;

You could wrap this in a plpgsql function and pass parameters. Even make it work for any given table / column with EXECUTE ...

If you have an ID column with few gaps, the technique would make a lot more sense. You could partition by ID with a lot less overhead ...

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thank you for answering and editing the question. Sorry for my english :) I guess, count took a while because it had to join all other tables. I gave up that idea and traded normalization for performance. – Dmitry Jun 19 '16 at 00:47
  • @Dmitry: Typically, you can have both: Normalization *and* performance. A [materialized view](https://www.postgresql.org/docs/current/static/sql-creatematerializedview.html) might help. – Erwin Brandstetter Jun 19 '16 at 00:49
0

I don't believe you will ever get a result set with a count until a query completes and makes it visible to the end user, aka you. Such is the way of fundamental rules of an ACID database. From initiating a SELECT command you're asking for a snapshot of the number of rows at that moment in time.

You would probably be better off looking at the issue from another angle and look into why some queries take a long time by performing an EXPLAIN on the query and then investigate the results.

d1ll1nger
  • 1,571
  • 12
  • 16